This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.
Index Nav: | [Date Index] [Subject Index] [Author Index] [Thread Index] | |
---|---|---|
Message Nav: | [Date Prev] [Date Next] | [Thread Prev] [Thread Next] |
Other format: | [Raw text] |
This is both a fix to a regression (since GCC 4.9), a code improvement for GLIBC, and fixes potential bugs with the recent changes to allow small integers (32-bit integer, SImode in particular) in floating point and vector registers. The core of the problem is when a SFmode (32-bit binary floating point) value is a scalar in the floating point and vector registers, it is stored internally as a 64-bit binary floating point value. This means that if you could look at the value using the SUBREG mechanism you might see the wrong value. Before the recent changes to add small integer support went in, it was less of an issue, since the only integer type allowed in floating point and vector registers was 64-bit integers (DImode). The regression comes in in how SFmode values are moved between general purpose and floating point/vector registers. Up through the power7, the way you moved SFmode values from one register set to another was store and then load. Doing back to back store and load to the same location could cause serious performance problems on recent power systems. On the power7 and power8, we would put a special NOP that forces the two instructions to be in different dispatch groups, which helps somewhat. When the power8 (ISA 2.07) came out, it had direct move instructions and convert between scalar double precision and single precision. I added the appropriate secondary_reload support so that if the register allocator wanted to move a SFmode value between register banks, it would create a temporary and do the appropriate instructions to move the value. This worked in the GCC 4.8 time frame. Some time in the 4.9 time frame, this broke and the register allocator would more often generate store and load instead of the direct move sequences. However, simple test cases continued to use the direct move instructions. In checking on power9 (ISA 3.0) code, it more likely uses the store/load than direct move. On power8 spec runs we have seen the effect of these store/load sequences on some benchmarks from the next generation of the Spec suite. The optimization that the GLIBC implementers have requested (PR 71977) was to speed up code sequences that they use in implementing the single precision math library. They often times need to extract/modify bits in floating point values (for example, setting exponents or mantissas, etc.). For example from e_powf.c, you see code like this after macro expansion: typedef union { float value; u_int32_t word; } ieee_float_shape_type; float t1; int32_t is; /* ... */ do { ieee_float_shape_type gf_u; gf_u.value = (t1); (is) = gf_u.word; } while (0); do { ieee_float_shape_type sf_u; sf_u.word = (is&0xfffff000); (t1) = sf_u.value; } while (0); Originally, I just wrote a peephole2 to catch the above code, and it worked in small test cases on the power8. But it didn't work on larger programs or on the power9. I also wanted to fix the performance issue that we've seen. I also became convinced that for GCC 7, it was a ticking time bomb where eventually somebody would write code that intermixed SImode and SFmode, and it would get the wrong value. The main part of the patch is to not let the compiler generate: (set (reg:SI) (subreg:SF (reg:SI))) or (set (reg:SI) (subreg:SI (reg:SF))) Most of the register predicates eventually call gpc_reg_operand, and it was simple to put the check in there, and to other predicates that did not call gpc_reg_operand. I created new insns to do the move between formats that allocated the temporary needed with match_scratch. There were places that then needed to not have the check (the movsi/movsf expanders themselves, and the insn spliters for format conversion insns), and I added a predicate for that. I have built the patches on a little endian power8, a big endian power8 (64-bit only), and a big endian power7 (both 32-bit and 64-bit). There were no regression failures. In addition, I built spec 2006, with the fixes and without, and did a quick run comparing the results (1 run). I am re-running the spec results, with the code merged to today's trunk, and with 3 runs to isolate the occasional benchmark that goes off in the weeds. Of the 29 benchmarks in Spec 2006 CPU, 6 benchmarks had changes in the instructions generated (perlbench, gromacs, cactusADM, namd, povray, wrf). In the single run I did, there were no regressions, and 2 or 3 benchmarks improved: namd 6% tonto 3% libquantum 6% However, single runs of libquantum have varied as much as 10%, so without seeing more runs, I will skip it. Namd was one of the benchmarks that saw changes in code generation, but tonto did not have changes of code. I suspect having the separate converter unspec insn, allowed the scheduler to move things in between the move and the use. So, in summary, can I check these changes into the GCC 7 trunk? Given it does fix a long standing regression in GCC 6 that hurts performance, did you want me to develop the changes for a GCC 6 backport as well? I realize it is skating on thin ice whether it is a feature or a fix. [gcc] 2016-12-29 Michael Meissner <meissner@linux.vnet.ibm.com> PR target/71977 PR target/70568 PR target/78823 * config/rs6000/predicates.md (sf_subreg_operand): New predicate to return true if the operand contains a SUBREG mixing SImode and SFmode on 64-bit VSX systems with direct move. This can be a problem in that we have to know whether the SFmode value should be represented in the 32-bit memory format or the 64-bit scalar format used within the floating point and vector registers. (altivec_register_operand): Do not return true if the operand contains a SUBREG mixing SImode and SFmode. (vsx_register_operand): Likewise. (vsx_reg_sfsubreg_ok): New predicate. Like vsx_register_operand, but do not check if there is a SUBREG mixing SImode and SFmode on 64-bit VSX systems with direct move. (vfloat_operand): Do not return true if the operand contains a SUBREG mixing SImode and SFmode. (vint_operand): Likewise. (vlogical_operand): Likewise. (gpc_reg_operand): Likewise. (int_reg_operand): Likewise. * config/rs6000/rs6000.c (valid_sf_si_move): New function to determine if a MOVSI or MOVSF operation contains SUBREGs that mix SImode and SFmode. (rs6000_emit_move): If we have a MOVSI or MOVSF operation that contains SUBREGs that mix SImode and SFmode, call special insns, that can allocate the necessary temporary registers to convert between SFmode and SImode within the registers. * config/rs6000/vsx.md (SFBOOL_*): Add peephole2 to recognize when we are converting a SFmode to a SImode, moving the result to a GPR register, doing a single AND/IOR/XOR operation, and then moving it back to a vector register. Change the insns recognized to move the integer value to the vector register and do the operation there. This code occurs quite a bit in the GLIBC math library in float math functions. (peephole2 to speed up GLIB math functions): Likewise. * config/rs6000/rs6000-protos.h (valid_sf_si_move): Add declaration. * config/rs6000/rs6000.h (TARGET_NO_SF_SUBREG): New internal target macros to say whether we need to avoid SUBREGs mixing SImode and SFmode. (TARGET_ALLOW_SF_SUBREG): Likewise. * config/rs6000/rs6000.md (UNSPEC_SF_FROM_SI): Ne unspecs. (UNSPEC_SI_FROM_SF): Likewise. (iorxor): Change spacing. (and_ior_xor): New iterator for AND, IOR, and XOR. (movsi_from_sf): New insn to handle where we are moving SFmode values to SImode registers and we need to convert the value to the memory format from the format used within the register. (movdi_from_sf_zero_ext): Optimize zero extending movsi_from_sf. (mov<mode>_hardfloat, FMOVE32 iterator): Don't allow moving SUBREGs mixing SImode and SFmode on 64-bit VSX systems with direct move. (movsf_from_si): New insn to handle where we are moving SImode values to SFmode registers and we need to convert the value to the the format used within the floating point and vector registers to the 32-bit memory format. (fma<mode>4): Change register_operand to gpc_reg_operand to prevent SUBREGs mixing SImode and SFmode. (fms<mode>4): Likewise. (fnma<mode>4): Likewise. (fnms<mode>4): Likewise. (nfma<mode>4): Likewise. (nfms<mode>4): Likewise. [gcc/testsuite] 2016-12-29 Michael Meissner <meissner@linux.vnet.ibm.com> PR target/71977 PR target/70568 PR target/78823 * gcc.target/powerpc/pr71977-1.c: New tests to check whether on 64-bit VSX systems with direct move, whether we optimize common code sequences in the GLIBC math library for float math functions. * gcc.target/powerpc/pr71977-2.c: Likewise. -- Michael Meissner, IBM IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA email: meissner@linux.vnet.ibm.com, phone: +1 (978) 899-4797
Attachment:
pr71977.patch04b
Description: Text document
Index Nav: | [Date Index] [Subject Index] [Author Index] [Thread Index] | |
---|---|---|
Message Nav: | [Date Prev] [Date Next] | [Thread Prev] [Thread Next] |