This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.
On the PowerPC starting with ISA 2.07 (power8), moving a single precision value (SFmode) from a vector register to a GPR involves converting the scalar value in the register from double (DFmode) format to the 32-bit vector/storage format, doing the move to the GPR, and then doing a 32-bit shift right to get the value into the bottom 32 bits of the GPR for use as a scalar:

	xscvdpspn 0,1
	mfvsrd    3,0
	srdi      3,3,32

It turns out that the current processors starting with ISA 2.06 (power7) through ISA 3.0 (power9) actually duplicate the 32-bit value produced by the XSCVDPSPN and XSCVDPSP instructions into the top 32 bits of the register and into the second 32-bit word. This allows us to eliminate the shift instruction, since the value is already in the correct location for a 32-bit scalar. ISA 3.0 is being updated to include this specification (and other fixes) so that future processors will also be able to eliminate the shift. The new code is:

	xscvdpspn 0,1
	mfvsrwz   3,0

While I was working on the modifications, I noticed that if the user did a round from DFmode to SFmode and then tried to move the result to a GPR, the compiler would originally generate:

	frsp      1,2
	xscvdpspn 0,1
	mfvsrd    3,0
	srdi      3,3,32

The XSCVDPSP instruction already handles values outside of the SFmode range (XSCVDPSPN does not), and so I added a combiner pattern to combine the two instructions:

	xscvdpsp  0,1
	mfvsrwz   3,0

While I was looking at the code, I also noticed that if we have a SImode value in a vector register, and we want to sign extend it and leave the value in a GPR, on power8 the register allocator would decide to do a 32-bit integer store and a sign extending load into the GPR to do the sign extension. I added a splitter to convert this into a pair of MFVSRWZ and EXTSW instructions.
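For illustration, the three cases above correspond roughly to C source like the following. This is a minimal sketch and not the test cases from the patch: the function names are hypothetical, and whether GCC emits exactly the sequences described depends on the target and options (something like -O2 -mcpu=power8 on a 64-bit target is an assumption here).

```c
#include <stdint.h>
#include <string.h>

/* Case 1: move the bits of an SFmode value to a GPR.  On a VSX target
   the float usually lives in a vector register, so returning its bit
   pattern as an integer needs a conversion to the 32-bit storage
   format plus a direct move.  (Hypothetical helper name.)  */
uint32_t
float_bits (float f)
{
  uint32_t u;
  memcpy (&u, &f, sizeof u);	/* well-defined type pun */
  return u;
}

/* Case 2: round DFmode to SFmode, then move the result to a GPR.
   This is the shape of code where a separate round followed by a
   convert could be merged into a single rounding convert.  */
uint32_t
rounded_double_bits (double d)
{
  float f = (float) d;		/* round double to float */
  return float_bits (f);
}

/* Case 3: a 32-bit integer produced in a floating point/vector
   register (here by a double to int conversion) that has to be sign
   extended into a 64-bit GPR result.  */
long
truncate_and_extend (double d)
{
  return (int) d;		/* truncate toward zero, then sign extend */
}
```

Compiling a file like this with -S and looking for srdi, mfvsrd, and mfvsrwz in the output is one way to see which sequence a given compiler version picks.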
I built Spec 2006 with the changes, and I noticed the following changes in the code:

    * Round DF->SF and move to GPR: namd, wrf;
    * Eliminate 32-bit shift: gromacs, namd, povray, wrf;
    * Use of MFVSRWZ/EXTSW: gromacs, povray, calculix, h264ref.

I have bootstrapped these changes on the following machines with no regressions in the test suite:

    * Big endian power7 (with both 32/64-bit targets);
    * Little endian power8;
    * Little endian power9 prototype.

Can I check these changes into GCC 8? Can I back port these changes into the GCC 7 branch?

[gcc]
2017-09-19  Michael Meissner  <meissner@linux.vnet.ibm.com>

	* config/rs6000/vsx.md (vsx_xscvspdp_scalar2): Move insn so it is
	next to vsx_xscvspdp.
	(vsx_xscvdpsp_scalar): Use 'ww' constraint instead of 'f' to allow
	SFmode values being in Altivec registers.
	(vsx_xscvdpspn): Eliminate unneeded alternative.  Use correct
	constraint ('ws') for DFmode.
	(vsx_xscvspdpn): Likewise.
	(vsx_xscvdpspn_scalar): Likewise.
	(peephole for optimizing move SF to GPR): Adjust code to eliminate
	needing to do the shift right 32-bits operation after XSCVDPSPN.
	* config/rs6000/rs6000.md (extendsi<mode>2): Add alternative to do
	sign extend from vector register to GPR via a split, preventing
	the register allocator from doing the move via store/load.
	(extendsi<mode>2 splitter): Likewise.
	(movsi_from_sf): Adjust code to eliminate doing a 32-bit shift
	right or vector extract after doing XSCVDPSPN.  Use MFVSRWZ
	instead of MFVSRD to move the value to a GPR register.
	(movdi_from_sf_zero_ext): Likewise.
	(movsi_from_df): Add optimization to merge a convert from DFmode
	to SFmode and moving the SFmode to a GPR to use XSCVDPSP instead
	of round and XSCVDPSPN.
	(reload_gpr_from_vsxsf): Use MFVSRWZ instead of MFVSRD to move the
	value to a GPR register.  Rename p8_mfvsrd_4_disf insn to
	p8_mfvsrwz_disf.
	(p8_mfvsrd_4_disf): Likewise.
	(p8_mfvsrwz_disf): Likewise.
[gcc/testsuite]
2017-09-19  Michael Meissner  <meissner@linux.vnet.ibm.com>

	* gcc.target/powerpc/pr71977-1.c: Adjust scan-assembler codes to
	reflect that we don't generate a 32-bit shift right after
	XSCVDPSPN.
	* gcc.target/powerpc/direct-move-float1.c: Likewise.
	* gcc.target/powerpc/direct-move-float3.c: New test.

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meissner@linux.vnet.ibm.com, phone: +1 (978) 899-4797
Attachment:
gcc-power9.patch278b
Description: Text document