This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.



[PATCH], Improve moving SFmode to GPR on PowerPC


On PowerPC starting with ISA 2.07 (power8), moving a single precision value
(SFmode) from a vector register to a GPR involves converting the scalar value
in the register from double (DFmode) format to the 32-bit vector/storage
format, moving it to the GPR, and then shifting right by 32 bits to get the
value into the bottom 32 bits of the GPR for use as a scalar:

	xscvdpspn 0,1
	mfvsrd    3,0
	srdi      3,3,32

It turns out that the current processors starting with ISA 2.06 (power7)
through ISA 3.0 (power9) actually duplicate the 32-bit value produced by the
XSCVDPSPN and XSCVDPSP instructions into both the top 32 bits of the register
and the second 32-bit word.  This allows us to eliminate the shift instruction,
since the value is already in the correct location for a 32-bit scalar.
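
To illustrate why the shift becomes redundant, here is a small C model of the
register contents.  It is only a sketch: the function names are made up for
this example, the model takes a float directly rather than a value held in
double format, and it assumes the duplication behaviour described above.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical model: doubleword 0 of the VSX register after XSCVDPSPN,
   ASSUMING the 32-bit result is duplicated into word 0 and word 1.  */
uint64_t model_xscvdpspn_dw0(float value)
{
    uint32_t bits;
    memcpy(&bits, &value, sizeof bits);   /* raw single precision bits */
    return ((uint64_t)bits << 32) | bits; /* word 0 and word 1 both hold bits */
}

/* Old sequence: MFVSRD copies doubleword 0, SRDI shifts right by 32.  */
uint64_t model_mfvsrd_srdi32(uint64_t dw0)
{
    return dw0 >> 32;
}

/* New sequence: MFVSRWZ copies the low word of doubleword 0, zero extended.  */
uint64_t model_mfvsrwz(uint64_t dw0)
{
    return dw0 & 0xffffffffu;
}
```

Under that assumption both models return the same GPR value for any input,
which is why the SRDI can be dropped.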

ISA 3.0 is being updated to include this specification (and other fixes) so
that future processors will also be able to eliminate the shift.

The new code is:

	xscvdpspn 0,1
	mfvsrwz   3,0

While I was working on the modifications, I noticed that if the user did a
round from DFmode to SFmode and then tried to move it to a GPR, it would
originally do:

	frsp      1,2
	xscvdpspn 0,1
	mfvsrd    3,0
	srdi      3,3,32

The XSCVDPSP instruction already handles values outside of the SFmode range
(XSCVDPSPN does not), and so I added a combiner pattern to combine the two
instructions:

	xscvdpsp  0,1
	mfvsrwz   3,0
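
A source-level sketch of the round-then-move case might look like the
following.  The function name is hypothetical, and I am assuming this shape of
code is what the combiner pattern is meant to catch; the generated code will
depend on the options used.

```c
#include <stdint.h>
#include <string.h>

/* Round a double to single precision, then return the raw bits of the
   rounded value in an integer register.  */
uint32_t double_to_float_bits(double value)
{
    float rounded = (float) value;          /* DFmode -> SFmode round */
    uint32_t bits;
    memcpy(&bits, &rounded, sizeof bits);   /* SFmode -> GPR move */
    return bits;
}
```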

While I was looking at the code, I noticed that if we have a SImode value in a
vector register and we want to sign extend it and leave the value in a GPR,
on power8 the register allocator would decide to do a 32-bit integer store
followed by a sign extending load into the GPR to do the sign extension.  I
added a splitter to convert this into a pair of MFVSRWZ and EXTSW
instructions.
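
One way an SImode value can end up in a vector register is a floating point
to integer conversion whose result is then widened.  The sketch below is an
assumption about the kind of source involved (the function name is
hypothetical, and I have not checked that this exact function hits the new
splitter); it only shows the semantics of a direct move followed by a sign
extension.

```c
/* Truncate a double to a 32-bit int, then sign extend it to long.
   The conversion result starts out in a floating point/vector register,
   so the sign extended value has to be moved to a GPR.  */
long double_to_int_sign_extended(double value)
{
    int truncated = (int) value;   /* 32-bit result of the conversion */
    return (long) truncated;       /* sign extension to the full register */
}
```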

I built Spec 2006 with the changes, and I noticed the following changes in the
code:

    * Round DF->SF and move to GPR: namd, wrf;
    * Eliminate 32-bit shift: gromacs, namd, povray, wrf;
    * Use of MFVSRWZ/EXTSW: gromacs, povray, calculix, h264ref.

I have bootstrapped these changes on the following machines with no
regressions in the test suite:

    * Big endian power7 (with both 32/64-bit targets);
    * Little endian power8;
    * Little endian power9 prototype.

Can I check these changes into GCC 8?  Can I back port these changes into the
GCC 7 branch?

[gcc]
2017-09-19  Michael Meissner  <meissner@linux.vnet.ibm.com>

	* config/rs6000/vsx.md (vsx_xscvspdp_scalar2): Move insn so it is
	next to vsx_xscvspdp.
	(vsx_xscvdpsp_scalar): Use 'ww' constraint instead of 'f' to allow
	SFmode values to be in Altivec registers.
	(vsx_xscvdpspn): Eliminate unneeded alternative.  Use correct
	constraint ('ws') for DFmode.
	(vsx_xscvspdpn): Likewise.
	(vsx_xscvdpspn_scalar): Likewise.
	(peephole for optimizing move SF to GPR): Adjust code to eliminate
	needing to do the shift right 32-bits operation after XSCVDPSPN.
	* config/rs6000/rs6000.md (extendsi<mode>2): Add alternative to do
	sign extend from vector register to GPR via a split, preventing
	the register allocator from doing the move via store/load.
	(extendsi<mode>2 splitter): Likewise.
	(movsi_from_sf): Adjust code to eliminate doing a 32-bit shift
	right or vector extract after doing XSCVDPSPN.  Use MFVSRWZ
	instead of MFVSRD to move the value to a GPR register.
	(movdi_from_sf_zero_ext): Likewise.
	(movsi_from_df): Add optimization to merge a convert from DFmode
	to SFmode and moving the SFmode to a GPR to use XSCVDPSP instead
	of round and XSCVDPSPN.
	(reload_gpr_from_vsxsf): Use MFVSRWZ instead of MFVSRD to move the
	value to a GPR register.  Rename p8_mfvsrd_4_disf insn to
	p8_mfvsrwz_disf.
	(p8_mfvsrd_4_disf): Likewise.
	(p8_mfvsrwz_disf): Likewise.

[gcc/testsuite]
2017-09-19  Michael Meissner  <meissner@linux.vnet.ibm.com>

	* gcc.target/powerpc/pr71977-1.c: Adjust scan-assembler codes to
	reflect that we don't generate a 32-bit shift right after
	XSCVDPSPN.
	* gcc.target/powerpc/direct-move-float1.c: Likewise.
	* gcc.target/powerpc/direct-move-float3.c: New test.

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meissner@linux.vnet.ibm.com, phone: +1 (978) 899-4797

Attachment: gcc-power9.patch278b
Description: Text document