This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [PATCH], Generate XXBR{H,W,D} for bswap{16,32,64} on PowerPC ISA 3.0 (power9)


On Wed, Nov 08, 2017 at 08:01:06AM -0600, Segher Boessenkool wrote:
> Hi Mike,
> 
> On Wed, Nov 08, 2017 at 08:14:31AM -0500, Michael Meissner wrote:
> > PowerPC ISA 3.0 does not have a byte-reverse instruction that operates on the
> > GPRs, but it does have vector byte swap half-word, word, double-word operations
> > in the VSX registers.  The enclosed patch enables generation of the byte
> > revseral instructions for register-register operations.  It still prefers to
> > generate the load with byte reverse (L{H,W,D}BRX) or store with byte reverse
> > (ST{H,W,D}BRX) instructions over the register sequence.
> > 
> > For 16-bit and 32-bit byte swaps, it typically does the tradational operation
> > in GPR registers, but it will generate XXBR{H,W} if the values are in vector
> > registers.
> 
> You can do the byteswaps with just VMX, too, with some rotate instructions
> (vrlh 8 for HI, vrlh 8 followed by vrlw 16 for SI, etc.)  For a future
> date, I suppose.
> 
> > For 64-bit swaps, it no longer generates the 9 instruction sequence in favor of
> > XXBRD.  I did some timing runs on a prototype power9 system, and it was
> > slightly faster to do direct move to the vecter unit, XXBRD, and direct move
> > back to a GPR than the traditional sequence.
> > 
> > I did bootstraps on little endian Power8 and Power9 systems (with the default
> > cpu set to power8 and power9 respectively).  There were no regressions.  Can I
> > check this patch into the trunk?
> > 
> > [gcc]
> > 2017-11-08  Michael Meissner  <meissner@linux.vnet.ibm.com>
> > 
> > 	* config/rs6000/rs6000.md (bswaphi2_reg): On ISA 3.0 systems,
> > 	enable generating XXBR{H,W} if the value is in a vector
> > 	register.
> 
> Join the last two lines?  And you only mean XXBRH here.
> 
> > 	(bswapsi2_reg): Likewise.
> 
> And XXBRW here.

No, I meant the comment for both insns, but I will separate them.

> > 	(bswapdi2_reg): On ISA 3.0 systems, use XXBRD to do bswap64
> > 	instead of doing the GPR sequence used on previoius machines.
> 
> Typo ("previous").

Ok.

> > 	(bswapdi2_xxbrd): Likewise.
> 
> Not "Likewise"; just "New define_insn."

Ok.

> Should this somehow be joined with p9_xxbrd_<mode>?  Or maybe you should
> generate that, instead.

No, since p9_xxbrd_<mode> doesn't include DImode.  We could add yet another
mode iterator to include DI + V2DI/V2DF modes, but that seems to be overkill.

> > 	(bswapdi2_reg splitters): Use int_reg_operand instead of
> > 	gpc_reg_operand to not match when XXBRD is generated.
> 
> Please see if you can merge the define_splits to their corresponding
> define_insns (making define_insn_and_split).  Shorter, no mismatch
> possible between the two anymore, and you get a name for the patterns
> too :-)

No, there are two different patterns that generate the register to register
bswap64.  One is the fall through path on bswapdi2 if the current target is not
64-bit or is 64-bit and doesn't support LDBRX.  The second is from bswapdi2_reg
doing the split.

I simplified it to only change the one insn that would normally match the
register to register case to skip if ISA 3.0.  I put a test on bswapdi2_reg as
well.

> > @@ -2507,6 +2513,8 @@ (define_expand "bswapdi2"
> >  	emit_insn (gen_bswapdi2_load (dest, src));
> >        else if (MEM_P (dest))
> >  	emit_insn (gen_bswapdi2_store (dest, src));
> > +      else if (TARGET_P9_VECTOR)
> > +	emit_insn (gen_bswapdi2_xxbrd (dest, src));
> >        else
> >  	emit_insn (gen_bswapdi2_reg (dest, src));
> >        DONE;
> 
> Pity that you cannot easily do similar for HI and SI.

Not really.  Bswap64 generates 9 separate insns, while bswap32 and bswap16 only
generate 3 insns.  So, having to add the move direct to/from the vector
registers would mean it would be slower than the normal case that is currently
generated.  But if the value happens to be in a VSX register, then we can do it
in one instruction.

Of course if we had a bswap insn in GPR registers, that would simplify things.
But we don't currently.

> > @@ -2544,7 +2559,8 @@ (define_insn "bswapdi2_reg"
> >     (clobber (match_scratch:DI 3 "=&r"))]
> >    "TARGET_POWERPC64 && TARGET_LDBRX"
> >    "#"
> > -  [(set_attr "length" "36")])
> > +  [(set_attr "length" "36")
> > +   (set_attr "type" "*")])
> 
> Explicitly setting "type" (or any other attr) to the default value is
> pretty useless; just leave it out.

Yep.  I missed that in simplifying the patch.  The first version, kept in the
GPR bswap patterns, and it needed to set the type vecperm on the VSX case.
After doing some testing, I yanked out the GPR support on ISA 3.0, but didn't
delete that line.

Here is the latest patch (tested and no regressions).  Can I check this into
the trunk?

After a burn-in period, I plan to backport a reduced version of the patch
(XXBRD only) to gcc 7, unless you think it shouldn't go into gcc 7.

[gcc]
2017-11-09  Michael Meissner  <meissner@linux.vnet.ibm.com>

	* config/rs6000/rs6000.md (bswaphi2_reg): On ISA 3.0 systems,
	enable generating XXBRH if the value is in a vector register.
	(bswapsi2_reg): On ISA 3.0 systems, enable generating XXBRW if the
	value is in a vector register.
	(bswapdi2_reg): On ISA 3.0 systems, always use XXBRD to do
	register to register bswap64's instead of doing the GPR sequence
	used on previous machines.
	(bswapdi2_xxbrd): New insn.
	(bswapdi2_reg): Disallow on ISA 3.0.
	(register to register bswap64 splitter): Do not split the insn on
	ISA 3.0 systems that use XXBRD.

[gcc/testsuite]
2017-11-09  Michael Meissner  <meissner@linux.vnet.ibm.com>

	* gcc.target/powerpc/p9-xxbr-3.c: New test.

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meissner@linux.vnet.ibm.com, phone: +1 (978) 899-4797

Attachment: gcc-power9.patch301b
Description: Text document


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]