[PATCH], Use VMRGEW on PowerPC power8/power9 to construct V4SFmode

Michael Meissner meissner@linux.vnet.ibm.com
Mon Sep 19 23:17:00 GMT 2016


On Mon, Sep 19, 2016 at 05:43:19PM -0500, Segher Boessenkool wrote:
> On Mon, Sep 19, 2016 at 06:02:08PM -0400, Michael Meissner wrote:
> > vector float combine (float a, float b, float c, float d)
> > {
> >   return (vector float) { a, b, c, d };
> > }
> 
> [ ... ]
> 
> > However ISA 2.07 (i.e. power8) added the VMRGEW instruction, which can do this
> > more simply:
> > 
> >         xxpermdi 34,1,2,0
> >         xxpermdi 32,3,4,0
> >         xvcvdpsp 34,34
> >         xvcvdpsp 32,32
> >         vmrgew 2,2,0
> 
> This results in {a,c,b,d} instead?

Yes.

> > --- gcc/config/rs6000/rs6000.c	(.../svn+ssh://meissner@gcc.gnu.org/svn/gcc/trunk/gcc/config/rs6000)	(revision 240142)
> > +++ gcc/config/rs6000/rs6000.c	(.../gcc/config/rs6000)	(working copy)
> > @@ -6821,11 +6821,26 @@ rs6000_expand_vector_init (rtx target, r
> >  	  rtx op2 = force_reg (SFmode, XVECEXP (vals, 0, 2));
> >  	  rtx op3 = force_reg (SFmode, XVECEXP (vals, 0, 3));
> >  
> > -	  emit_insn (gen_vsx_concat_v2sf (dbl_even, op0, op1));
> > -	  emit_insn (gen_vsx_concat_v2sf (dbl_odd, op2, op3));
> > -	  emit_insn (gen_vsx_xvcvdpsp (flt_even, dbl_even));
> > -	  emit_insn (gen_vsx_xvcvdpsp (flt_odd, dbl_odd));
> > -	  rs6000_expand_extract_even (target, flt_even, flt_odd);
> > +	  /* Use VMRGEW if we can instead of doing a permute.  */
> > +	  if (TARGET_P8_VECTOR)
> > +	    {
> > +	      emit_insn (gen_vsx_concat_v2sf (dbl_even, op0, op2));
> > +	      emit_insn (gen_vsx_concat_v2sf (dbl_odd, op1, op3));
> 
> But this looks correct, so just the example is pastoed?

Yes, I pasted the code for -mcpu=power7 and -mcpu=power8.  The original code
puts the elements in a different order, and then fixes it up with a permute.  I
changed the order so that it matches how VMRGEW works, and I tested it on
both big and little endian power8 systems.

The original puts the values as:

	+-------+-------+-------+-------+
	| a     | unused| b     | unused|
	+-------+-------+-------+-------+

	+-------+-------+-------+-------+
	| c     | unused| d     | unused|
	+-------+-------+-------+-------+

The VMRGEW instruction wants the two input registers laid out as:

	+-------+-------+-------+-------+
	| a     | unused| c     | unused|
	+-------+-------+-------+-------+

	+-------+-------+-------+-------+
	| b     | unused| d     | unused|
	+-------+-------+-------+-------+

> Okay for trunk if you can clear that up.

Did that answer the question?

> Thanks,
> 
> 
> Segher
> 

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meissner@linux.vnet.ibm.com, phone: +1 (978) 899-4797


