[PATCH], Use VMRGEW on PowerPC power8/power9 to construct V4SFmode
Michael Meissner
meissner@linux.vnet.ibm.com
Mon Sep 19 23:17:00 GMT 2016
On Mon, Sep 19, 2016 at 05:43:19PM -0500, Segher Boessenkool wrote:
> On Mon, Sep 19, 2016 at 06:02:08PM -0400, Michael Meissner wrote:
> > vector float combine (float a, float b, float c, float d)
> > {
> > return (vector float) { a, b, c, d };
> > }
>
> [ ... ]
>
> > However ISA 2.07 (i.e. power8) added the VMRGEW instruction, which can do this
> > more simply:
> >
> > xxpermdi 34,1,2,0
> > xxpermdi 32,3,4,0
> > xvcvdpsp 34,34
> > xvcvdpsp 32,32
> > vmrgew 2,2,0
>
> This results in {a,c,b,d} instead?
Yes.
> > --- gcc/config/rs6000/rs6000.c (.../svn+ssh://meissner@gcc.gnu.org/svn/gcc/trunk/gcc/config/rs6000) (revision 240142)
> > +++ gcc/config/rs6000/rs6000.c (.../gcc/config/rs6000) (working copy)
> > @@ -6821,11 +6821,26 @@ rs6000_expand_vector_init (rtx target, r
> > rtx op2 = force_reg (SFmode, XVECEXP (vals, 0, 2));
> > rtx op3 = force_reg (SFmode, XVECEXP (vals, 0, 3));
> >
> > - emit_insn (gen_vsx_concat_v2sf (dbl_even, op0, op1));
> > - emit_insn (gen_vsx_concat_v2sf (dbl_odd, op2, op3));
> > - emit_insn (gen_vsx_xvcvdpsp (flt_even, dbl_even));
> > - emit_insn (gen_vsx_xvcvdpsp (flt_odd, dbl_odd));
> > - rs6000_expand_extract_even (target, flt_even, flt_odd);
> > + /* Use VMRGEW if we can instead of doing a permute. */
> > + if (TARGET_P8_VECTOR)
> > + {
> > + emit_insn (gen_vsx_concat_v2sf (dbl_even, op0, op2));
> > + emit_insn (gen_vsx_concat_v2sf (dbl_odd, op1, op3));
>
> But this looks correct, so just the example is pastoed?
Yes, I pasted the code for -mcpu=power7 and -mcpu=power8. The original code
puts the elements in a different order, and then fixes it up with a permute. I
changed the order so that it would match how VMRGEW works, and I tested it on
both big- and little-endian power8 systems.
The original puts the values as:
+-------+-------+-------+-------+
|   a   | unused|   b   | unused|
+-------+-------+-------+-------+
+-------+-------+-------+-------+
|   c   | unused|   d   | unused|
+-------+-------+-------+-------+
The VMRGEW instruction wants the registers as:
+-------+-------+-------+-------+
|   a   | unused|   c   | unused|
+-------+-------+-------+-------+
+-------+-------+-------+-------+
|   b   | unused|   d   | unused|
+-------+-------+-------+-------+
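To make the ordering concrete, here is a small model in portable C. It is only a sketch, not code from the patch: the type v4sf_model and the helpers concat_and_convert, merge_even_words, combine_old_order and combine_new_order are hypothetical names made up for illustration. It assumes big-endian word numbering, that XVCVDPSP leaves its two results in words 0 and 2, and that VMRGEW interleaves words 0 and 2 of its two inputs.

```c
/* Hypothetical model of one 128-bit vector register holding four
   32-bit words, numbered 0..3 in big-endian element order.  */
typedef struct { float w[4]; } v4sf_model;

/* Model of the concat followed by XVCVDPSP: the two converted values
   land in words 0 and 2; words 1 and 3 are the "unused" slots
   (modelled here as 0.0f).  */
static v4sf_model concat_and_convert(float first, float second)
{
  v4sf_model r = { { first, 0.0f, second, 0.0f } };
  return r;
}

/* Model of VMRGEW: interleave the even words of the two inputs.  */
static v4sf_model merge_even_words(v4sf_model a, v4sf_model b)
{
  v4sf_model r = { { a.w[0], b.w[0], a.w[2], b.w[2] } };
  return r;
}

/* Pairing used by the original code: (a,b) and (c,d).  Feeding this
   straight into the merge gives {a,c,b,d}.  */
static v4sf_model combine_old_order(float a, float b, float c, float d)
{
  return merge_even_words(concat_and_convert(a, b),
                          concat_and_convert(c, d));
}

/* Pairing used on the VMRGEW path: (a,c) and (b,d).  The merge then
   gives {a,b,c,d} with no extra permute.  */
static v4sf_model combine_new_order(float a, float b, float c, float d)
{
  return merge_even_words(concat_and_convert(a, c),
                          concat_and_convert(b, d));
}
```

Calling combine_new_order (1.0f, 2.0f, 3.0f, 4.0f) in this model gives the words in source order, while combine_old_order with the same arguments gives the {a,c,b,d} order Segher pointed out in the pasted example.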
> Okay for trunk if you can clear that up.
Did that answer the question?
> Thanks,
>
>
> Segher
>
--
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meissner@linux.vnet.ibm.com, phone: +1 (978) 899-4797