[PATCH #2], PR 71201, Fix xxperm fusion on PowerPC ISA 3.0

Michael Meissner meissner@linux.vnet.ibm.com
Mon May 23 22:17:00 GMT 2016


On Thu, May 19, 2016 at 10:33:41AM -0500, Segher Boessenkool wrote:
> On Thu, May 19, 2016 at 10:53:41AM -0400, Michael Meissner wrote:
> > GCC 6.1 added support for the XXPERM instruction for the PowerPC ISA 3.0.  The
> > XXPERM instruction is essentially a 4 operand instruction, with only 3 operands
> > in the instruction (the target register overlaps with the first input
> > register).  The Power9 hardware has fusion support where if the instruction
> > that precedes the XXPERM is a XXLOR move instruction to set the first input
> > argument, it is fused with the XXPERM.  I added code to support this fusion.
> > 
> > Unfortunately, in running the testsuite on the power9 simulator, we discovered
> > that the test gcc.c-torture/execute/pr56866.c would fail because the fusion
> > alternatives confused the register allocator and/or the passes after the
> > register allocator.  This patch removes the explicit fusion support from
> > XXPERM.
> 
> Okay.  Please keep the PR open until that problem is fixed.  It also
> shouldn't be "target" category, if the problem is RA.
> 
> > In addition, ISA 3.0 added XXPERMR and VPERMR instructions for little endian
> > support where the permute vector reverses the bytes.  This patch adds support
> > for XXPERMR/VPERMR.
> 
> Please send that as a separate patch, it has nothing to do with the PR.
> 
> > +	x = gen_rtx_UNSPEC (mode,
> > +			    gen_rtvec (3, target, reg, 
> 
> Trailing space.
> 
> > +  if (TARGET_P9_VECTOR)
> > +    {
> > +      unspec = gen_rtx_UNSPEC (mode, gen_rtvec (3, op0, op1, sel), 
> 
> And another.
> 
> > +	 The VNAND is preferred for future fusion opportunities.  */
> > +      notx = gen_rtx_NOT (V16QImode, sel);
> > +      iorx = (TARGET_P8_VECTOR
> > +	      ? gen_rtx_IOR (V16QImode, notx, notx)
> > +	      : gen_rtx_AND (V16QImode, notx, notx));
> > +      emit_insn (gen_rtx_SET (norreg, iorx));
> > +      
> 
> Some more.
> 
> > +/* { dg-final { scan-assembler	   "vpermr\|xxpermr" } } */
> 
> Tab in the middle of the line.

This patch just fixes the xxperm fusion problem, and I will submit the
vpermr/xxpermr support in another patch.

Note, if you believe the register allocator and the post reload RTL passes need
to be fixed to allow the fusion of the move to the xxperm, that is fine.
However, take it on yourself.  As the person who wrote the code to add fusion
support for xxperm, I now think it was a bad idea, and I want to remove that
support.  It would probably be better done by modifying the scheduler to keep
the move and xxperm together, rather than including it in the insn.

I have bootstrapped the compiler on a little endian power9 system and there
were no regressions in the test suite.  Is it ok to check into the trunk and on
the 6.1 branch?

[gcc]
2016-05-23  Michael Meissner  <meissner@linux.vnet.ibm.com>

	PR target/71201
	* config/rs6000/altivec.md (altivec_vperm_<mode>_internal): Drop
	ISA 3.0 xxperm fusion alternative.
	(altivec_vperm_v8hiv16qi): Likewise.
	(altivec_vperm_<mode>_uns_internal): Likewise.
	(vperm_v8hiv4si): Likewise.
	(vperm_v16qiv8hi): Likewise.

[gcc/testsuite]
2016-05-23  Michael Meissner  <meissner@linux.vnet.ibm.com>
	    Kelvin Nilsen  <kelvin@gcc.gnu.org>

	* gcc.target/powerpc/p9-permute.c: Run test on big endian as well
	as little endian.

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meissner@linux.vnet.ibm.com, phone: +1 (978) 899-4797
-------------- next part --------------
Index: gcc/config/rs6000/altivec.md
===================================================================
--- gcc/config/rs6000/altivec.md	(.../svn+ssh://meissner@gcc.gnu.org/svn/gcc/trunk/gcc)	(revision 236600)
+++ gcc/config/rs6000/altivec.md	(.../gcc)	(working copy)
@@ -1952,32 +1952,30 @@ (define_expand "altivec_vperm_<mode>"
 
 ;; Slightly prefer vperm, since the target does not overlap the source
 (define_insn "*altivec_vperm_<mode>_internal"
-  [(set (match_operand:VM 0 "register_operand" "=v,?wo,?&wo")
-	(unspec:VM [(match_operand:VM 1 "register_operand" "v,0,wo")
-		    (match_operand:VM 2 "register_operand" "v,wo,wo")
-		    (match_operand:V16QI 3 "register_operand" "v,wo,wo")]
+  [(set (match_operand:VM 0 "register_operand" "=v,?wo")
+	(unspec:VM [(match_operand:VM 1 "register_operand" "v,0")
+		    (match_operand:VM 2 "register_operand" "v,wo")
+		    (match_operand:V16QI 3 "register_operand" "v,wo")]
 		   UNSPEC_VPERM))]
   "TARGET_ALTIVEC"
   "@
    vperm %0,%1,%2,%3
-   xxperm %x0,%x2,%x3
-   xxlor %x0,%x1,%x1\t\t# xxperm fusion\;xxperm %x0,%x2,%x3"
+   xxperm %x0,%x2,%x3"
   [(set_attr "type" "vecperm")
-   (set_attr "length" "4,4,8")])
+   (set_attr "length" "4")])
 
 (define_insn "altivec_vperm_v8hiv16qi"
-  [(set (match_operand:V16QI 0 "register_operand" "=v,?wo,?&wo")
-	(unspec:V16QI [(match_operand:V8HI 1 "register_operand" "v,0,wo")
-   	               (match_operand:V8HI 2 "register_operand" "v,wo,wo")
-		       (match_operand:V16QI 3 "register_operand" "v,wo,wo")]
+  [(set (match_operand:V16QI 0 "register_operand" "=v,?wo")
+	(unspec:V16QI [(match_operand:V8HI 1 "register_operand" "v,0")
+   	               (match_operand:V8HI 2 "register_operand" "v,wo")
+		       (match_operand:V16QI 3 "register_operand" "v,wo")]
 		   UNSPEC_VPERM))]
   "TARGET_ALTIVEC"
   "@
    vperm %0,%1,%2,%3
-   xxperm %x0,%x2,%x3
-   xxlor %x0,%x1,%x1\t\t# xxperm fusion\;xxperm %x0,%x2,%x3"
+   xxperm %x0,%x2,%x3"
   [(set_attr "type" "vecperm")
-   (set_attr "length" "4,4,8")])
+   (set_attr "length" "4")])
 
 (define_expand "altivec_vperm_<mode>_uns"
   [(set (match_operand:VM 0 "register_operand" "")
@@ -1995,18 +1993,17 @@ (define_expand "altivec_vperm_<mode>_uns
 })
 
 (define_insn "*altivec_vperm_<mode>_uns_internal"
-  [(set (match_operand:VM 0 "register_operand" "=v,?wo,?&wo")
-	(unspec:VM [(match_operand:VM 1 "register_operand" "v,0,wo")
-		    (match_operand:VM 2 "register_operand" "v,wo,wo")
-		    (match_operand:V16QI 3 "register_operand" "v,wo,wo")]
+  [(set (match_operand:VM 0 "register_operand" "=v,?wo")
+	(unspec:VM [(match_operand:VM 1 "register_operand" "v,0")
+		    (match_operand:VM 2 "register_operand" "v,wo")
+		    (match_operand:V16QI 3 "register_operand" "v,wo")]
 		   UNSPEC_VPERM_UNS))]
   "TARGET_ALTIVEC"
   "@
    vperm %0,%1,%2,%3
-   xxperm %x0,%x2,%x3
-   xxlor %x0,%x1,%x1\t\t# xxperm fusion\;xxperm %x0,%x2,%x3"
+   xxperm %x0,%x2,%x3"
   [(set_attr "type" "vecperm")
-   (set_attr "length" "4,4,8")])
+   (set_attr "length" "4")])
 
 (define_expand "vec_permv16qi"
   [(set (match_operand:V16QI 0 "register_operand" "")
@@ -2844,32 +2841,30 @@ (define_expand "vec_unpacks_lo_<VP_small
   "")
 
 (define_insn "vperm_v8hiv4si"
-  [(set (match_operand:V4SI 0 "register_operand" "=v,?wo,?&wo")
-        (unspec:V4SI [(match_operand:V8HI 1 "register_operand" "v,0,wo")
-		      (match_operand:V4SI 2 "register_operand" "v,wo,wo")
-		      (match_operand:V16QI 3 "register_operand" "v,wo,wo")]
+  [(set (match_operand:V4SI 0 "register_operand" "=v,?wo")
+        (unspec:V4SI [(match_operand:V8HI 1 "register_operand" "v,0")
+		      (match_operand:V4SI 2 "register_operand" "v,wo")
+		      (match_operand:V16QI 3 "register_operand" "v,wo")]
                   UNSPEC_VPERMSI))]
   "TARGET_ALTIVEC"
   "@
    vperm %0,%1,%2,%3
-   xxperm %x0,%x2,%x3
-   xxlor %x0,%x1,%x1\t\t# xxperm fusion\;xxperm %x0,%x2,%x3"
+   xxperm %x0,%x2,%x3"
   [(set_attr "type" "vecperm")
-   (set_attr "length" "4,4,8")])
+   (set_attr "length" "4")])
 
 (define_insn "vperm_v16qiv8hi"
-  [(set (match_operand:V8HI 0 "register_operand" "=v,?wo,?&wo")
-        (unspec:V8HI [(match_operand:V16QI 1 "register_operand" "v,0,wo")
-		      (match_operand:V8HI 2 "register_operand" "v,wo,wo")
-		      (match_operand:V16QI 3 "register_operand" "v,wo,wo")]
+  [(set (match_operand:V8HI 0 "register_operand" "=v,?wo")
+        (unspec:V8HI [(match_operand:V16QI 1 "register_operand" "v,0")
+		      (match_operand:V8HI 2 "register_operand" "v,wo")
+		      (match_operand:V16QI 3 "register_operand" "v,wo")]
                   UNSPEC_VPERMHI))]
   "TARGET_ALTIVEC"
   "@
    vperm %0,%1,%2,%3
-   xxperm %x0,%x2,%x3
-   xxlor %x0,%x1,%x1\t\t# xxperm fusion\;xxperm %x0,%x2,%x3"
+   xxperm %x0,%x2,%x3"
   [(set_attr "type" "vecperm")
-   (set_attr "length" "4,4,8")])
+   (set_attr "length" "4")])
 
 
 (define_expand "vec_unpacku_hi_v16qi"
Index: gcc/testsuite/gcc.target/powerpc/p9-permute.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/p9-permute.c	(.../svn+ssh://meissner@gcc.gnu.org/svn/gcc/trunk/gcc)	(revision 236600)
+++ gcc/testsuite/gcc.target/powerpc/p9-permute.c	(.../gcc)	(working copy)
@@ -1,4 +1,4 @@
-/* { dg-do compile { target { powerpc64le-*-* } } } */
+/* { dg-do compile { target { powerpc64*-*-* } } } */
 /* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { "-mcpu=power9" } } */
 /* { dg-options "-mcpu=power9 -O2" } */
 /* { dg-require-effective-target powerpc_p9vector_ok } */
@@ -17,5 +17,6 @@ permute (vector long long *p, vector lon
   return vec_perm (a, b, mask);
 }
 
+/* expect xxpermr on little-endian, xxperm on big-endian */
 /* { dg-final { scan-assembler	   "xxperm" } } */
 /* { dg-final { scan-assembler-not "vperm"  } } */


More information about the Gcc-patches mailing list