[PATCH], PR target/93932, Do not use input_operand for variable vector extract insns on PowerPC

Wed Feb 26 22:32:00 GMT 2020

As part of improving the vector extract built-ins for the future processor, and
also looking at optimizing all of the cases with vector extracts and sign,
zero, float extension for GCC 11 (PR target/93230), I noticed that the code
generated for some of the tests had regressed from the GCC 8 time frame.

What is happening is in some instances, we want to do a vector extract with a
variable element where the vector is in a register:

	 #include <altivec.h>

	long long
	foo (vector long long v, unsigned long n)
	{
	  return vec_extract (v, n);
	}

During the reload pass, the register allocator decides that it should spill the
insn to the stack, and then do the vector extract from memory (which is an
optimization to prevent loading the vector in case we only want one element).

I came to the conclusion that having one insn that did both the variable vector
extract on a register and from memory, allowed this bad code to be generated.

I split the 3 variable vector extract insn patterns that used input_operand to
handle either type, to have one insn that handled a register, and a second insn
that handles only memory.

Note, there is a 4th place that uses input_operand for variable vector extracts
that is not touched by this patch.  The insn is trying to combine a variable
vector extract with a zero extend operation.  That insn pattern never will
match due to the way the insn pattern is written.  I will be submitting patches
to fix this insn separately (PR target/93937), and that patch will split the
insn into two separate insns.

I have built this code on both a little endian system and a big endian PowerPC
system, and I ran make check for C, C++, Fortran, and LTO.  There were no
regressions after applying the patch for the test suite.  Can I check this into
the master branch?

Because it also fails on the GCC 9 branch, I will be submitting patches for
that.  Due to the changes between GCC 9 and the current master, the patch in
this mail can not be blindly applied, but I have a patch for GCC 9.  GCC 8
generates the correct code, so it does not have to be fixed.

[gcc]
2020-02-26  Michael Meissner  <meissner@linux.ibm.com>

	PR target/93932
	* config/rs6000/vsx.md (vsx_extract_<mode>_var, VSX_D iterator):
	Split the insn into two parts.  This insn only does variable
	extract from a register.
	(vsx_extract_<mode>_var_load, VSX_D iterator): New insn, do
	variable extract from memory.
	(vsx_extract_v4sf_var): Split the insn into two parts.  This insn
	only does variable extract from a register.
	(vsx_extract_v4sf_var_load): New insn, do variable extract from
	memory.
	(vsx_extract_<mode>_var, VSX_EXTRACT_I iterator): Split the insn
	into two parts.  This insn only does variable extract from a
	register.
	(vsx_extract_<mode>_var_load, VSX_EXTRACT_I iterator): New insn,
	do variable extract from memory.

[gcc/testsuite]
2020-02-26  Michael Meissner  <meissner@linux.ibm.com>

	PR target/93932
	* gcc.target/powerpc/fold-vec-extract-longlong.p8.c: Adjust
	instruction counts.

--- /tmp/TMHdwO_vsx.md	2020-02-26 13:25:27.250209645 -0500
+++ gcc/config/rs6000/vsx.md	2020-02-26 13:25:21.357125563 -0500
@@ -3245,14 +3245,14 @@ (define_insn "vsx_vslo_<mode>"
   "vslo %0,%1,%2"
   [(set_attr "type" "vecperm")])
 
-;; Variable V2DI/V2DF extract
+;; Variable V2DI/V2DF extract from a register
 (define_insn_and_split "vsx_extract_<mode>_var"
-  [(set (match_operand:<VS_scalar> 0 "gpc_reg_operand" "=v,wa,r")
-	(unspec:<VS_scalar> [(match_operand:VSX_D 1 "input_operand" "v,Q,Q")
-			     (match_operand:DI 2 "gpc_reg_operand" "r,r,r")]
+  [(set (match_operand:<VS_scalar> 0 "gpc_reg_operand" "=v")
+	(unspec:<VS_scalar> [(match_operand:VSX_D 1 "gpc_reg_operand" "v")
+			     (match_operand:DI 2 "gpc_reg_operand" "r")]
 			    UNSPEC_VSX_EXTRACT))
-   (clobber (match_scratch:DI 3 "=r,&b,&b"))
-   (clobber (match_scratch:V2DI 4 "=&v,X,X"))]
+   (clobber (match_scratch:DI 3 "=r"))
+   (clobber (match_scratch:V2DI 4 "=&v"))]
   "VECTOR_MEM_VSX_P (<MODE>mode) && TARGET_DIRECT_MOVE_64BIT"
   "#"
   "&& reload_completed"
@@ -3263,6 +3263,23 @@ (define_insn_and_split "vsx_extract_<mod
   DONE;
 })
 
+;; Variable V2DI/V2DF extract from memory
+(define_insn_and_split "*vsx_extract_<mode>_var_load"
+  [(set (match_operand:<VS_scalar> 0 "gpc_reg_operand" "=wa,r")
+	(unspec:<VS_scalar> [(match_operand:VSX_D 1 "memory_operand" "Q,Q")
+			     (match_operand:DI 2 "gpc_reg_operand" "r,r")]
+			    UNSPEC_VSX_EXTRACT))
+   (clobber (match_scratch:DI 3 "=&b,&b"))]
+  "VECTOR_MEM_VSX_P (<MODE>mode) && TARGET_DIRECT_MOVE_64BIT"
+  "#"
+  "&& reload_completed"
+  [(set (match_dup 0) (match_dup 4))]
+{
+  operands[4] = rs6000_adjust_vec_address (operands[0], operands[1], operands[2],
+					   operands[3], <VS_scalar>mode);
+}
+  [(set_attr "type" "fpload,load")])
+
 ;; Extract a SF element from V4SF
 (define_insn_and_split "vsx_extract_v4sf"
   [(set (match_operand:SF 0 "vsx_register_operand" "=wa")
@@ -3315,14 +3332,14 @@ (define_insn_and_split "*vsx_extract_v4s
    (set_attr "length" "8")
    (set_attr "isa" "*,p7v,p9v,*")])
 
-;; Variable V4SF extract
+;; Variable V4SF extract from a register
 (define_insn_and_split "vsx_extract_v4sf_var"
-  [(set (match_operand:SF 0 "gpc_reg_operand" "=wa,wa,?r")
-	(unspec:SF [(match_operand:V4SF 1 "input_operand" "v,Q,Q")
-		    (match_operand:DI 2 "gpc_reg_operand" "r,r,r")]
+  [(set (match_operand:SF 0 "gpc_reg_operand" "=wa")
+	(unspec:SF [(match_operand:V4SF 1 "gpc_reg_operand" "v")
+		    (match_operand:DI 2 "gpc_reg_operand" "r")]
 		   UNSPEC_VSX_EXTRACT))
-   (clobber (match_scratch:DI 3 "=r,&b,&b"))
-   (clobber (match_scratch:V2DI 4 "=&v,X,X"))]
+   (clobber (match_scratch:DI 3 "=r"))
+   (clobber (match_scratch:V2DI 4 "=&v"))]
   "VECTOR_MEM_VSX_P (V4SFmode) && TARGET_DIRECT_MOVE_64BIT"
   "#"
   "&& reload_completed"
@@ -3333,6 +3350,23 @@ (define_insn_and_split "vsx_extract_v4sf
   DONE;
 })
 
+;; Variable V4SF extract from memory
+(define_insn_and_split "*vsx_extract_v4sf_var_load"
+  [(set (match_operand:SF 0 "gpc_reg_operand" "=wa,?r")
+	(unspec:SF [(match_operand:V4SF 1 "memory_operand" "Q,Q")
+		    (match_operand:DI 2 "gpc_reg_operand" "r,r")]
+		   UNSPEC_VSX_EXTRACT))
+   (clobber (match_scratch:DI 3 "=&b,&b"))]
+  "VECTOR_MEM_VSX_P (V4SFmode) && TARGET_DIRECT_MOVE_64BIT"
+  "#"
+  "&& reload_completed"
+  [(set (match_dup 0) (match_dup 4))]
+{
+  operands[4] = rs6000_adjust_vec_address (operands[0], operands[1], operands[2],
+					   operands[3], SFmode);
+}
+  [(set_attr "type" "fpload,load")])
+
 ;; Expand the builtin form of xxpermdi to canonical rtl.
 (define_expand "vsx_xxpermdi_<mode>"
   [(match_operand:VSX_L 0 "vsx_register_operand")
@@ -3677,15 +3711,15 @@ (define_insn_and_split "*vsx_extract_<mo
   [(set_attr "type" "load")
    (set_attr "length" "8")])
 
-;; Variable V16QI/V8HI/V4SI extract
+;; Variable V16QI/V8HI/V4SI extract from a register
 (define_insn_and_split "vsx_extract_<mode>_var"
-  [(set (match_operand:<VS_scalar> 0 "gpc_reg_operand" "=r,r,r")
+  [(set (match_operand:<VS_scalar> 0 "gpc_reg_operand" "=r,r")
 	(unspec:<VS_scalar>
-	 [(match_operand:VSX_EXTRACT_I 1 "input_operand" "v,v,Q")
-	  (match_operand:DI 2 "gpc_reg_operand" "r,r,r")]
+	 [(match_operand:VSX_EXTRACT_I 1 "gpc_reg_operand" "v,v")
+	  (match_operand:DI 2 "gpc_reg_operand" "r,r")]
 	 UNSPEC_VSX_EXTRACT))
-   (clobber (match_scratch:DI 3 "=r,r,&b"))
-   (clobber (match_scratch:V2DI 4 "=X,&v,X"))]
+   (clobber (match_scratch:DI 3 "=r,r"))
+   (clobber (match_scratch:V2DI 4 "=X,&v"))]
   "VECTOR_MEM_VSX_P (<MODE>mode) && TARGET_DIRECT_MOVE_64BIT"
   "#"
   "&& reload_completed"
@@ -3695,7 +3729,25 @@ (define_insn_and_split "vsx_extract_<mod
 				operands[3], operands[4]);
   DONE;
 }
-  [(set_attr "isa" "p9v,*,*")])
+  [(set_attr "isa" "p9v,*")])
+
+;; Variable V16QI/V8HI/V4SI extract from memory
+(define_insn_and_split "*vsx_extract_<mode>_var_load"
+  [(set (match_operand:<VS_scalar> 0 "gpc_reg_operand" "=r")
+	(unspec:<VS_scalar>
+	 [(match_operand:VSX_EXTRACT_I 1 "memory_operand" "Q")
+	  (match_operand:DI 2 "gpc_reg_operand" "r")]
+	 UNSPEC_VSX_EXTRACT))
+   (clobber (match_scratch:DI 3 "=&b"))]
+  "VECTOR_MEM_VSX_P (<MODE>mode) && TARGET_DIRECT_MOVE_64BIT"
+  "#"
+  "&& reload_completed"
+  [(set (match_dup 0) (match_dup 4))]
+{
+  operands[4] = rs6000_adjust_vec_address (operands[0], operands[1], operands[2],
+					   operands[3], <VS_scalar>mode);
+}
+  [(set_attr "type" "load")])
 
 (define_insn_and_split "*vsx_extract_<mode>_<VS_scalar>mode_var"
   [(set (match_operand:<VS_scalar> 0 "gpc_reg_operand" "=r,r,r")
--- /tmp/oo6tIZ_fold-vec-extract-longlong.p8.c	2020-02-26 14:59:46.104993989 -0500
+++ gcc/testsuite/gcc.target/powerpc/fold-vec-extract-longlong.p8.c	2020-02-26 14:58:19.324752612 -0500
@@ -7,24 +7,23 @@
 
 // Targeting P8LE and P8BE, six tests total.
 // P8 (LE) constants: mfvsrd
-// P8 (LE) variables: addi,xxpermdi,mr,stxvd2x|stxvd4x,rldicl,sldi,ldx,blr
-// P8 (BE) constants: mfvsrd
-// P8 (BE) Variables: addi,xxpermdi,rldicl,mr,stxvd2x|stxvd4x,sldi,ldx,blr
+// P8 (LE) variables: xori, rldic, mtvsrd, xxpermdi, vslo, mfvsrd
+// P8 (BE) constants: xxpermdi, mfvsrd
+// P8 (BE) Variables:       rldic, mtvsrd, xxpermdi, vslo, mfvsrd
 
-/* { dg-final { scan-assembler-times {\maddi\M} 6 { target ilp32 } } } */
-/* { dg-final { scan-assembler-times {\maddi\M} 3 { target lp64 } } } */
+/* results. */
+/* { dg-final { scan-assembler-times {\mxori\M} 3 { target le } } } */
+/* { dg-final { scan-assembler-times {\mrldic\M|\mrlwinm\M} 3 } } */
+/* { dg-final { scan-assembler-times {\mstxvd2x\M|\mstxvw4x\M} 4 { target ilp32 } } } */
 /* { dg-final { scan-assembler-times {\madd\M} 3 { target ilp32 } } } */
 /* { dg-final { scan-assembler-times {\mlwz\M} 11 { target ilp32 } } } */
+/* { dg-final { scan-assembler-times {\maddi\M} 6 { target ilp32 } } } */
+/* { dg-final { scan-assembler-times {\mmfvsrd\M} 6 { target lp64 } } } */
+/* { dg-final { scan-assembler-times {\mmtvsrd\M} 3 { target lp64 } } } */
 /* { dg-final { scan-assembler-times {\mxxpermdi\M} 3 { target le } } } */
+/* { dg-final { scan-assembler-times {\mxxpermdi\M} 6 { target { be && lp64 } } } } */
 /* { dg-final { scan-assembler-times {\mxxpermdi\M} 2 { target { be && ilp32 } } } } */
-/* { dg-final { scan-assembler-times {\mxxpermdi\M} 3 { target { be && lp64 } } } } */
-/* { dg-final { scan-assembler-times {\mstxvd2x\M|\mstxvw4x\M} 3 { target lp64 } } } */
-/* { dg-final { scan-assembler-times {\mstxvd2x\M|\mstxvw4x\M} 4 { target ilp32 } } } */
-/* { dg-final { scan-assembler-times {\mrldicl\M|\mrldic\M|\mrlwinm\M} 3 } } */
-/* { dg-final { scan-assembler-times {\mmfvsrd\M} 3 { target { lp64 } } } } */
-/* { dg-final { scan-assembler-times {\mmfvsrd\M} 0 { target { be && ilp32 } } } } */
-/* { dg-final { scan-assembler-times {\mmtvsrd\M} 0 { target { lp64 } } } } */
-/* { dg-final { scan-assembler-times {\mmtvsrd\M} 0 { target { be && ilp32 } } } } */
+/* { dg-final { scan-assembler-times {\mvslo\M} 3 { target lp64 } } } */
 
 #include <altivec.h>

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meissner@linux.ibm.com, phone: +1 (978) 899-4797