[PATCH, rs6000] Prefer vspltisw/h over xxspltib+instruction when available

Tue Jun 21 20:15:00 GMT 2016

Hi,

I discovered recently that, with -mcpu=power9, an attempt to generate a vspltish instruction instead produced an xxspltib followed by a vupkhsb.  This is semantically correct, but it costs an extra instruction.  xxspltib_constant_p already special-cases small constants for const_vector, but not for vec_duplicate.  This patch duplicates that logic for vec_duplicate so that we generate the single instruction when possible.
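
For readers who want the new check in isolation, here is a minimal standalone sketch in C (not part of the patch).  It models only the vec_duplicate range tests, not the rest of xxspltib_constant_p.  The enum, both function names, and the reading of EASY_VECTOR_15 as the range -16..15 are assumptions made for illustration.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the GCC machine modes involved.  */
enum vec_mode { MODE_V16QI, MODE_V8HI, MODE_V4SI };

/* Assumed meaning of EASY_VECTOR_15: the value fits the 5-bit signed
   immediate of vspltisw/vspltish.  */
bool
fits_vspltis_immediate (long value)
{
  return value >= -16 && value <= 15;
}

/* Simplified model of the vec_duplicate path: return true if xxspltib
   (plus a sign extension for wider elements) should be used.  */
bool
prefer_xxspltib (long value, enum vec_mode mode)
{
  /* xxspltib can only splat a signed 8-bit value.  */
  if (value < -128 || value > 127)
    return false;

  /* Prefer a single vspltisw/vspltish, except for 0 and -1, which stay
     with xxspltib so that any VSX register can be used.  */
  if (!(value >= -1 && value <= 0)
      && fits_vspltis_immediate (value)
      && (mode == MODE_V4SI || mode == MODE_V8HI))
    return false;

  return true;
}
```

In this model, prefer_xxspltib (5, MODE_V8HI) is false, which corresponds to the vec_splat_s16 (5) case in the new test, while a value such as 100 still takes the xxspltib route because it does not fit the 5-bit immediate.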

When I did this, I ran into a problem with an existing test case.  We ended up matching the *vsx_splat_v4si_internal pattern instead of falling back to the altivec_vspltisw pattern, and the constraints of the former don't match a constant input.  To avoid this, I added a pattern ahead of *vsx_splat_v4si_internal that matches for VMX output registers and produces the vspltisw as desired.  This corrects the failing test and produces the expected code.

I've added a test case to demonstrate that the usual case now generates the expected code.

Bootstrapped and tested on powerpc64le-unknown-linux-gnu.  OK for trunk, and for 6.2 after suitable burn-in?

Thanks!

Bill


[gcc]

2016-06-21  Bill Schmidt  <wschmidt@linux.vnet.ibm.com>

	* config/rs6000/rs6000.c (xxspltib_constant_p): Prefer vspltisw/h
	for vec_duplicate when this is cheaper.
	* config/rs6000/vsx.md (*vsx_splat_v4si_altivec): New define_insn.

[gcc/testsuite]

2016-06-21  Bill Schmidt  <wschmidt@linux.vnet.ibm.com>

	* gcc.target/powerpc/splat-p9-1.c: New test.


Index: gcc/config/rs6000/rs6000.c
===================================================================
--- gcc/config/rs6000/rs6000.c	(revision 237619)
+++ gcc/config/rs6000/rs6000.c	(working copy)
@@ -6329,6 +6329,13 @@ xxspltib_constant_p (rtx op,
       value = INTVAL (element);
       if (!IN_RANGE (value, -128, 127))
 	return false;
+
+      /* See if we could generate vspltisw/vspltish directly instead of
+	 xxspltib + sign extend.  Special case 0/-1 to allow getting
+	 any VSX register instead of an Altivec register.  */
+      if (!IN_RANGE (value, -1, 0) && EASY_VECTOR_15 (value)
+	  && (mode == V4SImode || mode == V8HImode))
+	return false;
     }
 
   /* Handle (const_vector [...]).  */
Index: gcc/config/rs6000/vsx.md
===================================================================
--- gcc/config/rs6000/vsx.md	(revision 237619)
+++ gcc/config/rs6000/vsx.md	(working copy)
@@ -2400,6 +2400,17 @@
     operands[1] = force_reg (<VS_scalar>mode, operands[1]);
 })
 
+;; The pattern following this one hides altivec_vspltisw, which we
+;; prefer to match when possible, so duplicate that here for
+;; TARGET_P9_VECTOR.
+(define_insn "*vsx_splat_v4si_altivec"
+  [(set (match_operand:V4SI 0 "altivec_register_operand" "=v")
+	(vec_duplicate:V4SI
+	 (match_operand:QI 1 "s5bit_cint_operand" "i")))]
+  "TARGET_P9_VECTOR"
+  "vspltisw %0,%1"
+  [(set_attr "type" "vecperm")])
+
 (define_insn "*vsx_splat_v4si_internal"
   [(set (match_operand:V4SI 0 "vsx_register_operand" "=wa,wa")
 	(vec_duplicate:V4SI
Index: gcc/testsuite/gcc.target/powerpc/splat-p9-1.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/splat-p9-1.c	(revision 0)
+++ gcc/testsuite/gcc.target/powerpc/splat-p9-1.c	(working copy)
@@ -0,0 +1,16 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target powerpc_p9vector_ok } */
+/* { dg-options "-maltivec -mcpu=power9" } */
+/* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { "-mcpu=power9" } } */
+/* { dg-final { scan-assembler "vspltish" } } */
+/* { dg-final { scan-assembler-not "xxspltib" } } */
+
+/* Make sure we don't use an inefficient sequence for small integer splat.  */
+
+#include <altivec.h>
+
+vector short
+foo ()
+{
+  return vec_splat_s16 (5);
+}