[PATCH, rs6000] Prefer vspltisw/h over xxspltib+instruction when available
Bill Schmidt
wschmidt@linux.vnet.ibm.com
Tue Jun 21 20:15:00 GMT 2016
Hi,
I recently discovered that, with -mcpu=power9, an attempt to generate a vspltish instruction instead produced an xxspltib followed by a vupkhsb. This is semantically correct, but the second instruction is unnecessary when the constant fits in the vspltish immediate field. xxspltib_constant_p already has logic to special-case const_vector with small constants, but not vec_duplicate with small constants. This patch duplicates that logic for vec_duplicate so that we generate the single instruction when possible.
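For readers who do not have rs6000.c open, here is a minimal standalone sketch of the decision the new check makes for the vec_duplicate case. It is a simplified model, not the GCC code: the enum, the helper, and the function name are hypothetical stand-ins, and it assumes EASY_VECTOR_15 accepts exactly the 5-bit signed range -16..15.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the GCC machine modes used in the patch. */
enum splat_mode { MODE_V16QI, MODE_V8HI, MODE_V4SI };

/* Assumption: models EASY_VECTOR_15, i.e. the value fits the 5-bit
   signed immediate of vspltisw/vspltish.  */
static bool fits_5bit_signed (long value)
{
  return value >= -16 && value <= 15;
}

/* Returns true if xxspltib (plus a sign extend for wider elements)
   should be used for a splat of VALUE, false if another sequence
   such as a single vspltisw/vspltish is preferred.  */
bool xxspltib_preferred (long value, enum splat_mode mode)
{
  /* xxspltib only takes an 8-bit signed immediate.  */
  if (value < -128 || value > 127)
    return false;

  /* 0 and -1 stay with xxspltib so that any VSX register can be used.
     Other small constants in V4SI/V8HI need only one Altivec splat.  */
  if (value != 0 && value != -1
      && fits_5bit_signed (value)
      && (mode == MODE_V4SI || mode == MODE_V8HI))
    return false;

  return true;
}
```

In this model a V8HI splat of 5 is rejected for xxspltib, which corresponds to the vspltish case in the new test, while a V8HI splat of 100 still goes through xxspltib plus sign extension.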
When I did this, I ran into a problem with an existing test case. We end up matching the *vsx_splat_v4si_internal pattern instead of falling back to the altivec_vspltisw pattern, and the constraints of *vsx_splat_v4si_internal don't match for constant input. To avoid this, I added a pattern ahead of it that matches for VMX output registers and produces the vspltisw as desired. This fixes the failing test and produces the expected code.
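The ordering issue can be illustrated with a toy first-match recognizer. This is only a sketch of the idea that the earliest matching pattern wins; the struct fields and predicate functions are hypothetical, and the model assumes the internal pattern accepts any splat at the predicate level.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified description of a V4SI splat.  */
struct splat_operand {
  bool is_small_constant;    /* source fits a 5-bit signed immediate */
  bool dest_is_altivec_reg;  /* destination is an Altivec (VMX) register */
};

struct pattern {
  const char *name;
  bool (*matches) (const struct splat_operand *);
};

/* Models the new pattern: constant source, Altivec destination.  */
static bool match_altivec (const struct splat_operand *op)
{
  return op->is_small_constant && op->dest_is_altivec_reg;
}

/* Assumption: the internal pattern matches every splat, even though
   its constraints cannot handle a constant source.  */
static bool match_internal (const struct splat_operand *op)
{
  (void) op;
  return true;
}

/* Order matters: the more specific pattern must come first.  */
static const struct pattern patterns[] = {
  { "*vsx_splat_v4si_altivec", match_altivec },
  { "*vsx_splat_v4si_internal", match_internal },
};

/* Return the name of the first pattern that matches, or NULL.  */
const char *recognize (const struct splat_operand *op)
{
  for (size_t i = 0; i < sizeof patterns / sizeof patterns[0]; i++)
    if (patterns[i].matches (op))
      return patterns[i].name;
  return NULL;
}
```

With the two entries swapped, the constant splat would be claimed by the internal pattern, which is the situation the patch avoids by placing the new define_insn first.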
I've added a test case to demonstrate the code works properly now in the usual case.
Bootstrapped and tested on powerpc64le-unknown-linux-gnu. OK for trunk, and for 6.2 after suitable burn-in?
Thanks!
Bill
[gcc]
2016-06-21 Bill Schmidt <wschmidt@linux.vnet.ibm.com>
* config/rs6000/rs6000.c (xxspltib_constant_p): Prefer vspltisw/h
for vec_duplicate when this is cheaper.
* config/rs6000/vsx.md (*vsx_splat_v4si_altivec): New define_insn.
[gcc/testsuite]
2016-06-21 Bill Schmidt <wschmidt@linux.vnet.ibm.com>
* gcc.target/powerpc/splat-p9-1.c: New test.
Index: gcc/config/rs6000/rs6000.c
===================================================================
--- gcc/config/rs6000/rs6000.c (revision 237619)
+++ gcc/config/rs6000/rs6000.c (working copy)
@@ -6329,6 +6329,13 @@ xxspltib_constant_p (rtx op,
value = INTVAL (element);
if (!IN_RANGE (value, -128, 127))
return false;
+
+ /* See if we could generate vspltisw/vspltish directly instead of
+ xxspltib + sign extend. Special case 0/-1 to allow getting
+ any VSX register instead of an Altivec register. */
+ if (!IN_RANGE (value, -1, 0) && EASY_VECTOR_15 (value)
+ && (mode == V4SImode || mode == V8HImode))
+ return false;
}
/* Handle (const_vector [...]). */
Index: gcc/config/rs6000/vsx.md
===================================================================
--- gcc/config/rs6000/vsx.md (revision 237619)
+++ gcc/config/rs6000/vsx.md (working copy)
@@ -2400,6 +2400,17 @@
operands[1] = force_reg (<VS_scalar>mode, operands[1]);
})
+;; The pattern following this one hides altivec_vspltisw, which we
+;; prefer to match when possible, so duplicate that here for
+;; TARGET_P9_VECTOR.
+(define_insn "*vsx_splat_v4si_altivec"
+ [(set (match_operand:V4SI 0 "altivec_register_operand" "=v")
+ (vec_duplicate:V4SI
+ (match_operand:QI 1 "s5bit_cint_operand" "i")))]
+ "TARGET_P9_VECTOR"
+ "vspltisw %0,%1"
+ [(set_attr "type" "vecperm")])
+
(define_insn "*vsx_splat_v4si_internal"
[(set (match_operand:V4SI 0 "vsx_register_operand" "=wa,wa")
(vec_duplicate:V4SI
Index: gcc/testsuite/gcc.target/powerpc/splat-p9-1.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/splat-p9-1.c (revision 0)
+++ gcc/testsuite/gcc.target/powerpc/splat-p9-1.c (working copy)
@@ -0,0 +1,16 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target powerpc_p9vector_ok } */
+/* { dg-options "-maltivec -mcpu=power9" } */
+/* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { "-mcpu=power9" } } */
+/* { dg-final { scan-assembler "vspltish" } } */
+/* { dg-final { scan-assembler-not "xxspltib" } } */
+
+/* Make sure we don't use an inefficient sequence for small integer splat. */
+
+#include <altivec.h>
+
+vector short
+foo ()
+{
+ return vec_splat_s16 (5);
+}