
[PATCH, rs6000] Fix PR72863 (swap optimization misses swaps generated from intrinsics)


Hi,

Anton reports in https://gcc.gnu.org/bugzilla/show_bug.cgi?id=72863 that use of
the vec_vsx_ld and vec_vsx_st intrinsics leaves the endian swaps in the
generated code, even for very simple computations.  This turns out to be
because we don't generate the swaps at expand time as we do for other vector
moves; instead they aren't generated until split time, which is after the swap
optimization pass has run, so that pass never sees them and cannot remove them.
This patch fixes the problem in the obvious way, by emitting the swaps at
expand time for these intrinsics as well.
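
To make the affected pattern concrete, here is a minimal sketch (not part of
the patch; the new test case below is the canonical reproducer).  The comments
describe the assumed little-endian -mcpu=power8 -O3 code generation; exact
instruction selection and registers will vary.

#include <altivec.h>

void
xor_block (unsigned char *dst, unsigned char *src)
{
  /* Each vec_vsx_ld expands to lxvd2x; before this fix an xxpermdi
     doubleword swap survives after each load.  */
  vector unsigned char vs = vec_vsx_ld (0, src);
  vector unsigned char vd = vec_vsx_ld (0, dst);

  /* The xor (typically xxlxor) is lane-insensitive, so the entire web
     of swaps is removable by the swap optimization pass -- once the
     pass can actually see the swaps.  */
  vector unsigned char vr = vec_xor (vs, vd);

  /* vec_vsx_st expands to stxvd2x; before this fix another removable
     swap survives ahead of the store.  */
  vec_vsx_st (vr, 0, dst);
}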

Bootstrapped and tested on powerpc64le-unknown-linux-gnu with no regressions.
One new test case added.  Is this ok for trunk?

I would also like to backport this to the 6 and 5 branches after some burn-in
time.  I do not plan to rush this into 6.2; we'll have to wait for 6.3, as this
is only a performance issue, albeit an important one.

Thanks,
Bill


[gcc]

2016-08-11  Bill Schmidt  <wschmidt@linux.vnet.ibm.com>

	PR target/72863
	* config/rs6000/vsx.md (vsx_load_<mode>): For P8LE, emit swaps
	at expand time.
	(vsx_store_<mode>): Likewise.

[gcc/testsuite]

2016-08-11  Bill Schmidt  <wschmidt@linux.vnet.ibm.com>

	PR target/72863
	* gcc.target/powerpc/pr72863.c: New test.


Index: gcc/config/rs6000/vsx.md
===================================================================
--- gcc/config/rs6000/vsx.md	(revision 239310)
+++ gcc/config/rs6000/vsx.md	(working copy)
@@ -922,13 +922,27 @@
   [(set (match_operand:VSX_M 0 "vsx_register_operand" "")
 	(match_operand:VSX_M 1 "memory_operand" ""))]
   "VECTOR_MEM_VSX_P (<MODE>mode)"
-  "")
+{
+  /* Expand to swaps if needed, prior to swap optimization.  */
+  if (!BYTES_BIG_ENDIAN && !TARGET_P9_VECTOR)
+    {
+      rs6000_emit_le_vsx_move (operands[0], operands[1], <MODE>mode);
+      DONE;
+    }
+})
 
 (define_expand "vsx_store_<mode>"
   [(set (match_operand:VSX_M 0 "memory_operand" "")
 	(match_operand:VSX_M 1 "vsx_register_operand" ""))]
   "VECTOR_MEM_VSX_P (<MODE>mode)"
-  "")
+{
+  /* Expand to swaps if needed, prior to swap optimization.  */
+  if (!BYTES_BIG_ENDIAN && !TARGET_P9_VECTOR)
+    {
+      rs6000_emit_le_vsx_move (operands[0], operands[1], <MODE>mode);
+      DONE;
+    }
+})
 
 ;; Explicit load/store expanders for the builtin functions for lxvd2x, etc.,
 ;; when you really want their element-reversing behavior.
Index: gcc/testsuite/gcc.target/powerpc/pr72863.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/pr72863.c	(revision 0)
+++ gcc/testsuite/gcc.target/powerpc/pr72863.c	(working copy)
@@ -0,0 +1,27 @@
+/* { dg-do compile { target { powerpc64le-*-* } } } */
+/* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { "-mcpu=power8" } } */
+/* { dg-options "-mcpu=power8 -O3" } */
+/* { dg-final { scan-assembler "lxvd2x" } } */
+/* { dg-final { scan-assembler "stxvd2x" } } */
+/* { dg-final { scan-assembler-not "xxpermdi" } } */
+
+#include <altivec.h>
+
+extern unsigned char *src, *dst;
+
+void b(void)
+{
+  int i;
+
+  unsigned char *s8 = src;
+  unsigned char *d8 = dst;
+
+  for (i = 0; i < 100; i++) {
+    vector unsigned char vs = vec_vsx_ld(0, s8);
+    vector unsigned char vd = vec_vsx_ld(0, d8);
+    vector unsigned char vr = vec_xor(vs, vd);
+    vec_vsx_st(vr, 0, d8);
+    s8 += 16;
+    d8 += 16;
+  }
+}
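
For anyone reproducing the scan-assembler checks by hand, compiling the test
with the flags from its dg-options line and grepping the output should show
lxvd2x and stxvd2x but no xxpermdi once the patch is applied (the file name
here is just wherever you saved the test):

  gcc -S -O3 -mcpu=power8 pr72863.c
  grep -E 'lxvd2x|stxvd2x|xxpermdi' pr72863.s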

