This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.
- From: Bill Schmidt <wschmidt@linux.vnet.ibm.com>
- To: GCC Patches <gcc-patches@gcc.gnu.org>
- Cc: Segher Boessenkool <segher@kernel.crashing.org>, David Edelsohn <dje.gcc@gmail.com>, anton@samba.org
- Date: Thu, 11 Aug 2016 13:39:04 -0500
- Subject: [PATCH, rs6000] Fix PR72863 (swap optimization misses swaps generated from intrinsics)
Hi,
Anton reports in https://gcc.gnu.org/bugzilla/show_bug.cgi?id=72863 that use of
vec_vsx_ld and vec_vsx_st intrinsics leaves the endian swaps in the generated
code, even for very simple computations. This turns out to be because we don't
generate the swaps at expand time as we do with other vector moves; rather, they
don't get generated until split time. This patch fixes the problem in the
obvious way.
Bootstrapped and tested on powerpc64le-unknown-linux-gnu with no regressions.
One new test case added. Is this ok for trunk?
I would also like to backport this to the 6 and 5 branches after some burn-in time.
I do not plan to rush this into 6.2; we'll have to wait for 6.3 as this is only
a performance issue, albeit an important one.
Thanks,
Bill
[gcc]
2016-08-11 Bill Schmidt <wschmidt@linux.vnet.ibm.com>
PR target/72863
* config/rs6000/vsx.md (vsx_load_<mode>): For P8LE, emit swaps at expand time.
(vsx_store_<mode>): Likewise.
[gcc/testsuite]
2016-08-11 Bill Schmidt <wschmidt@linux.vnet.ibm.com>
PR target/72863
* gcc.target/powerpc/pr72863.c: New test.
Index: gcc/config/rs6000/vsx.md
===================================================================
--- gcc/config/rs6000/vsx.md (revision 239310)
+++ gcc/config/rs6000/vsx.md (working copy)
@@ -922,13 +922,27 @@
(define_expand "vsx_load_<mode>"
[(set (match_operand:VSX_M 0 "vsx_register_operand" "")
(match_operand:VSX_M 1 "memory_operand" ""))]
"VECTOR_MEM_VSX_P (<MODE>mode)"
- "")
+{
+ /* Expand to swaps if needed, prior to swap optimization. */
+ if (!BYTES_BIG_ENDIAN && !TARGET_P9_VECTOR)
+ {
+ rs6000_emit_le_vsx_move (operands[0], operands[1], <MODE>mode);
+ DONE;
+ }
+})
(define_expand "vsx_store_<mode>"
[(set (match_operand:VSX_M 0 "memory_operand" "")
(match_operand:VSX_M 1 "vsx_register_operand" ""))]
"VECTOR_MEM_VSX_P (<MODE>mode)"
- "")
+{
+ /* Expand to swaps if needed, prior to swap optimization. */
+ if (!BYTES_BIG_ENDIAN && !TARGET_P9_VECTOR)
+ {
+ rs6000_emit_le_vsx_move (operands[0], operands[1], <MODE>mode);
+ DONE;
+ }
+})
;; Explicit load/store expanders for the builtin functions for lxvd2x, etc.,
;; when you really want their element-reversing behavior.
Index: gcc/testsuite/gcc.target/powerpc/pr72863.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/pr72863.c (revision 0)
+++ gcc/testsuite/gcc.target/powerpc/pr72863.c (working copy)
@@ -0,0 +1,27 @@
+/* { dg-do compile { target { powerpc64le-*-* } } } */
+/* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { "-mcpu=power8" } } */
+/* { dg-options "-mcpu=power8 -O3" } */
+/* { dg-final { scan-assembler "lxvd2x" } } */
+/* { dg-final { scan-assembler "stxvd2x" } } */
+/* { dg-final { scan-assembler-not "xxpermdi" } } */
+
+#include <altivec.h>
+
+extern unsigned char *src, *dst;
+
+void b(void)
+{
+ int i;
+
+ unsigned char *s8 = src;
+ unsigned char *d8 = dst;
+
+ for (i = 0; i < 100; i++) {
+ vector unsigned char vs = vec_vsx_ld(0, s8);
+ vector unsigned char vd = vec_vsx_ld(0, d8);
+ vector unsigned char vr = vec_xor(vs, vd);
+ vec_vsx_st(vr, 0, d8);
+ s8 += 16;
+ d8 += 16;
+ }
+}