The following testcase is miscompiled on x86_64-linux, e.g. with -O3 -mavx2 or -O3 -mssse3 since r238039:

#define N 16

struct C { int r, i; };
struct C a[N], b[N], c[N], d[N], e[N];

__attribute__((noipa)) static void
foo (struct C *__restrict x, struct C *__restrict y,
     struct C *__restrict z, int w)
{
  int i;
  for (int i = 0; i < w; i++)
    {
      z[i].r = x[i].r * y[-1 - i].r - x[i].i * y[-1 - i].i;
      z[i].i = x[i].i * y[-1 - i].r + x[i].r * y[-1 - i].i;
    }
}

__attribute__((noipa)) static void
bar (struct C *__restrict x, struct C *__restrict y,
     struct C *__restrict z, int w)
{
  int i;
  for (int i = 0; i < w; i++)
    {
      z[i].r = x[i].r * y[i].r - x[i].i * y[i].i;
      z[i].i = x[i].i * y[i].r + x[i].r * y[i].i;
    }
}

int
main ()
{
  int i;
  for (i = 0; i < N; ++i)
    {
      a[i].r = N - i; a[i].i = i - N;
      b[i].r = i - N; b[i].i = i + N;
      c[i].r = -1 - i; c[i].i = 2 * N - 1 - i;
    }
  foo (a, b + N, d, N);
  bar (a, c, e, N);
  for (i = 0; i < N; ++i)
    if (d[i].r != e[i].r || d[i].i != e[i].i)
      __builtin_abort ();
  return 0;
}

In bar, which looks correct, the vectorized loop body is:

  vect__6.87_69 = MEM[base: y_21(D), index: ivtmp.133_9, offset: 0B];
  vect__6.88_70 = VEC_PERM_EXPR <vect__6.87_69, vect__6.87_69, { 0, 0, 2, 2, 4, 4, 6, 6 }>;
  vect__4.84_66 = MEM[base: x_20(D), index: ivtmp.133_9, offset: 0B];
  vect__4.93_75 = VEC_PERM_EXPR <vect__4.84_66, vect__4.84_66, { 1, 0, 3, 2, 5, 4, 7, 6 }>;
  vect__6.97_79 = VEC_PERM_EXPR <vect__6.87_69, vect__6.87_69, { 1, 1, 3, 3, 5, 5, 7, 7 }>;
  vect__7.89_71 = vect__4.84_66 * vect__6.88_70;
  vect__10.98_80 = vect__4.93_75 * vect__6.97_79;
  vect__12.99_81 = vect__7.89_71 - vect__10.98_80;
  vect__12.100_82 = vect__7.89_71 + vect__10.98_80;
  _83 = VEC_PERM_EXPR <vect__12.99_81, vect__12.100_82, { 0, 9, 2, 11, 4, 13, 6, 15 }>;
  MEM[base: z_22(D), index: ivtmp.133_9, offset: 0B] = _83;

foo has the y pointer iterating with a -8 step rather than 8, so I'd expect the x-related permutations to stay the same and the y load to start at y_21(D) - 32B and use the { 7, 7, 5, 5, 3, 3, 1, 1 } and { 6, 6, 4, 4, 2, 2, 0, 0 } permutations, but we actually emit:

  _34 = (void *) ivtmp.64_69;
  vect__9.16_80 = MEM[base: _34, offset: 0B];
  vect__9.17_81 = VEC_PERM_EXPR <vect__9.16_80, vect__9.16_80, { 0, 0, 2, 2, 4, 4, 6, 6 }>;
  vect__4.13_76 = MEM[base: x_23(D), index: ivtmp.62_71, offset: 0B];
  vect__4.22_86 = VEC_PERM_EXPR <vect__4.13_76, vect__4.13_76, { 1, 0, 3, 2, 5, 4, 7, 6 }>;
  vect__12.25_90 = MEM[base: _34, offset: 4B];
  vect__12.26_91 = VEC_PERM_EXPR <vect__12.25_90, vect__12.25_90, { 0, 0, 2, 2, 4, 4, 6, 6 }>;
  vect__10.18_82 = vect__4.13_76 * vect__9.17_81;
  vect__13.27_92 = vect__4.22_86 * vect__12.26_91;
  vect__15.28_93 = vect__10.18_82 - vect__13.27_92;
  vect__15.29_94 = vect__10.18_82 + vect__13.27_92;
  _95 = VEC_PERM_EXPR <vect__15.28_93, vect__15.29_94, { 0, 9, 2, 11, 4, 13, 6, 15 }>;
  MEM[base: z_25(D), index: ivtmp.62_71, offset: 0B] = _95;

where ivtmp.64_69 starts at y_21(D) - 8 (!) and has step -32.
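To make the addressing mismatch concrete, here is a minimal sketch (my own, not part of the testcase) that prints, for each 32-byte vector iteration of foo, where the y load should start versus where the emitted code starts.  The -32B and -8B constants are read off the dumps above, and the loop bound assumes N = 16 with 8 ints per vector:

/* For vector iteration k, foo's scalar code reads y[-1 - i] for
   i in [4*k, 4*k + 3], i.e. the 32-byte window ending at (char *)y - 32*k.
   The expected vectorized load therefore starts at (char *)y - 32 - 32*k;
   the emitted code instead starts at (char *)y - 8 - 32*k.  */
#include <stdio.h>

int
main (void)
{
  for (long k = 0; k < 4; k++)            /* 4 vector iterations for N = 16 */
    {
      long expected = -32 - 32 * k;       /* y_21(D) - 32B, step -32 */
      long emitted = -8 - 32 * k;         /* ivtmp.64_69 = y_21(D) - 8, step -32 */
      printf ("k=%ld expected load at y%+ldB, emitted load at y%+ldB\n",
              k, expected, emitted);
    }
  return 0;
}

So each emitted y load is 24 bytes above the window the scalar code actually reads.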
Mine
Author: rsandifo
Date: Mon Nov 11 19:43:52 2019
New Revision: 278064

URL: https://gcc.gnu.org/viewcvs?rev=278064&root=gcc&view=rev
Log:
Fix SLP downward group access classification (PR92420)

This PR was caused by the SLP handling in get_group_load_store_type
returning VMAT_CONTIGUOUS rather than VMAT_CONTIGUOUS_REVERSE for
downward groups.

A more elaborate fix would be to try to combine the reverse permutation
into SLP_TREE_LOAD_PERMUTATION for loads, but that's really a follow-on
optimisation and not backport material.  It might also not necessarily
be a win, if the target supports (say) reversing and odd/even swaps as
independent permutes but doesn't recognise the combined form.

2019-11-11  Richard Sandiford  <richard.sandiford@arm.com>

gcc/
        PR tree-optimization/92420
        * tree-vect-stmts.c (get_negative_load_store_type): Move further
        up file.
        (get_group_load_store_type): Use it for reversed SLP accesses.

gcc/testsuite/
        * gcc.dg/vect/pr92420.c: New test.

Added:
    trunk/gcc/testsuite/gcc.dg/vect/pr92420.c
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/testsuite/ChangeLog
    trunk/gcc/tree-vect-stmts.c
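As a side note on the follow-on idea mentioned in the log, here is a small sketch (my own, not from the patch) checking that composing the element reversal of a VMAT_CONTIGUOUS_REVERSE load with the existing even/odd-lane SLP permutations yields exactly the combined { 7, 7, 5, 5, 3, 3, 1, 1 } and { 6, 6, 4, 4, 2, 2, 0, 0 } permutations the original report expected; the composition order (reverse first, then the SLP permute) is my reading of the dumps:

#include <assert.h>

#define VF 8  /* 8 ints per vector, as in the -mavx2 dumps */

int
main (void)
{
  int reverse[VF];
  int even_dup[VF] = { 0, 0, 2, 2, 4, 4, 6, 6 };
  int odd_dup[VF] = { 1, 1, 3, 3, 5, 5, 7, 7 };
  int expected_even[VF] = { 7, 7, 5, 5, 3, 3, 1, 1 };
  int expected_odd[VF] = { 6, 6, 4, 4, 2, 2, 0, 0 };

  for (int j = 0; j < VF; j++)
    reverse[j] = VF - 1 - j;

  /* Element j of (SLP permute applied to the reversed vector) comes from
     lane reverse[perm[j]] of the vector in memory order, so the combined
     permute is the composition below.  */
  for (int j = 0; j < VF; j++)
    {
      assert (reverse[even_dup[j]] == expected_even[j]);
      assert (reverse[odd_dup[j]] == expected_odd[j]);
    }
  return 0;
}

Whether emitting the combined form is a win depends on the target, as the log says: a target may handle reversal and odd/even duplication as cheap independent permutes but not recognise the combined pattern.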
The GCC 7 branch is being closed, re-targeting to GCC 8.4.
The releases/gcc-9 branch has been updated by Richard Sandiford <rsandifo@gcc.gnu.org>:

https://gcc.gnu.org/g:2d8ea3a0a6095a56b7c59c50b1068d602cde934a

commit r9-8248-g2d8ea3a0a6095a56b7c59c50b1068d602cde934a
Author: Richard Sandiford <richard.sandiford@arm.com>
Date:   Mon Nov 11 19:43:52 2019 +0000

    Fix SLP downward group access classification [PR92420]

    This PR was caused by the SLP handling in get_group_load_store_type
    returning VMAT_CONTIGUOUS rather than VMAT_CONTIGUOUS_REVERSE for
    downward groups.

    A more elaborate fix would be to try to combine the reverse permutation
    into SLP_TREE_LOAD_PERMUTATION for loads, but that's really a follow-on
    optimisation and not backport material.  It might also not necessarily
    be a win, if the target supports (say) reversing and odd/even swaps as
    independent permutes but doesn't recognise the combined form.

    2020-02-18  Richard Sandiford  <richard.sandiford@arm.com>

    gcc/
            Backport from mainline
            2019-11-11  Richard Sandiford  <richard.sandiford@arm.com>

            PR tree-optimization/92420
            * tree-vect-stmts.c (get_negative_load_store_type): Move further
            up file.
            (get_group_load_store_type): Use it for reversed SLP accesses.

    gcc/testsuite/
            PR tree-optimization/92420
            * gcc.dg/vect/pr92420.c: New test.
The releases/gcc-8 branch has been updated by Richard Sandiford <rsandifo@gcc.gnu.org>:

https://gcc.gnu.org/g:785eda9390473e42f0e0b7199c42032a0432de68

commit r8-10057-g785eda9390473e42f0e0b7199c42032a0432de68
Author: Richard Sandiford <richard.sandiford@arm.com>
Date:   Mon Nov 11 19:43:52 2019 +0000

    Fix SLP downward group access classification [PR92420]

    This PR was caused by the SLP handling in get_group_load_store_type
    returning VMAT_CONTIGUOUS rather than VMAT_CONTIGUOUS_REVERSE for
    downward groups.

    A more elaborate fix would be to try to combine the reverse permutation
    into SLP_TREE_LOAD_PERMUTATION for loads, but that's really a follow-on
    optimisation and not backport material.  It might also not necessarily
    be a win, if the target supports (say) reversing and odd/even swaps as
    independent permutes but doesn't recognise the combined form.

    2020-02-25  Richard Sandiford  <richard.sandiford@arm.com>

    gcc/
            Backport from mainline
            2019-11-11  Richard Sandiford  <richard.sandiford@arm.com>

            PR tree-optimization/92420
            * tree-vect-stmts.c (get_negative_load_store_type): Move further
            up file.
            (get_group_load_store_type): Use it for reversed SLP accesses.

    gcc/testsuite/
            PR tree-optimization/92420
            * gcc.dg/vect/pr92420.c: New test.
Fixed on trunk and branches.