Created attachment 59138 [details] A dump file of vect pass with details enabled Hello this report is about a loop which GCC doesn't vectorize. This is the loop in question: void foo(int *in, int *out1, int *out2, int n) { for (int i = 0; i < n; i+=2) { out1[i/2] = in[i]; out2[i/2] = in[i+1]; } } GCC isn't able to vectorize this example: ARM: https://godbolt.org/z/Eq7vs7ssx RISC-V: https://godbolt.org/z/Y1sr4cr8K Clang vectorizes this loop for ARM as well as RISC-V: ARM: https://godbolt.org/z/YvaMqTz1h RISC-V: https://godbolt.org/z/x1MxGc7Gx In the vect optimization pass there are two misses regarding array indexing: analyze_innermost: t.c:6:19: missed: failed: evolution of base is not affine. analyze_innermost: t.c:7:19: missed: failed: evolution of base is not affine. There are also these misses: t.c:6:23: missed: possible alias involving gather/scatter between *_3 and *_7 t.c:4:23: missed: bad data dependence. t.c:6:23: missed: possible alias involving gather/scatter between *_3 and *_7 t.c:4:23: missed: bad data dependence. t.c:6:23: missed: possible alias involving gather/scatter between *_3 and *_7 t.c:4:23: missed: bad data dependence. t.c:6:23: missed: possible alias involving gather/scatter between *_3 and *_7 t.c:4:23: missed: bad data dependence. t.c:6:23: missed: possible alias involving gather/scatter between *_3 and *_7 t.c:4:23: missed: bad data dependence. t.c:6:23: missed: possible alias involving gather/scatter between *_3 and *_7 t.c:4:23: missed: bad data dependence. t.c:6:23: missed: possible alias involving gather/scatter between *_3 and *_7 t.c:4:23: missed: bad data dependence. t.c:6:23: missed: possible alias involving gather/scatter between *_3 and *_7 t.c:4:23: missed: bad data dependence. t.c:6:23: missed: possible alias involving gather/scatter between *_3 and *_7 t.c:4:23: missed: bad data dependence. t.c:4:23: missed: couldn't vectorize loop void foo (int * in, int * out1, int * out2, int n) { int i; long unsigned int _1; long unsigned int _2; int * _3; int _4; long unsigned int _5; long unsigned int _6; int * _7; int _8; sizetype _9; sizetype _10; int * _11; int * _12; int _13; <bb 2> [local count: 118111600]: if (n_17(D) > 0) goto <bb 5>; [89.00%] else goto <bb 4>; [11.00%] <bb 5> [local count: 105119324]: <bb 3> [local count: 955630224]: # i_26 = PHI <i_23(6), 0(5)> _1 = (long unsigned int) i_26; _2 = _1 * 4; _3 = in_18(D) + _2; _4 = i_26 >> 1; _5 = (long unsigned int) _4; _6 = _5 * 4; _7 = out1_19(D) + _6; _8 = *_3; *_7 = _8; _9 = _1 + 1; _10 = _9 * 4; _11 = in_18(D) + _10; _12 = out2_21(D) + _6; _13 = *_11; *_12 = _13; i_23 = i_26 + 2; if (n_17(D) > i_23) goto <bb 6>; [89.00%] else goto <bb 7>; [11.00%] <bb 6> [local count: 850510900]: goto <bb 3>; [100.00%] <bb 7> [local count: 105119324]: <bb 4> [local count: 118111600]: return; }
I thought I had saw this one before ...
This isn't about gather/scatter but SCEV analysis failing to handle i/2 (i >> 1 in the IL) as IV. That's only affine because i is even so it's an exact division. We might want to "fold" i / 2 to i /[ex] 2 somewhere in VRP to help, but possibly somehow SCEV could "optimistically" analyze the shift and resolve it after it finds the initial value and increment. Bottom line - rewrite your code to for (int i = 0; i < n/2; i++) { out1[i] = in[2*i]; out2[i] = in[2*i+1]; }
Created attachment 59144 [details] hack This works for the special case of >> 1; something more general and nicer written could work. CHREC_LEFT might not need to be a multiple I think and thus not constant. EXACT_DIV_EXPR could be allowed directly in a CHREC_RIGHT, even if we can't constant-fold it, so I think working towards that would be better and instead handle EXACT_DIV_EXPR in SCEV.