Created attachment 29672 [details]
Reproducer

GCC can't vectorize such a loop:

void foo (double *a)
{
  int i;
  for (i = 0; i < 100; i+=2)
    {
      a[i+1] = 2;
      a[i]   = 3;
      a[i+1] = 2;
      a[i]   = 3;
    }
}

The vectorizer reports the following:
note: === vect_analyze_data_ref_accesses ===
note: Detected interleaving of size 2
note: Two store stmts share the same dr.
note: not vectorized: complicated access pattern.

Obviously, in this case vectorization is possible because the first stores have no effect.

This test is a reproducer of a similar problem encountered in Spec2006/470.lbm: there, if-conversion could produce stores to the same location, which stops the vectorizer.

The test is attached. Command line to reproduce:
gcc group_access.c -O3 -c -ftree-vectorizer-verbose=15
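The claim that the first pair of stores has no effect can be checked by hand. The sketch below is not part of the attached reproducer: foo_dse and dse_preserves_result are hypothetical names, and foo_dse is simply foo with the overwritten stores deleted manually, which is the form the vectorizer would need to see.

```c
#include <assert.h>
#include <string.h>

enum { N = 100 };   /* the loop touches a[0] .. a[99] */

/* Original reproducer: every location is stored twice per iteration.  */
void foo (double *a)
{
  int i;
  for (i = 0; i < 100; i += 2)
    {
      a[i+1] = 2;
      a[i]   = 3;
      a[i+1] = 2;
      a[i]   = 3;
    }
}

/* Hand-applied dead store elimination: the first pair of stores is
   overwritten before anything reads it, so it is dropped.  */
void foo_dse (double *a)
{
  int i;
  for (i = 0; i < 100; i += 2)
    {
      a[i+1] = 2;
      a[i]   = 3;
    }
}

/* Returns 1 if both versions leave identical contents in the array.  */
int dse_preserves_result (void)
{
  double x[N], y[N];
  memset (x, 0, sizeof x);
  memset (y, 0, sizeof y);
  foo (x);
  foo_dse (y);
  /* Sanity check: 3 at even indices, 2 at odd indices.  */
  assert (x[0] == 3.0 && x[1] == 2.0);
  return memcmp (x, y, sizeof x) == 0;
}
```

Calling dse_preserves_result () from a small driver should return 1, which only confirms the source-level equivalence; it says nothing about which GCC pass ought to remove the stores.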
> This test is a reproducer of similar problem encountered on Spec2006/470.lbm -
> there if-conversion could produce stores to the same location which will stop
> vectorizer.

Can you reproduce a testcase for that instead?  It doesn't make sense
to handle code that should be optimized earlier (by DSE).  Is it from
code like

  if (cond)
    a[i] = 3;
  else
    a[i] = 3;

?
> Can you reproduce a testcase for that instead?  It doesn't make sense
> to handle code that should be optimized earlier (by DSE).  Is it from
> code like
>
>   if (cond)
>     a[i] = 3;
>   else
>     a[i] = 3;
>
> ?

Yes, originally it is from the code similar to your example, but this example has one more problem which hides the one described in this tracker. I've submitted one more bug with the test almost like yours (56625).
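To make the suspected origin concrete: as far as I recall, the GCC documentation for -ftree-loop-if-convert-stores describes if-conversion as rewriting a conditional store "if (cond) A[i] = expr;" into the unconditional "A[i] = cond ? expr : A[i];". Assuming both arms of the pattern above are rewritten that way, each iteration ends up with two stores to a[i]. The following is a source-level sketch of that outcome, not a dump from GCC; branchy, if_converted and same_after_if_conversion are hypothetical names, and the cond array is an assumption added so that the condition is well defined.

```c
#include <string.h>

enum { LEN = 100 };

/* The pattern from the comment above, with an explicit condition array.  */
void branchy (long *a, const int *cond)
{
  for (int i = 0; i < LEN; i++)
    {
      if (cond[i])
        a[i] = 3;
      else
        a[i] = 3;
    }
}

/* Assumed shape after if-conversion: each conditional store becomes an
   unconditional store of a select, so a[i] is stored twice.  */
void if_converted (long *a, const int *cond)
{
  for (int i = 0; i < LEN; i++)
    {
      a[i] = cond[i] ? 3 : a[i];
      a[i] = !cond[i] ? 3 : a[i];
    }
}

/* Returns 1 if both versions leave identical contents in the array.  */
int same_after_if_conversion (void)
{
  long x[LEN], y[LEN];
  int cond[LEN];
  for (int i = 0; i < LEN; i++)
    {
      x[i] = y[i] = i;
      cond[i] = (i % 3 == 0);
    }
  branchy (x, cond);
  if_converted (y, cond);
  return memcmp (x, y, sizeof x) == 0;
}
```

The two forms compute the same values, but the second one contains two store statements to the same data reference, which matches the situation described in the original report.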
Created attachment 29674 [details] Reproducer 2
Sorry, it looks like the reproducer with if could be made, and here it is:

void foo (long *a)
{
  int i;
  for (i = 0; i < 100; i+=2)
    {
      if (a[i] == 0)
        {
          a[i+1] = 2;
          a[i]   = 3;
        }
      else
        {
          a[i+1] = 3;
          a[i]   = 4;
        }
    }
}

In this example we have:
group_access2.c:4: note: === vect_analyze_data_ref_accesses ===
group_access2.c:4: note: READ_WRITE dependence in interleaving.
group_access2.c:4: note: not vectorized: complicated access pattern.
group_access2.c:4: note: bad data access.
group_access2.c:1: note: vectorized 0 loops in function.

The diagnostic is a bit different, but the root cause is the same, I guess.

The test is attached (reproducer 2).
Thanks and confirmed.
(In reply to Michael Zolotukhin from comment #4)
> Sorry, it looks like the reproducer with if could be made, and here it is:
> void foo (long *a)
> {
>   int i;
>   for (i = 0; i < 100; i+=2)
>     {
>       if (a[i] == 0)
>         {
>           a[i+1] = 2;
>           a[i]   = 3;
>         }
>       else
>         {
>           a[i+1] = 3;
>           a[i]   = 4;
>         }
>     }
> }
> In this example we have:
> group_access2.c:4: note: === vect_analyze_data_ref_accesses ===
> group_access2.c:4: note: READ_WRITE dependence in interleaving.
> group_access2.c:4: note: not vectorized: complicated access pattern.
> group_access2.c:4: note: bad data access.
> group_access2.c:1: note: vectorized 0 loops in function.
>
> The diagnostic is a bit different, but rootcause is the same I guess.
>
> The test is attached (reproducer 2).

We now vectorize this loop (not with plain SSE2 but with SSE4.2 for example):

.L2:
        movq    (%rdi), %xmm0
        movdqa  %xmm2, %xmm4
        addq    $16, %rdi
        punpcklqdq      %xmm0, %xmm0
        pcmpeqq %xmm1, %xmm0
        pblendvb        %xmm0, %xmm3, %xmm4
        movups  %xmm4, -16(%rdi)
        cmpq    %rdi, %rax
        jne     .L2

probably because we now sink the common stores from the if arm.  Modifying
the testcase to the following reproduces the original issue again:

void foo (long *a)
{
  int i;
  for (i = 0; i < 100; i+=2)
    {
      if (a[i] == 0)
        {
          a[i+1] = 2;
          a[i]   = 3;
        }
      else
        {
          a[i]   = 4;
          a[i+1] = 3;
        }
    }
}
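The effect of sinking the common stores can be imitated at the source level. In the sketch below (hand-written, not GCC output; foo_swapped, foo_sunk and sinking_preserves_result are hypothetical names), foo_swapped is the modified testcase and foo_sunk has each store moved below the if into a single store fed by a select. Since a[i] and a[i+1] are distinct locations, the order of the two stores inside an arm does not change the result, so the same sunk form should be valid for both testcases.

```c
#include <string.h>

enum { SIZE = 100 };   /* the loops touch a[0] .. a[99] */

/* Modified testcase: the else arm stores in the opposite order.  */
void foo_swapped (long *a)
{
  int i;
  for (i = 0; i < 100; i += 2)
    {
      if (a[i] == 0)
        {
          a[i+1] = 2;
          a[i]   = 3;
        }
      else
        {
          a[i]   = 4;
          a[i+1] = 3;
        }
    }
}

/* Hand-sunk form: one store per location, value chosen by a select.  */
void foo_sunk (long *a)
{
  int i;
  for (i = 0; i < 100; i += 2)
    {
      int c = (a[i] == 0);   /* read the condition before any store */
      a[i+1] = c ? 2 : 3;
      a[i]   = c ? 3 : 4;
    }
}

/* Returns 1 if both versions leave identical contents in the array.  */
int sinking_preserves_result (void)
{
  long x[SIZE], y[SIZE];
  for (int i = 0; i < SIZE; i++)
    x[i] = y[i] = (i % 4 == 0) ? 0 : i;   /* exercise both arms */
  foo_swapped (x);
  foo_sunk (y);
  return memcmp (x, y, sizeof x) == 0;
}
```

The sunk form has exactly one store to a[i] and one to a[i+1] per iteration, which is the kind of interleaved group the access analysis accepts in the first testcase.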
Looks like all of the testcases vectorize since GCC 11 as far as I can tell.
On aarch64, though, the testcase in comment #6 has only vectorized since GCC 13.