In this example loop1 does not vectorize, loop2 does:

const int N=64;
float c[N];
float d[N];

void loop1() {
  for (int i=0; i!=N; ++i) {
    if (c[i]<0) d[i] = -d[i];
  }
}

void loop2() {
  for (int i=0; i!=N; ++i) {
    float tmp = d[i];
    if (c[i]<0) tmp = -tmp;
    d[i]=tmp;
  }
}
At least from the C++0x memory model or OpenMP point of view, that behavior (not vectorizing loop1) is highly desirable. In loop1, d[i] isn't written unconditionally; in loop2 it is. Transforming loop1 into loop2 might therefore introduce data races.
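To make the point about unconditional writes concrete, here is a minimal single-threaded sketch that counts the stores each loop form performs. The names kN, StoreCounts and count_stores are hypothetical and not from the report; the sketch only illustrates the store pattern and does not itself create a race.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

constexpr std::size_t kN = 8;  // hypothetical small size for illustration

// Number of stores each loop form performs on d.
struct StoreCounts {
    std::size_t loop1_form;  // conditional store
    std::size_t loop2_form;  // unconditional store
};

// Runs both loop forms on private copies of d and counts the stores.
// count_stores and StoreCounts are illustrative names, not from the thread.
StoreCounts count_stores(const std::array<float, kN>& c,
                         std::array<float, kN> d) {
    std::array<float, kN> d2 = d;
    StoreCounts n{0, 0};

    // loop1 form: d[i] is touched only where c[i] < 0.
    for (std::size_t i = 0; i != kN; ++i) {
        if (c[i] < 0) {
            d[i] = -d[i];
            ++n.loop1_form;
        }
    }

    // loop2 form: every d2[i] is stored to, even where the value is unchanged.
    for (std::size_t i = 0; i != kN; ++i) {
        float tmp = d2[i];
        if (c[i] < 0) tmp = -tmp;
        d2[i] = tmp;
        ++n.loop2_form;
    }

    assert(d == d2);  // single-threaded results are identical
    return n;
}
```

With three negative entries in c, the loop1 form stores to three elements while the loop2 form stores to all eight. The extra stores write back unchanged values, and those are the writes that could collide with another thread legitimately updating the same elements.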
Interesting. Then I would be curious to know what other respected compilers (e.g., Intel's) do in this area with respect to OpenMP.
My actual code looks more like this:

void loop() {
  for (int i=0; i!=N; ++i) {
    d[i]=a[i]+b[i];
    if (c[i]<0) d[i] = -d[i];
  }
}

where d[i] IS written unconditionally (and it does not vectorize either).
That is something different. Yes, in that case the transformation doesn't introduce new data races and is desirable as well, not just for vectorization.
I think you need at least -ftree-loop-if-convert-stores to vectorize conditional stores, but it doesn't seem to work in this case.
Actually -ftree-loop-if-convert-stores does the "trick" with -Ofast.
Things are not fully consistent, though. For these four loops I get the following output.
Notice how the combination -ftree-loop-if-convert-stores -O3 vectorizes the first BUT not the second!

const int N=1024;
float __attribute__ ((aligned(16))) a[N];
float __attribute__ ((aligned(16))) b[N];
float __attribute__ ((aligned(16))) c[N];
float __attribute__ ((aligned(16))) d[N];

void loop1() {
  for (int i=0; i!=N; ++i) {
    d[i]=a[i]+b[i];
    if (c[i]<0) d[i] = -d[i];
  }
}

void loop2() {
  for (int i=0; i!=N; ++i) {
    float tmp = a[i]+b[i];
    if (c[i]<0) tmp = -tmp;
    d[i]=tmp;
  }
}

void loop3() {
  for (int i=0; i!=N; ++i) {
    d[i] = (c[i]>0) ? a[i]+b[i] : -a[i]-b[i];
  }
}

void loop4() {
  for (int i=0; i!=N; ++i) {
    float tmp = a[i]+b[i];
    tmp = (c[i]>0) ? tmp : -tmp;
    d[i] = tmp;
  }
}

c++ -Wall -O3 -ftree-vectorizer-verbose=2 -c test/testBug.cpp -o bha.o
test/testBug.cpp:9: note: vectorized 0 loops in function.
test/testBug.cpp:17: note: LOOP VECTORIZED.
test/testBug.cpp:16: note: vectorized 1 loops in function.
test/testBug.cpp:24: note: vectorized 0 loops in function.
test/testBug.cpp:31: note: not vectorized: unsupported data-type bool
test/testBug.cpp:30: note: vectorized 0 loops in function.

pb-d-128-141-131-10:Octave innocent$ c++ -Wall -O3 -ftree-vectorizer-verbose=2 -c test/testBug.cpp -o bha.o -ftree-loop-if-convert-stores
test/testBug.cpp:10: note: LOOP VECTORIZED.
test/testBug.cpp:9: note: vectorized 1 loops in function.
test/testBug.cpp:16: note: vectorized 0 loops in function.
test/testBug.cpp:24: note: vectorized 0 loops in function.
test/testBug.cpp:30: note: vectorized 0 loops in function.

pb-d-128-141-131-10:Octave innocent$ c++ -Wall -Ofast -ftree-vectorizer-verbose=2 -c test/testBug.cpp -o bha.o -ftree-loop-if-convert-stores
test/testBug.cpp:10: note: LOOP VECTORIZED.
test/testBug.cpp:9: note: vectorized 1 loops in function.
test/testBug.cpp:17: note: LOOP VECTORIZED.
test/testBug.cpp:16: note: vectorized 1 loops in function.
test/testBug.cpp:25: note: LOOP VECTORIZED.
test/testBug.cpp:24: note: vectorized 1 loops in function.
test/testBug.cpp:31: note: LOOP VECTORIZED.
test/testBug.cpp:30: note: vectorized 1 loops in function.
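If one of these forms is used as a source-level replacement for another to help the vectorizer, it may be worth checking that they really compute the same thing. The following is only a sketch: body1 to body4 restate the per-element work of loop1 to loop4 as scalar functions, and forms_agree is a hypothetical helper, not something from the test case.

```cpp
#include <cassert>

// Per-element restatements of the four loop bodies (hypothetical names).
inline float body1(float a, float b, float c) {
    float d = a + b;
    if (c < 0) d = -d;
    return d;
}

inline float body2(float a, float b, float c) {
    float tmp = a + b;
    if (c < 0) tmp = -tmp;
    return tmp;
}

inline float body3(float a, float b, float c) {
    return (c > 0) ? a + b : -a - b;
}

inline float body4(float a, float b, float c) {
    float tmp = a + b;
    return (c > 0) ? tmp : -tmp;
}

// True when all four forms return the same value for one element.
inline bool forms_agree(float a, float b, float c) {
    float r = body1(a, b, c);
    return r == body2(a, b, c) && r == body3(a, b, c) && r == body4(a, b, c);
}
```

As written, the forms agree for c < 0 and c > 0 but not for c == 0: loop1 and loop2 negate only when c[i] < 0, while loop3 and loop4 negate whenever c[i] > 0 is false. That difference does not change the vectorization results reported above, but it matters if one form is swapped for another in real code.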
(In reply to vincenzo Innocente from comment #6)
loop1, loop2, and loop4 all vectorize now at -O3.
loop3 can vectorize with -O3 -fno-trapping-math (it can also be vectorized at -O3 on x86_64 with -march=skylake-avx512 or on aarch64 with SVE enabled).
I almost want to say there is not much left to do here.
Also, about comment #0: vectorization does happen with -mavx2. So maybe this is enough these days ...