Bug 49795

Summary: vectorization of conditional code happens only on local variables
Product: gcc Reporter: vincenzo Innocente <vincenzo.innocente>
Component: tree-optimization Assignee: Not yet assigned to anyone <unassigned>
Status: NEW ---    
Severity: enhancement CC: jakub, rguenth, spop
Priority: P3 Keywords: missed-optimization
Version: 4.7.0   
Target Milestone: ---   
Host: Target:
Build: Known to work:
Known to fail: Last reconfirmed: 2011-07-20 14:53:41
Bug Depends on:    
Bug Blocks: 21462, 53947    

Description vincenzo Innocente 2011-07-20 11:45:08 UTC
in this example loop1 does not vectorize, loop2 does 
const int N=64;
float c[N];
float d[N];

void loop1() {
  for (int i=0; i!=N; ++i) {
    if (c[i]<0) d[i] = -d[i];
  }
}

void loop2() {
  for (int i=0; i!=N; ++i) {
    float tmp = d[i];
    if (c[i]<0) tmp = -tmp;
    d[i]=tmp;
  }
}
Comment 1 Jakub Jelinek 2011-07-20 11:57:56 UTC
At least from C++0x memory model or OpenMP POV that's highly undesirable.
In loop1 d[i] isn't written unconditionally, in loop2 it is, so transforming loop1 code into loop2 might introduce data races.
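
To illustrate the data-race concern, here is a minimal sketch (not part of the original report). It assumes the if-converted form of loop1 is a load, a select, and an unconditional store; the store counters and the helpers fill_inputs and check_store_counts are hypothetical additions for illustration only and would themselves get in the way of vectorization.

```cpp
#include <cassert>

const int N = 64;
float c[N];
float d[N];

// Form from the report: d[i] is stored only where c[i] < 0.
// Returns the number of stores to d (counter is illustrative only).
int loop1_conditional_store() {
  int stores = 0;
  for (int i = 0; i != N; ++i) {
    if (c[i] < 0) { d[i] = -d[i]; ++stores; }
  }
  return stores;
}

// Hypothetical if-converted form (what rewriting loop1 as loop2 amounts to):
// every d[i] is read and written back, even where the value is unchanged.
int loop1_if_converted() {
  int stores = 0;
  for (int i = 0; i != N; ++i) {
    float tmp = d[i];
    d[i] = (c[i] < 0) ? -tmp : tmp;
    ++stores;
  }
  return stores;
}

// Hypothetical helper: make c negative at odd indices only.
void fill_inputs() {
  for (int i = 0; i != N; ++i) {
    c[i] = (i % 2) ? -1.0f : 1.0f;
    d[i] = static_cast<float>(i);
  }
}

// Single-threaded, both forms leave d in the same state,
// but the if-converted form touches every element.
void check_store_counts() {
  fill_inputs();
  assert(loop1_conditional_store() == N / 2);  // odd indices only
  fill_inputs();
  assert(loop1_if_converted() == N);           // every index
}
```

If another thread writes some d[j] with c[j] >= 0 while the loop runs, the first form never touches d[j], whereas the second form stores to it, which is a data race the source program did not contain.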
Comment 2 Paolo Carlini 2011-07-20 12:00:32 UTC
Interesting. Then I would be curious to know what other respected compilers vs OpenMP do in this area, eg, Intel..
Comment 3 vincenzo Innocente 2011-07-20 12:32:21 UTC
my actual code looks more like this
void loop() {
  for (int i=0; i!=N; ++i) {
    d[i]=a[i]+b[i];
    if (c[i]<0) d[i] = -d[i];
  }
}
where d[i] IS written unconditionally (and does not vectorize either)
Comment 4 Jakub Jelinek 2011-07-20 12:41:07 UTC
That is something different, yeah, in that case the transformation doesn't introduce new data races and is desirable as well, not just for vectorization.
Comment 5 Richard Biener 2011-07-20 14:53:41 UTC
I think you at least need -ftree-loop-if-convert-stores to vectorize conditional stores, but it doesn't seem to work in this case.
Comment 6 vincenzo Innocente 2011-07-20 16:59:20 UTC
actually  -ftree-loop-if-convert-stores does the "trick" with -Ofast

things are not fully consistent though
of these four loops I get the following
notice how the combination -ftree-loop-if-convert-stores -O3 vectorizes the first BUT not the second!


const int N=1024;
float __attribute__ ((aligned(16))) a[N];
float __attribute__ ((aligned(16))) b[N];
float __attribute__ ((aligned(16))) c[N];
float __attribute__ ((aligned(16))) d[N];

void loop1() {
  for (int i=0; i!=N; ++i) {
    d[i]=a[i]+b[i];
    if (c[i]<0) d[i] = -d[i];
  }
}

void loop2() {
  for (int i=0; i!=N; ++i) {
    float tmp = a[i]+b[i];
    if (c[i]<0) tmp = -tmp;
    d[i]=tmp;
  }
}

void loop3() {
  for (int i=0; i!=N; ++i) {
    d[i] = (c[i]>0) ? a[i]+b[i] : -a[i]-b[i];
  }
}

void loop4() {
  for (int i=0; i!=N; ++i) {
    float tmp = a[i]+b[i];
    tmp = (c[i]>0) ? tmp : -tmp;
    d[i] = tmp;
  }
}




c++ -Wall -O3  -ftree-vectorizer-verbose=2  -c test/testBug.cpp -o bha.o

test/testBug.cpp:9: note: vectorized 0 loops in function.

test/testBug.cpp:17: note: LOOP VECTORIZED.
test/testBug.cpp:16: note: vectorized 1 loops in function.

test/testBug.cpp:24: note: vectorized 0 loops in function.

test/testBug.cpp:31: note: not vectorized: unsupported data-type bool
test/testBug.cpp:30: note: vectorized 0 loops in function.
pb-d-128-141-131-10:Octave innocent$ c++ -Wall -O3  -ftree-vectorizer-verbose=2  -c test/testBug.cpp -o bha.o -ftree-loop-if-convert-stores

test/testBug.cpp:10: note: LOOP VECTORIZED.
test/testBug.cpp:9: note: vectorized 1 loops in function.

test/testBug.cpp:16: note: vectorized 0 loops in function.

test/testBug.cpp:24: note: vectorized 0 loops in function.

test/testBug.cpp:30: note: vectorized 0 loops in function.
pb-d-128-141-131-10:Octave innocent$ c++ -Wall -Ofast  -ftree-vectorizer-verbose=2  -c test/testBug.cpp -o bha.o -ftree-loop-if-convert-stores

test/testBug.cpp:10: note: LOOP VECTORIZED.
test/testBug.cpp:9: note: vectorized 1 loops in function.

test/testBug.cpp:17: note: LOOP VECTORIZED.
test/testBug.cpp:16: note: vectorized 1 loops in function.

test/testBug.cpp:25: note: LOOP VECTORIZED.
test/testBug.cpp:24: note: vectorized 1 loops in function.

test/testBug.cpp:31: note: LOOP VECTORIZED.
test/testBug.cpp:30: note: vectorized 1 loops in function.
Comment 7 Andrew Pinski 2023-08-24 07:44:53 UTC
(In reply to vincenzo Innocente from comment #6)

loop1, loop2, and loop4 all vectorize now at -O3.
loop3 can vectorize with -O3 -fno-trapping-math (it can also be vectorized at -O3 on x86_64 with -march=skylake-avx512 or on aarch64 with SVE enabled).


I almost want to say there is not much to do here.
Also, about comment #0: the vectorization does happen with -mavx2. So maybe this is enough these days ...
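
On the -fno-trapping-math remark for loop3, a plausible reading (an assumption, not stated in this report) is that if-conversion has to evaluate both arms of the conditional for every element, and under the default trapping-math setting a speculated floating-point operation could raise an exception that the original program would not. The sketch below writes that both-arms form out by hand; loop3_both_arms and fill_loop3_inputs are hypothetical names.

```cpp
#include <cassert>

const int N = 1024;
float a[N];
float b[N];
float c[N];
float d[N];

// loop3 as in comment 6: only one arm is evaluated per element.
void loop3_original() {
  for (int i = 0; i != N; ++i) {
    d[i] = (c[i] > 0) ? a[i] + b[i] : -a[i] - b[i];
  }
}

// Hypothetical hand-written equivalent of the if-converted loop:
// both arms are computed unconditionally, then one is selected.
void loop3_both_arms() {
  for (int i = 0; i != N; ++i) {
    float pos = a[i] + b[i];   // evaluated even when c[i] <= 0
    float neg = -a[i] - b[i];  // evaluated even when c[i] > 0
    d[i] = (c[i] > 0) ? pos : neg;
  }
}

// Hypothetical helper: small integer values, so all sums are exact.
void fill_loop3_inputs() {
  for (int i = 0; i != N; ++i) {
    a[i] = static_cast<float>(i);
    b[i] = 1.0f;
    c[i] = (i % 2) ? -1.0f : 1.0f;
    d[i] = 0.0f;
  }
}

// Both forms store the same values; they differ only in which
// floating-point operations are executed along the way.
void check_loop3_forms_agree() {
  fill_loop3_inputs();
  loop3_original();
  float expected_even = d[2];
  float expected_odd = d[3];
  fill_loop3_inputs();
  loop3_both_arms();
  assert(d[2] == expected_even);
  assert(d[3] == expected_odd);
}
```

Targets with masked vector operations can presumably predicate the arithmetic instead of speculating it, which would be consistent with loop3 vectorizing at plain -O3 on AVX-512 and SVE.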