Bug 49795 - vectorization of conditional code happens only on local variables
Status: NEW
Alias: None
Product: gcc
Classification: Unclassified
Component: tree-optimization
Version: 4.7.0
Importance: P3 enhancement
Target Milestone: ---
Assignee: Not yet assigned to anyone
URL:
Keywords: missed-optimization
Depends on:
Blocks: 21462 vectorizer
Reported: 2011-07-20 11:45 UTC by vincenzo Innocente
Modified: 2023-08-24 07:44 UTC
CC List: 3 users

See Also:
Host:
Target:
Build:
Known to work:
Known to fail:
Last reconfirmed: 2011-07-20 14:53:41


Description vincenzo Innocente 2011-07-20 11:45:08 UTC
In this example loop1 does not vectorize, but loop2 does:
const int N=64;
float c[N];
float d[N];

void loop1() {
  for (int i=0; i!=N; ++i) {
    if (c[i]<0) d[i] = -d[i];
  }
}

void loop2() {
  for (int i=0; i!=N; ++i) {
    float tmp = d[i];
    if (c[i]<0) tmp = -tmp;
    d[i]=tmp;
  }
}
Comment 1 Jakub Jelinek 2011-07-20 11:57:56 UTC
At least from C++0x memory model or OpenMP POV that's highly desirable.
In loop1 d[i] isn't written unconditionally, in loop2 it is, so transforming loop1 code into loop2 might introduce data races.
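To make the data-race concern concrete, here is a minimal sketch of what turning loop1 into an unconditional-store form would look like. The function names are made up for illustration, and this is hand-written source, not what GCC emits. Single-threaded, both forms give the same result; the difference is that the second one stores to every d[i], including elements the original never touches.

```cpp
#include <cstddef>

constexpr std::size_t N = 64;
float c[N];
float d[N];

// Original form: d[i] is stored only where c[i] < 0.
void loop1_conditional_store() {
  for (std::size_t i = 0; i != N; ++i) {
    if (c[i] < 0) d[i] = -d[i];
  }
}

// If-converted form (hypothetical rewrite): every d[i] is loaded and
// stored back, even when its value does not change. If another thread
// writes d[i] for an index where c[i] >= 0, this added store is a data
// race that the original loop did not have.
void loop1_if_converted() {
  for (std::size_t i = 0; i != N; ++i) {
    float tmp = d[i];
    d[i] = (c[i] < 0) ? -tmp : tmp;
  }
}
```

The select in the second loop is what a vectorizer can turn into a blend followed by a full-width store, which is why that form vectorizes easily.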
Comment 2 Paolo Carlini 2011-07-20 12:00:32 UTC
Interesting. Then I would be curious to know what other respected compilers do with OpenMP in this area, e.g., Intel.
Comment 3 vincenzo Innocente 2011-07-20 12:32:21 UTC
My actual code looks more like this:
void loop() {
  for (int i=0; i!=N; ++i) {
    d[i]=a[i]+b[i];
    if (c[i]<0) d[i] = -d[i];
  }
}
where d[i] IS written unconditionally (and the loop does not vectorize either).
Comment 4 Jakub Jelinek 2011-07-20 12:41:07 UTC
That is something different, yeah, in that case the transformation doesn't introduce new data races and is desirable as well, not just for vectorization.
Comment 5 Richard Biener 2011-07-20 14:53:41 UTC
I think you at least need -ftree-loop-if-convert-stores to vectorize conditional stores, but it doesn't seem to work in this case.
Comment 6 vincenzo Innocente 2011-07-20 16:59:20 UTC
Actually, -ftree-loop-if-convert-stores does the "trick" with -Ofast.

Things are not fully consistent, though.
For these four loops I get the results below.
Notice how the combination -ftree-loop-if-convert-stores -O3 vectorizes the first loop BUT not the second!


const int N=1024;
float __attribute__ ((aligned(16))) a[N];
float __attribute__ ((aligned(16))) b[N];
float __attribute__ ((aligned(16))) c[N];
float __attribute__ ((aligned(16))) d[N];

void loop1() {
  for (int i=0; i!=N; ++i) {
    d[i]=a[i]+b[i];
    if (c[i]<0) d[i] = -d[i];
  }
}

void loop2() {
  for (int i=0; i!=N; ++i) {
    float tmp = a[i]+b[i];
    if (c[i]<0) tmp = -tmp;
    d[i]=tmp;
  }
}

void loop3() {
  for (int i=0; i!=N; ++i) {
    d[i] = (c[i]>0) ? a[i]+b[i] : -a[i]-b[i];
  }
}

void loop4() {
  for (int i=0; i!=N; ++i) {
    float tmp = a[i]+b[i];
    tmp = (c[i]>0) ? tmp : -tmp;
    d[i] = tmp;
  }
}




c++ -Wall -O3  -ftree-vectorizer-verbose=2  -c test/testBug.cpp -o bha.o

test/testBug.cpp:9: note: vectorized 0 loops in function.

test/testBug.cpp:17: note: LOOP VECTORIZED.
test/testBug.cpp:16: note: vectorized 1 loops in function.

test/testBug.cpp:24: note: vectorized 0 loops in function.

test/testBug.cpp:31: note: not vectorized: unsupported data-type bool
test/testBug.cpp:30: note: vectorized 0 loops in function.
pb-d-128-141-131-10:Octave innocent$ c++ -Wall -O3  -ftree-vectorizer-verbose=2  -c test/testBug.cpp -o bha.o -ftree-loop-if-convert-stores

test/testBug.cpp:10: note: LOOP VECTORIZED.
test/testBug.cpp:9: note: vectorized 1 loops in function.

test/testBug.cpp:16: note: vectorized 0 loops in function.

test/testBug.cpp:24: note: vectorized 0 loops in function.

test/testBug.cpp:30: note: vectorized 0 loops in function.
pb-d-128-141-131-10:Octave innocent$ c++ -Wall -Ofast  -ftree-vectorizer-verbose=2  -c test/testBug.cpp -o bha.o -ftree-loop-if-convert-stores

test/testBug.cpp:10: note: LOOP VECTORIZED.
test/testBug.cpp:9: note: vectorized 1 loops in function.

test/testBug.cpp:17: note: LOOP VECTORIZED.
test/testBug.cpp:16: note: vectorized 1 loops in function.

test/testBug.cpp:25: note: LOOP VECTORIZED.
test/testBug.cpp:24: note: vectorized 1 loops in function.

test/testBug.cpp:31: note: LOOP VECTORIZED.
test/testBug.cpp:30: note: vectorized 1 loops in function.
Comment 7 Andrew Pinski 2023-08-24 07:44:53 UTC
(In reply to vincenzo Innocente from comment #6)

loop1, loop2, and loop4 all vectorize now at -O3.
loop3 can vectorize with -O3 -fno-trapping-math (it can also be vectorized at -O3 on x86_64 with -march=skylake-avx512 or on aarch64 with SVE enabled).


I almost want to say there is not much to do here.
Also, about comment #0: vectorization does happen with -mavx2. So maybe this is enough these days ...
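A plausible reason -mavx2 helps with comment #0 (an assumption on my side, not something stated in this report) is that the target then offers masked vector loads and stores, so inactive lanes are never written and no new store is introduced. The sketch below emulates that idea in plain scalar C++; the lane width W and the function name are hypothetical, and real generated code would use vector instructions instead of the inner loops.

```cpp
#include <cstddef>

constexpr std::size_t N = 64;
constexpr std::size_t W = 8;  // assumed lane count: 8 floats per 256-bit vector
float c[N];
float d[N];

// Scalar emulation of loop1 from comment #0 using a per-lane mask.
void loop1_masked() {
  static_assert(N % W == 0, "N must be a multiple of the lane count");
  for (std::size_t i = 0; i != N; i += W) {
    bool mask[W];
    float neg[W];
    for (std::size_t l = 0; l != W; ++l) {
      mask[l] = c[i + l] < 0;
      // Emulated masked load: inactive lanes get 0 and d is not read.
      neg[l] = mask[l] ? -d[i + l] : 0.0f;
    }
    for (std::size_t l = 0; l != W; ++l) {
      // Emulated masked store: only active lanes are written back.
      if (mask[l]) d[i + l] = neg[l];
    }
  }
}
```

Unlike an unconditional load-select-store rewrite, this form touches exactly the same elements of d as the original conditional loop.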