The following two procedures are functionally equivalent, but the first (more complicated) syntax is vectorized, whereas the second isn't.

typedef int __attribute ((aligned (16))) aint;
void test(aint * __restrict a1, int const v1, int const v2) {
  for (int i=0; i<640; ++i) a1[i] = (a1[i] == v1 ? v2 : a1[i]);
}

void test2(aint * __restrict a1, int const v1, int const v2) {
  for (int i=0; i<640; ++i) if (a1[i] == v1) a1[i] = v2;
}

Vectorizer dump for test2:

vecttest.cpp:7: note: === vect_analyze_loop_form ===
vecttest.cpp:7: note: not vectorized: too many BBs in loop.
vecttest.cpp:6: note: bad loop form.
vecttest.cpp:6: note: vectorized 0 loops in function.

Compiler details:

Using built-in specs.
Target: i686-pc-linux-gnu
Configured with: /esat/alexandria1/sderoeck/src/gcc/main/configure --prefix=/esat/olympia/install --program-suffix=-cvs --enable-languages=c,c++ : (reconfigured) /esat/alexandria1/sderoeck/src/gcc/main/configure --prefix=/esat/olympia/install --program-suffix=-cvs --enable-languages=c,c++ : (reconfigured) /esat/alexandria1/sderoeck/src/gcc/main/configure --prefix=/esat/olympia/install --program-suffix=-cvs --enable-languages=c,c++ --no-create --no-recursion : (reconfigured) /esat/alexandria1/sderoeck/src/gcc/main/configure --prefix=/esat/olympia/install --program-suffix=-cvs --enable-languages=c,c++ --no-create --no-recursion
Thread model: posix
gcc version 4.1.0 20050610 (experimental)
 /esat/olympia/install/libexec/gcc/i686-pc-linux-gnu/4.1.0/cc1plus -quiet -v -D_GNU_SOURCE vecttest.cpp -quiet -dumpbase vecttest.cpp -march=pentium4 -auxbase-strip vecttest-gcc.s -O9 -version -fverbose-asm -ftree-vectorize -fdump-tree-vect-details -fdump-tree-vect-stats -o vecttest-gcc.s
-- cut --
GNU C++ version 4.1.0 20050610 (experimental) (i686-pc-linux-gnu)
        compiled by GNU C version 3.4.4 (Gentoo 3.4.4, ssp-3.4.4-1.0, pie-8.7.8).
GGC heuristics: --param ggc-min-expand=30 --param ggc-min-heapsize=4096
Compiler executable checksum: 3cb76b13917ca148a15d77c9a1fb678d
They are not equivalent to GCC: the first always stores, while the second has a conditional store.
Confirmed.
Link to vectorizer missed-optimization meta-bug.
(In reply to comment #1)
> They are not equivalent to GCC, the first always stores, the second has a
> conditional store.

Just to clarify, 7 years later:

To GCC the two procedures are not equivalent. In the first procedure,

  a1[i] = (a1[i] == v1 ? v2 : a1[i]);

expands as:

  if (a1[i] == v1)
    a1[i] = v2;
  else
    a1[i] = a1[i];

while the second procedure expands just as-is:

  if (a1[i] == v1)
    a1[i] = v2;

In the first case, there will always be a store to a1[i], in the second example this is not the case. Introducing new stores is not allowed, to avoid introducing data races, see http://gcc.gnu.org/wiki/Atomic/GCCMM/DataRaces.

I'm not sure how GCC should transform the second procedure to allow the loop to be vectorized.
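
To make the store difference described in this comment observable, here is a minimal sketch that counts source-level stores for both forms. The function names select_form and branch_form and the explicit counters are illustrative assumptions, not part of the testcase, and the counts model what the source asks for rather than what any particular compiler emits.

```cpp
#include <cstddef>

// Mirrors test() from the report: the conditional expression is evaluated
// and its result is stored on every iteration, even when nothing changes.
// (Hypothetical name; the counter exists only for illustration.)
std::size_t select_form(int* a, std::size_t n, int v1, int v2) {
    std::size_t stores = 0;
    for (std::size_t i = 0; i < n; ++i) {
        a[i] = (a[i] == v1 ? v2 : a[i]);  // unconditional store
        ++stores;
    }
    return stores;
}

// Mirrors test2() from the report: the store happens only on a match.
std::size_t branch_form(int* a, std::size_t n, int v1, int v2) {
    std::size_t stores = 0;
    for (std::size_t i = 0; i < n; ++i) {
        if (a[i] == v1) {
            a[i] = v2;  // conditional store
            ++stores;
        }
    }
    return stores;
}
```

Both functions leave the array in the same final state; only the number of stores differs, which is exactly the property the compiler must preserve for the second form.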
We have two related flags here, -ftree-loop-if-convert-stores, and --param allow-store-data-races. We can adjust the former to honor the latter if specified and then eventually vectorize this, too.
Note that the concern is also that a1 may be mapped to a read-only segment, so introducing a store data-race may trap. That's probably outside the scope of the C99 language standard, but the middle-end has to care about this possibility.
We can vectorize test2 using masked stores ....
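
As a rough illustration of the idea, here is a scalar model of what a masked-store version of test2 could look like. It is a sketch only: the names kLanes, masked_store and test2_masked are hypothetical, the lane width of 4 ints is an assumption chosen to match the 16-byte alignment in the testcase, and this is not the code GCC generates. On a target with a real masked store instruction, the per-lane inner loops would each become a single vector operation.

```cpp
#include <cstddef>

// Assumed vector width: 4 ints (16 bytes). Hypothetical constant.
constexpr std::size_t kLanes = 4;

// Scalar model of a masked store: only lanes whose mask bit is set are
// written. Unselected lanes are never stored to, so no new stores are
// introduced compared to the original loop.
void masked_store(int* dst, const int (&src)[kLanes],
                  const bool (&mask)[kLanes]) {
    for (std::size_t lane = 0; lane < kLanes; ++lane)
        if (mask[lane])
            dst[lane] = src[lane];
}

// test2 from the report, processed kLanes elements at a time.
void test2_masked(int* a, std::size_t n, int v1, int v2) {
    std::size_t i = 0;
    for (; i + kLanes <= n; i += kLanes) {
        bool mask[kLanes];
        int vals[kLanes];
        for (std::size_t lane = 0; lane < kLanes; ++lane) {
            mask[lane] = (a[i + lane] == v1);  // vector compare
            vals[lane] = v2;                   // broadcast of v2
        }
        masked_store(a + i, vals, mask);
    }
    // Scalar tail for element counts that are not a multiple of kLanes.
    for (; i < n; ++i)
        if (a[i] == v1)
            a[i] = v2;
}
```

The point of the mask is that elements not equal to v1 are left completely untouched, which sidesteps both the data-race and the read-only-memory concerns raised earlier in this thread.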