Created attachment 38187 [details] Reproducer. Test case produces incorrect result with -O3 -mavx2 (and -march=knl and -march=skylake-avx512. Everything works fine with gcc version 4.9.4 20160401 (prerelease) (Revision=234686) and gcc version 5.3.1 20160401 (Revision=234686). Output: g++ -O2 -mavx2 repr.cpp; ./a.out 1540 > g++ -O3 -mavx2 repr.cpp; ./a.out 1204 GCC version: > g++ -v Using built-in specs. COLLECT_GCC=g++ COLLECT_LTO_WRAPPER=/export/users/vlivinsk/gcc-trunk/bin/libexec/gcc/x86_64-pc-linux-gnu/6.0.0/lto-wrapper Target: x86_64-pc-linux-gnu Configured with: /export/users/vlivinsk/gcc-trunk/gcc/configure --with-arch=corei7 --with-cpu=corei7 --enable-clocale=gnu --with-system-zlib --enable-shared --with-demangler-in-ld --enable-cloog-backend=isl --with-fpmath=sse --enable-checking=release --enable-languages=c,c++,lto --with-gmp=/export/users/vlivinsk/gcc-trunk/gmp-6.1.0/bin --with-mpfr=/export/users/vlivinsk/gcc-trunk/mpfr-3.1.3/bin --with-mpc=/export/users/vlivinsk/gcc-trunk/mpc-1.0.3/bin --prefix=/export/users/vlivinsk/gcc-trunk/bin Thread model: posix gcc version 6.0.0 20160404 (experimental) (Revision=234705) Test: #include <cstdlib> int a[520]; short b[482]; short c[480]; int d[963]; short e[320]; int main() { for (int i = 0; i < 520; ++i) a[i] = -636544305; for (int i = 0; i < 386; ++i) b[i] = -31804; for (long i = 1; i <= 112; ++i) { c[i] = b[i] >> ((a[i] & 1587842570) - 1510214139); if (a[i]) d[i] = i; e[i] = 7 << ((2312631697 - b[i]) - 2312663500); } unsigned long long g = 0; for (int i = 0; i < 111; ++i) { g = e[i] + g; } if (g != 1540) abort(); }
Created attachment 38188 [details] Correct reproducer. I've mixed up reproducers.
Looking into this.
Started with r233068. Slightly cleaned up testcase: int a[113], d[113]; short b[113], c[113], e[113]; int main () { int i; long j; unsigned long long g = 0; for (i = 0; i < 113; ++i) { a[i] = -636544305; b[i] = -31804; } for (j = 1; j <= 112; ++j) { c[j] = b[j] >> ((a[j] & 1587842570) - 1510214139); if (a[j]) d[j] = j; e[j] = 7 << ((2312631697 - b[j]) - 2312663500); } for (i = 0; i < 111; ++i) g = e[i] + g; if (g != 1540) __builtin_abort (); return 0; }
This goes wrong during the REE pass.
Created attachment 38190 [details] gcc6-pr70542.patch Untested fix.
Created attachment 38191 [details] gcc6-pr70542.patch Perhaps better fix. Looking at PR64286, the comment is right, if we change somehow the definition for VECTOR_MODE_P, we need to make sure all uses are modified, or none. But there is tons of reasons why some of them could fail, e.g. for the copy_needed case there is lots of tests that can fail, but even for the !copy_needed case it could fail to get recognized etc., and we are unable to handle all the uses as a single transaction.
Author: jakub Date: Tue Apr 5 17:05:23 2016 New Revision: 234756 URL: https://gcc.gnu.org/viewcvs?rev=234756&root=gcc&view=rev Log: PR rtl-optimization/70542 * ree.c (add_removable_extension): For VECTOR_MODE_P punt if there are any uses other than insn or debug insns. * gcc.dg/torture/pr70542.c: New test. * gcc.target/i386/avx2-pr70542.c: New test. Added: trunk/gcc/testsuite/gcc.dg/torture/pr70542.c trunk/gcc/testsuite/gcc.target/i386/avx2-pr70542.c Modified: trunk/gcc/ChangeLog trunk/gcc/ree.c trunk/gcc/testsuite/ChangeLog
Fixed.