Created attachment 49142 [details] minimalish preprocessed sources that compiles to differently behaved programs based on optimization level using gcc 10 Hi. Attached reproduces an issue where an __m128i value appears backwards on gcc versions 10 and later with -O3. I don't know what's going on but the compiler output does a different thing than compared to that of different compilers, older version of gcc, or gcc 10 with lower optimization levels. The output I expect is "6 4 2 0", with gcc 10 -O3 I get "0 2 4 6". There are a couple modifications to the source that make the issue go away, like adding... > if (dude_[0] == 1234) dude_[0]--; ...after the for loop (but not before). Or using a loop conditional of `i < 2` instead of `i < 3`. And again this happens with -O3 but not with -O2. clang 10 and apparently gcc 5 and 8 both give the expected output. But this is not the case with gcc 10 from my system as well as when compiled from a recent checkout of gcc (82030d51017323c5706d58d8c8626324ece007e4) My system gcc: gcc version 10.2.1 20200723 (Red Hat 10.2.1-1) (GCC) Target: x86_64-redhat-linux Configured with: ../configure --enable-bootstrap --enable-languages=c,c++,fortran,objc,obj-c++,ada,go,d,lto --prefix=/usr --mandir=/usr/share/man --infodir=/usr/share/info --with-bugurl=http://bugzilla.redhat.com/bugzilla --enable-shared --enable-threads=posix --enable-checking=release --enable-multilib --with-system-zlib --enable-__cxa_atexit --disable-libunwind-exceptions --enable-gnu-unique-object --enable-linker-build-id --with-gcc-major-version-only --with-linker-hash-style=gnu --enable-plugin --enable-initfini-array --with-isl --enable-offload-targets=nvptx-none --without-cuda-driver --enable-gnu-indirect-function --enable-cet --with-tune=generic --with-arch_32=i686 --build=x86_64-redhat-linux A gcc from a recent git checkout: Target: x86_64-pc-linux-gnu Configured with: /home/sqwishy/src/gcc/configure --enable-languages=c --disable-multilib gcc version 11.0.0 20200827 (experimental) (GCC) This is the command and output of compiling the attached file (min.i) with the version of gcc I built from a recent checkout. > ./usr/local/bin/gcc -v -O3 -Wall -Wextra -fno-strict-aliasing -fwrapv -fno-aggressive-loop-optimizations min.i Using built-in specs. COLLECT_GCC=./usr/local/bin/gcc COLLECT_LTO_WRAPPER=/opt/gcc-inst/usr/local/bin/../libexec/gcc/x86_64-pc-linux-gnu/11.0.0/lto-wrapper Target: x86_64-pc-linux-gnu Configured with: /home/sqwishy/src/gcc/configure --enable-languages=c --disable-multilib Thread model: posix Supported LTO compression algorithms: zlib zstd gcc version 11.0.0 20200827 (experimental) (GCC) COLLECT_GCC_OPTIONS='-v' '-O3' '-Wall' '-Wextra' '-fno-strict-aliasing' '-fwrapv' '-fno-aggressive-loop-optimizations' '-mtune=generic' '-march=x86-64' '-dumpdir' 'a-' /opt/gcc-inst/usr/local/bin/../libexec/gcc/x86_64-pc-linux-gnu/11.0.0/cc1 -fpreprocessed min.i -quiet -dumpdir a- -dumpbase min.i -dumpbase-ext .i -mtune=generic -march=x86-64 -O3 -Wall -Wextra -version -fno-strict-aliasing -fwrapv -fno-aggressive-loop-optimizations -o /tmp/ccUHO9zb.s GNU C17 (GCC) version 11.0.0 20200827 (experimental) (x86_64-pc-linux-gnu) compiled by GNU C version 11.0.0 20200827 (experimental), GMP version 6.1.2, MPFR version 4.0.2-p9, MPC version 1.1.0, isl version none GGC heuristics: --param ggc-min-expand=30 --param ggc-min-heapsize=4096 GNU C17 (GCC) version 11.0.0 20200827 (experimental) (x86_64-pc-linux-gnu) compiled by GNU C version 11.0.0 20200827 (experimental), GMP version 6.1.2, MPFR version 4.0.2-p9, MPC version 1.1.0, isl version none GGC heuristics: --param ggc-min-expand=30 --param ggc-min-heapsize=4096 Compiler executable checksum: 04849eadbd492e04db2b98f5ed9cc34b COLLECT_GCC_OPTIONS='-v' '-O3' '-Wall' '-Wextra' '-fno-strict-aliasing' '-fwrapv' '-fno-aggressive-loop-optimizations' '-mtune=generic' '-march=x86-64' '-dumpdir' 'a-' as -v --64 -o /tmp/ccMwZ4Yb.o /tmp/ccUHO9zb.s GNU assembler version 2.34 (x86_64-redhat-linux) using BFD version version 2.34-4.fc32 COMPILER_PATH=/opt/gcc-inst/usr/local/bin/../libexec/gcc/x86_64-pc-linux-gnu/11.0.0/:/opt/gcc-inst/usr/local/bin/../libexec/gcc/ LIBRARY_PATH=/opt/gcc-inst/usr/local/bin/../lib/gcc/x86_64-pc-linux-gnu/11.0.0/:/opt/gcc-inst/usr/local/bin/../lib/gcc/:/opt/gcc-inst/usr/local/bin/../lib/gcc/x86_64-pc-linux-gnu/11.0.0/../../../../lib64/:/lib/../lib64/:/usr/lib/../lib64/:/opt/gcc-inst/usr/local/bin/../lib/gcc/x86_64-pc-linux-gnu/11.0.0/../../../:/lib/:/usr/lib/ COLLECT_GCC_OPTIONS='-v' '-O3' '-Wall' '-Wextra' '-fno-strict-aliasing' '-fwrapv' '-fno-aggressive-loop-optimizations' '-mtune=generic' '-march=x86-64' '-dumpdir' 'a.' /opt/gcc-inst/usr/local/bin/../libexec/gcc/x86_64-pc-linux-gnu/11.0.0/collect2 -plugin /opt/gcc-inst/usr/local/bin/../libexec/gcc/x86_64-pc-linux-gnu/11.0.0/liblto_plugin.so -plugin-opt=/opt/gcc-inst/usr/local/bin/../libexec/gcc/x86_64-pc-linux-gnu/11.0.0/lto-wrapper -plugin-opt=-fresolution=/tmp/ccF6LwQe.res -plugin-opt=-pass-through=-lgcc -plugin-opt=-pass-through=-lgcc_s -plugin-opt=-pass-through=-lc -plugin-opt=-pass-through=-lgcc -plugin-opt=-pass-through=-lgcc_s --eh-frame-hdr -m elf_x86_64 -dynamic-linker /lib64/ld-linux-x86-64.so.2 /lib/../lib64/crt1.o /lib/../lib64/crti.o /opt/gcc-inst/usr/local/bin/../lib/gcc/x86_64-pc-linux-gnu/11.0.0/crtbegin.o -L/opt/gcc-inst/usr/local/bin/../lib/gcc/x86_64-pc-linux-gnu/11.0.0 -L/opt/gcc-inst/usr/local/bin/../lib/gcc -L/opt/gcc-inst/usr/local/bin/../lib/gcc/x86_64-pc-linux-gnu/11.0.0/../../../../lib64 -L/lib/../lib64 -L/usr/lib/../lib64 -L/opt/gcc-inst/usr/local/bin/../lib/gcc/x86_64-pc-linux-gnu/11.0.0/../../.. /tmp/ccMwZ4Yb.o -lgcc --push-state --as-needed -lgcc_s --pop-state -lc -lgcc --push-state --as-needed -lgcc_s --pop-state /opt/gcc-inst/usr/local/bin/../lib/gcc/x86_64-pc-linux-gnu/11.0.0/crtend.o /lib/../lib64/crtn.o COLLECT_GCC_OPTIONS='-v' '-O3' '-Wall' '-Wextra' '-fno-strict-aliasing' '-fwrapv' '-fno-aggressive-loop-optimizations' '-mtune=generic' '-march=x86-64' '-dumpdir' 'a.'
Just in case, does adding -fno-strict-aliasing help?
Oddly enough with > gcc-10 --version gcc-10 (SUSE Linux) 10.2.1 20200805 [revision dda1e9d08434def88ed86557d08b23251332c5aa] Copyright (C) 2020 Free Software Foundation, Inc. This is free software; see the source for copying conditions. There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. I get your expected "6 4 2 0" at all optimization levels.
(In reply to Andrew Pinski from comment #1) > Just in case, does adding -fno-strict-aliasing help? That is already among the options. But I can't reproduce either, neither with current trunk, nor with different 10.x snapshots.
Created attachment 49150 [details] actually minimalish preprocessed sources that compiles to differently behaved programs based on optimization level using gcc 10
I was testing modifications to the testcase that caused it to compile correctly and forgot to save my editor's buffer after undoing those changes. I have attached the correct file. Sorry for being an idiot.
Ok, I can reproduce that one, started with r10-4336-g818b3293f4545d899148810f4f7d676b81e989dd The first difference between the commit right before that one and r10-4336 is in slp1 pass: _5 = dude_[3]; _6 = (int) _5; _7 = dude_[2]; _8 = (int) _7; _9 = dude_[1]; _10 = (int) _9; + vect__11.11_30 = MEM <vector(4) unsigned int> [(unsigned int *)&dude_]; _11 = dude_[0]; + vect__12.12_40 = VIEW_CONVERT_EXPR<vector(4) int>(vect__11.11_30); _12 = (int) _11; - _22 = {_6, _8, _10, _12}; + _22 = vect__12.12_40; _28 = VIEW_CONVERT_EXPR<__m128i>(_22); *dude_23(D) = _28; dude_ ={v} {CLOBBER}; and that changes the element ordering in the _22 vector, previously it was dude_[3], dude_[2], dude_[1], dude_[0], newly it is read from the memory which is the exact opposite order. The _22 = {_6, _8, _10, _12} is what is in the IL before slp1, so clearly this is a SLP introduced breakage.
Reduced testcase for -O3: typedef int V __attribute__((__vector_size__(16))); __attribute__((__noipa__)) void foo (unsigned int x, V *y) { unsigned int a[4] = { x + 0, x + 2, x + 4, x + 6 }; for (unsigned int i = 0; i < 3; ++i) if (a[i] == 1234) a[i]--; *y = (V) { a[3], a[2], a[1], a[0] }; } int main () { V b; foo (0, &b); if (b[0] != 6 || b[1] != 4 || b[2] != 2 || b[3] != 0) __builtin_abort (); return 0; }
I'm working on this. I believe this may have been introduced by my earlier SLP vector constructor patch.(commit 10d1592) What I believe to be the relevant section: + else if (constructor) + { + tree rhs = gimple_assign_rhs1 (stmt_info->stmt); + tree val; + FOR_EACH_CONSTRUCTOR_VALUE (CONSTRUCTOR_ELTS (rhs), i, val) + { + if (TREE_CODE (val) == SSA_NAME) + { + gimple* def = SSA_NAME_DEF_STMT (val); + stmt_vec_info def_info = vinfo->lookup_stmt (def); + /* Value is defined in another basic block. */ + if (!def_info) + return false; + scalar_stmts.safe_push (def_info); + } + else + return false; + } + } I'm investigating, but I suspect pushing to a stack which is then popped from later has created a reversal of element order.
This is fixed on trunk by 97b798d8, unfortunately I made a typo in the commit message.
r11-3559-g97b798d80baf945ea28236eef3fa69f36626b579 (so that there is link to that commit).
GCC 10 branch isn't fixed.
The master branch has been updated by H.J. Lu <hjl@gcc.gnu.org>: https://gcc.gnu.org/g:373b99dc40949efa697326f378e5022a02e0328b commit r11-3565-g373b99dc40949efa697326f378e5022a02e0328b Author: H.J. Lu <hjl.tools@gmail.com> Date: Wed Sep 30 08:13:21 2020 -0700 Add a testcase for PR target/96827 Add a testcase for PR target/96827 which was fixed by r11-3559: commit 97b798d80baf945ea28236eef3fa69f36626b579 Author: Joel Hutton <joel.hutton@arm.com> Date: Wed Sep 30 15:08:13 2020 +0100 [SLP][VECT] Add check to fix 96837 PR target/96827 * gcc.target/i386/pr96827.c: New test.
The releases/gcc-10 branch has been updated by Joel Hutton <joelh@gcc.gnu.org>: https://gcc.gnu.org/g:d0ceb8e276e282a2c9e08eb295ca5c9678d54c63 commit r10-8833-gd0ceb8e276e282a2c9e08eb295ca5c9678d54c63 Author: Joel Hutton <joel.hutton@arm.com> Date: Wed Sep 30 16:20:55 2020 +0100 [SLP][VECT] Add check to fix 96827 The following patch adds a simple check to prevent slp stmts from vector constructors being rearranged. vect_attempt_slp_rearrange_stmts tries to rearrange to avoid a load permutation. This fixes PR target/96827 https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96827 gcc/ChangeLog: 2020-09-29 Joel Hutton <joel.hutton@arm.com> PR target/96827 * tree-vect-slp.c (vect_analyze_slp): Do not call vect_attempt_slp_rearrange_stmts for vector constructors. gcc/testsuite/ChangeLog: 2020-09-29 Joel Hutton <joel.hutton@arm.com> PR target/96827 * gcc.dg/vect/bb-slp-49.c: New test. (cherry picked from commit 97b798d80baf945ea28236eef3fa69f36626b579)
backported to GCC 10.
The releases/gcc-10 branch has been updated by H.J. Lu <hjl@gcc.gnu.org>: https://gcc.gnu.org/g:1c959dedbde7cc64747cf13fa76b4e8654be26ea commit r10-8836-g1c959dedbde7cc64747cf13fa76b4e8654be26ea Author: H.J. Lu <hjl.tools@gmail.com> Date: Wed Sep 30 08:13:21 2020 -0700 Add a testcase for PR target/96827 Add a testcase for PR target/96827 which was fixed by r11-3559: commit 97b798d80baf945ea28236eef3fa69f36626b579 Author: Joel Hutton <joel.hutton@arm.com> Date: Wed Sep 30 15:08:13 2020 +0100 [SLP][VECT] Add check to fix 96837 PR target/96827 * gcc.target/i386/pr96827.c: New test. (cherry picked from commit 373b99dc40949efa697326f378e5022a02e0328b)
*** Bug 97482 has been marked as a duplicate of this bug. ***