Created attachment 39555 [details] The output of -save-temps Apologies if I've selected the wrong component - I suspect this issue is something wrong in one of the optimization phases, but was not sure what "component" to select. The attached program will crash with a segmentation fault when compiled with the specified arguments. The generated code appears to be generating aligned vector load instructions on a not properly aligned address. I've tried to reduce the program and compiler arguments as much as I can into something which still triggers the behavior. * the exact version of GCC; * the system type; * the options given when GCC was configured/built; Using built-in specs. COLLECT_GCC=gcc COLLECT_LTO_WRAPPER=/usr/lib/gcc/x86_64-linux-gnu/5/lto-wrapper Target: x86_64-linux-gnu Configured with: ../src/configure -v --with-pkgversion='Ubuntu 5.4.0-6ubuntu1~16.04.2' --with-bugurl=file:///usr/share/doc/gcc-5/README.Bugs --enable-languages=c,ada,c++,java,go,d,fortran,objc,obj-c++ --prefix=/usr --program-suffix=-5 --enable-shared --enable-linker-build-id --libexecdir=/usr/lib --without-included-gettext --enable-threads=posix --libdir=/usr/lib --enable-nls --with-sysroot=/ --enable-clocale=gnu --enable-libstdcxx-debug --enable-libstdcxx-time=yes --with-default-libstdcxx-abi=new --enable-gnu-unique-object --disable-vtable-verify --enable-libmpx --enable-plugin --with-system-zlib --disable-browser-plugin --enable-java-awt=gtk --enable-gtk-cairo --with-java-home=/usr/lib/jvm/java-1.5.0-gcj-5-amd64/jre --enable-java-home --with-jvm-root-dir=/usr/lib/jvm/java-1.5.0-gcj-5-amd64 --with-jvm-jar-dir=/usr/lib/jvm-exports/java-1.5.0-gcj-5-amd64 --with-arch-directory=amd64 --with-ecj-jar=/usr/share/java/eclipse-ecj.jar --enable-objc-gc --enable-multiarch --disable-werror --with-arch-32=i686 --with-abi=m64 --with-multilib-list=m32,m64,mx32 --enable-multilib --with-tune=generic --enable-checking=release --build=x86_64-linux-gnu --host=x86_64-linux-gnu --target=x86_64-linux-gnu Thread model: posix gcc version 5.4.0 20160609 (Ubuntu 5.4.0-6ubuntu1~16.04.2) (This GCC was obtained using apt) * the complete command line that triggers the bug; gcc -msse2 -O3 -ffast-math -m32 test.c * the compiler output (error messages, warnings, etc.); and The compiler produces no warnings with -Wextra. When I tried to use the undefined behavior sanitizer (-fsanitize=undefined), the crash vanishes. * the preprocessed file (*.i*) that triggers the bug, generated by adding -save-temps to the complete compilation command, or, in the case of a bug report for the GNAT front end, a complete set of source files (see below). This file is attached and is identical to the .c file which generated it (minus the hash-prefixed lines at the top).
Can't reproduce with gcc version 5.3.1 20151207 (Red Hat 5.3.1-2) or gcc (GCC) 5.4.1 20160721
I can reproduce this on a distro that implicitly enables -fstack-protector-strong; perhaps this Ubuntu build has similar behavior. Nick, to double check, can you confirm that adding option '-fno-stack-protector' fixes this on your end? I think there's a vectorizer bug here. Unless I've missed something, the vectorizer performs peeling for alignment only for the first loop. The second loop is not peeled for alignment, and yet uses aligned loads. With -fno-stack-protector, the 'many_things' array is (by luck) aligned to 16-byte boundary so it works, with -fstack-protector-strong that array is no longer aligned, exposing the issue.
On further investigation, lack of peeling might be intentional, the vectorizer could be deliberately using unaligned access. Previously I missed that the body of the vectorized loop is completely unrolled. If I use -fdisable-tree-cunroll, I get a vectorized loop that is properly using an unaligned load. (is there a way to get alignment info in tree dumps?)
Slightly reduced testcase that demonstrates the issue regardless of stack-protector; -O3 -ffast-math is enough on x86-64 (plus -msse2 on i386). Oddly, the #if0 block makes a difference. static const float A[10] = {1}; #if 0 static __attribute__((noinline,noclone)) #endif float foo(float *f, int n) { int i, j; float a = 0, b = 0; for (i = n/2; i < n; i++) a += f[i]*.1f; for (i = n/2, j = 0; i < n; i++, j++) b += f[i]*A[j]+a*A[j]; return b; } int main() { float a[21] = {0}; return foo(a+1, 20) + foo(a, 20); }
Created attachment 39572 [details] debug patch I suspect this is a duplicate of PR66598. I have the attached in my local tree which dumps alignment info with -alias.
Thanks, seeing alignment info in dumps helps (I think you meant -vops rather than -alias?). This doesn't seem to reproduce on trunk. On gcc-5 branch, I see alignment increasing in dom2 pass. Specifically, the 147t.slsr dump prior to dom2 has: <bb 23>: ... vectp.10_4 = vectp.22_88; ... # rhs access alignment 32+0 vect__22.11_163 = MEM[(float *)vectp.10_4]; and then 149.dom2 has: <bb 23>: ... vectp.10_4 = vectp.22_88; ... # rhs access alignment 128+0 vect__22.11_163 = MEM[(float *)vectp.22_88];
Richard, I don't believe this is a dup. According to my git-bisect, this was fixed or made latent during gcc-6 development by your patch: https://gcc.gnu.org/ml/gcc-patches/2015-07/msg00120.html [PATCH] Consolidate alignment folding (this is r225310 on trunk) How to go forward from here? (about the alignment dumping patchlet, I think it would be nice to have this functionality on trunk, can I help with that? is there something I'm missing about this patch that makes it unsuitable for trunk as-is?)
(In reply to Alexander Monakov from comment #7) > Richard, I don't believe this is a dup. According to my git-bisect, this was > fixed or made latent during gcc-6 development by your patch: > > https://gcc.gnu.org/ml/gcc-patches/2015-07/msg00120.html > [PATCH] Consolidate alignment folding > (this is r225310 on trunk) Specifically the tree-vect-loop-manip.c change. Without it we have after vectorization <bb 3>: _86 = (unsigned int) n_7(D); _87 = (unsigned int) i_8; niters.21_85 = _86 - _87; _89 = (long unsigned int) i_8; _90 = _89 * 4; # PT = nonlocal # ALIGN = 16, MISALIGN = 0 vectp.22_88 = f_11(D) + _90; _92 = (unsigned long) vectp.22_88; _93 = _92 & 15; _94 = _93 >> 2; _95 = -_94; _96 = (unsigned int) _95; _97 = _96 & 3; prolog_loop_niters.23_91 = MIN_EXPR <niters.21_85, _97>; if (niters.21_85 <= 6) goto <bb 4>; else goto <bb 5>; ... but vectp.22_88 is not aligned to 16 bytes. > How to go forward from here? Backporting Index: gcc/tree-vect-loop-manip.c =================================================================== --- gcc/tree-vect-loop-manip.c (revision 240342) +++ gcc/tree-vect-loop-manip.c (working copy) @@ -1886,7 +1886,7 @@ vect_gen_niters_for_prolog_loop (loop_ve gimple_seq new_stmts = NULL; bool negative = tree_int_cst_compare (DR_STEP (dr), size_zero_node) < 0; tree offset = negative - ? size_int (-TYPE_VECTOR_SUBPARTS (vectype) + 1) : NULL_TREE; + ? size_int (-TYPE_VECTOR_SUBPARTS (vectype) + 1) : size_zero_node; tree start_addr = vect_create_addr_base_for_vector_ref (dr_stmt, &new_stmts, offset, loop); tree type = unsigned_type_for (TREE_TYPE (start_addr)); looks like the way to go forward. > (about the alignment dumping patchlet, I think it would be nice to have this > functionality on trunk, can I help with that? is there something I'm missing > about this patch that makes it unsuitable for trunk as-is?) Only that it makes -vops even more noisy.
Author: rguenth Date: Tue Sep 27 11:27:54 2016 New Revision: 240530 URL: https://gcc.gnu.org/viewcvs?rev=240530&root=gcc&view=rev Log: 2016-09-27 Richard Biener <rguenther@suse.de> PR tree-optimization/77478 * gcc.dg/torture/pr77478.c: New testcase. Added: trunk/gcc/testsuite/gcc.dg/torture/pr77478.c Modified: trunk/gcc/testsuite/ChangeLog
Author: rguenth Date: Tue Sep 27 11:29:28 2016 New Revision: 240531 URL: https://gcc.gnu.org/viewcvs?rev=240531&root=gcc&view=rev Log: 2016-09-27 Richard Biener <rguenther@suse.de> PR tree-optimization/77478 * gcc.dg/torture/pr77478.c: New testcase. Added: branches/gcc-6-branch/gcc/testsuite/gcc.dg/torture/pr77478.c Modified: branches/gcc-6-branch/gcc/testsuite/ChangeLog
Author: rguenth Date: Tue Sep 27 12:44:42 2016 New Revision: 240532 URL: https://gcc.gnu.org/viewcvs?rev=240532&root=gcc&view=rev Log: 2016-09-27 Richard Biener <rguenther@suse.de> PR tree-optimization/77478 * tree-vect-loop-manip.c (vect_gen_niters_for_prolog_loop): Fix alignment of SSA var used before the alignment prologue. * gcc.dg/torture/pr77478.c: New testcase. Added: branches/gcc-5-branch/gcc/testsuite/gcc.dg/torture/pr77478.c Modified: branches/gcc-5-branch/gcc/ChangeLog branches/gcc-5-branch/gcc/testsuite/ChangeLog branches/gcc-5-branch/gcc/tree-vect-loop-manip.c
Fixed.