Bug 77478 - Incorrect code generated with -O3, m32, -msse2 and -ffast-math
Summary: Incorrect code generated with -O3, m32, -msse2 and -ffast-math
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: target (show other bugs)
Version: 5.4.0
: P3 normal
Target Milestone: 5.5
Assignee: Not yet assigned to anyone
URL:
Keywords: wrong-code
Depends on:
Blocks:
 
Reported: 2016-09-04 23:32 UTC by Nick Appleton
Modified: 2016-09-27 12:45 UTC (History)
2 users (show)

See Also:
Host:
Target:
Build:
Known to work: 5.4.1, 6.1.0
Known to fail: 5.4.0
Last reconfirmed: 2016-09-22 00:00:00


Attachments
The output of -save-temps (421 bytes, text/plain)
2016-09-04 23:32 UTC, Nick Appleton
Details
debug patch (570 bytes, patch)
2016-09-06 07:51 UTC, Richard Biener
Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Nick Appleton 2016-09-04 23:32:56 UTC
Created attachment 39555 [details]
The output of -save-temps

Apologies if I've selected the wrong component - I suspect this issue is something wrong in one of the optimization phases, but was not sure what "component" to select.

The attached program will crash with a segmentation fault when compiled with the specified arguments. The generated code appears to be generating aligned vector load instructions on a not properly aligned address. I've tried to reduce the program and compiler arguments as much as I can into something which still triggers the behavior.

* the exact version of GCC;
* the system type;
* the options given when GCC was configured/built;

Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/usr/lib/gcc/x86_64-linux-gnu/5/lto-wrapper
Target: x86_64-linux-gnu
Configured with: ../src/configure -v --with-pkgversion='Ubuntu 5.4.0-6ubuntu1~16.04.2' --with-bugurl=file:///usr/share/doc/gcc-5/README.Bugs --enable-languages=c,ada,c++,java,go,d,fortran,objc,obj-c++ --prefix=/usr --program-suffix=-5 --enable-shared --enable-linker-build-id --libexecdir=/usr/lib --without-included-gettext --enable-threads=posix --libdir=/usr/lib --enable-nls --with-sysroot=/ --enable-clocale=gnu --enable-libstdcxx-debug --enable-libstdcxx-time=yes --with-default-libstdcxx-abi=new --enable-gnu-unique-object --disable-vtable-verify --enable-libmpx --enable-plugin --with-system-zlib --disable-browser-plugin --enable-java-awt=gtk --enable-gtk-cairo --with-java-home=/usr/lib/jvm/java-1.5.0-gcj-5-amd64/jre --enable-java-home --with-jvm-root-dir=/usr/lib/jvm/java-1.5.0-gcj-5-amd64 --with-jvm-jar-dir=/usr/lib/jvm-exports/java-1.5.0-gcj-5-amd64 --with-arch-directory=amd64 --with-ecj-jar=/usr/share/java/eclipse-ecj.jar --enable-objc-gc --enable-multiarch --disable-werror --with-arch-32=i686 --with-abi=m64 --with-multilib-list=m32,m64,mx32 --enable-multilib --with-tune=generic --enable-checking=release --build=x86_64-linux-gnu --host=x86_64-linux-gnu --target=x86_64-linux-gnu
Thread model: posix
gcc version 5.4.0 20160609 (Ubuntu 5.4.0-6ubuntu1~16.04.2) 

(This GCC was obtained using apt)

* the complete command line that triggers the bug;

gcc -msse2 -O3 -ffast-math -m32 test.c

* the compiler output (error messages, warnings, etc.); and

The compiler produces no warnings with -Wextra. When I tried to use the undefined behavior sanitizer (-fsanitize=undefined), the crash vanishes.

* the preprocessed file (*.i*) that triggers the bug, generated by adding -save-temps to the complete compilation command, or, in the case of a bug report for the GNAT front end, a complete set of source files (see below).

This file is attached and is identical to the .c file which generated it (minus the hash-prefixed lines at the top).
Comment 1 Jakub Jelinek 2016-09-05 12:37:47 UTC
Can't reproduce with
gcc version 5.3.1 20151207 (Red Hat 5.3.1-2)
or
gcc (GCC) 5.4.1 20160721
Comment 2 Alexander Monakov 2016-09-05 13:38:57 UTC
I can reproduce this on a distro that implicitly enables -fstack-protector-strong; perhaps this Ubuntu build has similar behavior.

Nick, to double check, can you confirm that adding option '-fno-stack-protector' fixes this on your end?

I think there's a vectorizer bug here. Unless I've missed something, the vectorizer performs peeling for alignment only for the first loop. The second loop is not peeled for alignment, and yet uses aligned loads. With -fno-stack-protector, the 'many_things' array is (by luck) aligned to 16-byte boundary so it works, with -fstack-protector-strong that array is no longer aligned, exposing the issue.
Comment 3 Alexander Monakov 2016-09-05 16:23:43 UTC
On further investigation, lack of peeling might be intentional, the vectorizer could be deliberately using unaligned access. Previously I missed that the body of the vectorized loop is completely unrolled. If I use -fdisable-tree-cunroll, I get a vectorized loop that is properly using an unaligned load.

(is there a way to get alignment info in tree dumps?)
Comment 4 Alexander Monakov 2016-09-05 19:01:11 UTC
Slightly reduced testcase that demonstrates the issue regardless of stack-protector; -O3 -ffast-math is enough on x86-64 (plus -msse2 on i386).

Oddly, the #if0 block makes a difference.

static const float A[10] = {1};
#if 0
static
__attribute__((noinline,noclone))
#endif
float
foo(float *f, int n)
{
  int i, j;
  float a = 0, b = 0;
  for (i = n/2; i < n; i++)
    a += f[i]*.1f;
  for (i = n/2, j = 0; i < n; i++, j++)
    b += f[i]*A[j]+a*A[j];
  return b;
}

int main()
{
  float a[21] = {0};
  return foo(a+1, 20) + foo(a, 20);
}
Comment 5 Richard Biener 2016-09-06 07:51:05 UTC
Created attachment 39572 [details]
debug patch

I suspect this is a duplicate of PR66598.  I have the attached in my local tree which dumps alignment info with -alias.
Comment 6 Alexander Monakov 2016-09-20 18:43:53 UTC
Thanks, seeing alignment info in dumps helps (I think you meant -vops rather than -alias?).

This doesn't seem to reproduce on trunk. On gcc-5 branch, I see alignment increasing in dom2 pass.

Specifically, the 147t.slsr dump prior to dom2 has:

  <bb 23>:
...
  vectp.10_4 = vectp.22_88;
...
  # rhs access alignment 32+0
  vect__22.11_163 = MEM[(float *)vectp.10_4];

and then 149.dom2 has:

  <bb 23>:
...
  vectp.10_4 = vectp.22_88;
...
  # rhs access alignment 128+0
  vect__22.11_163 = MEM[(float *)vectp.22_88];
Comment 7 Alexander Monakov 2016-09-21 14:56:09 UTC
Richard, I don't believe this is a dup. According to my git-bisect, this was fixed or made latent during gcc-6 development by your patch:

https://gcc.gnu.org/ml/gcc-patches/2015-07/msg00120.html
[PATCH] Consolidate alignment folding
(this is r225310 on trunk)

How to go forward from here?

(about the alignment dumping patchlet, I think it would be nice to have this functionality on trunk, can I help with that? is there something I'm missing about this patch that makes it unsuitable for trunk as-is?)
Comment 8 Richard Biener 2016-09-22 08:08:13 UTC
(In reply to Alexander Monakov from comment #7)
> Richard, I don't believe this is a dup. According to my git-bisect, this was
> fixed or made latent during gcc-6 development by your patch:
> 
> https://gcc.gnu.org/ml/gcc-patches/2015-07/msg00120.html
> [PATCH] Consolidate alignment folding
> (this is r225310 on trunk)

Specifically the tree-vect-loop-manip.c change.

Without it we have after vectorization

  <bb 3>:
  _86 = (unsigned int) n_7(D);
  _87 = (unsigned int) i_8;
  niters.21_85 = _86 - _87;
  _89 = (long unsigned int) i_8;
  _90 = _89 * 4;
  # PT = nonlocal
  # ALIGN = 16, MISALIGN = 0
  vectp.22_88 = f_11(D) + _90;
  _92 = (unsigned long) vectp.22_88;
  _93 = _92 & 15;
  _94 = _93 >> 2;
  _95 = -_94;
  _96 = (unsigned int) _95;
  _97 = _96 & 3;
  prolog_loop_niters.23_91 = MIN_EXPR <niters.21_85, _97>;
  if (niters.21_85 <= 6)
    goto <bb 4>;
  else
    goto <bb 5>;

...

but vectp.22_88 is not aligned to 16 bytes.

> How to go forward from here?

Backporting

Index: gcc/tree-vect-loop-manip.c
===================================================================
--- gcc/tree-vect-loop-manip.c  (revision 240342)
+++ gcc/tree-vect-loop-manip.c  (working copy)
@@ -1886,7 +1886,7 @@ vect_gen_niters_for_prolog_loop (loop_ve
       gimple_seq new_stmts = NULL;
       bool negative = tree_int_cst_compare (DR_STEP (dr), size_zero_node) < 0;
       tree offset = negative
-         ? size_int (-TYPE_VECTOR_SUBPARTS (vectype) + 1) : NULL_TREE;
+         ? size_int (-TYPE_VECTOR_SUBPARTS (vectype) + 1) : size_zero_node;
       tree start_addr = vect_create_addr_base_for_vector_ref (dr_stmt,
                                                &new_stmts, offset, loop);
       tree type = unsigned_type_for (TREE_TYPE (start_addr));

looks like the way to go forward.

> (about the alignment dumping patchlet, I think it would be nice to have this
> functionality on trunk, can I help with that? is there something I'm missing
> about this patch that makes it unsuitable for trunk as-is?)

Only that it makes -vops even more noisy.
Comment 9 Richard Biener 2016-09-27 11:28:25 UTC
Author: rguenth
Date: Tue Sep 27 11:27:54 2016
New Revision: 240530

URL: https://gcc.gnu.org/viewcvs?rev=240530&root=gcc&view=rev
Log:
2016-09-27  Richard Biener  <rguenther@suse.de>

	PR tree-optimization/77478
	* gcc.dg/torture/pr77478.c: New testcase.

Added:
    trunk/gcc/testsuite/gcc.dg/torture/pr77478.c
Modified:
    trunk/gcc/testsuite/ChangeLog
Comment 10 Richard Biener 2016-09-27 11:30:00 UTC
Author: rguenth
Date: Tue Sep 27 11:29:28 2016
New Revision: 240531

URL: https://gcc.gnu.org/viewcvs?rev=240531&root=gcc&view=rev
Log:
2016-09-27  Richard Biener  <rguenther@suse.de>

	PR tree-optimization/77478
	* gcc.dg/torture/pr77478.c: New testcase.

Added:
    branches/gcc-6-branch/gcc/testsuite/gcc.dg/torture/pr77478.c
Modified:
    branches/gcc-6-branch/gcc/testsuite/ChangeLog
Comment 11 Richard Biener 2016-09-27 12:45:15 UTC
Author: rguenth
Date: Tue Sep 27 12:44:42 2016
New Revision: 240532

URL: https://gcc.gnu.org/viewcvs?rev=240532&root=gcc&view=rev
Log:
2016-09-27  Richard Biener  <rguenther@suse.de>

	PR tree-optimization/77478
	* tree-vect-loop-manip.c (vect_gen_niters_for_prolog_loop):
	Fix alignment of SSA var used before the alignment prologue.

	* gcc.dg/torture/pr77478.c: New testcase.

Added:
    branches/gcc-5-branch/gcc/testsuite/gcc.dg/torture/pr77478.c
Modified:
    branches/gcc-5-branch/gcc/ChangeLog
    branches/gcc-5-branch/gcc/testsuite/ChangeLog
    branches/gcc-5-branch/gcc/tree-vect-loop-manip.c
Comment 12 Richard Biener 2016-09-27 12:45:28 UTC
Fixed.