Bug 65450 - [4.9 Regression]: Unaligned access with -O3 -mtune=k8
Summary: [4.9 Regression]: Unaligned access with -O3 -mtune=k8
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: tree-optimization (show other bugs)
Version: 5.0
: P2 normal
Target Milestone: 4.9.3
Assignee: Jakub Jelinek
URL:
Keywords: wrong-code
Depends on:
Blocks:
 
Reported: 2015-03-17 09:36 UTC by Uroš Bizjak
Modified: 2015-06-03 21:43 UTC (History)
1 user (show)

See Also:
Host:
Target:
Build:
Known to work: 4.8.0, 5.0
Known to fail: 4.9.2
Last reconfirmed: 2015-03-17 00:00:00


Attachments
Testcase from Polyhedron testsuite (3.04 KB, text/plain)
2015-03-17 09:37 UTC, Uroš Bizjak
Details
gcc5-pr65450.patch (515 bytes, patch)
2015-03-17 21:29 UTC, Jakub Jelinek
Details | Diff
gcc5-pr65450.patch (948 bytes, patch)
2015-03-18 09:00 UTC, Jakub Jelinek
Details | Diff
gcc5-pr65450.patch (1.39 KB, patch)
2015-03-18 10:49 UTC, Jakub Jelinek
Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Uroš Bizjak 2015-03-17 09:36:59 UTC
The compiler generates unaligned access for Polyhedron channel.f90 test when compiled with -O2 -mtune=k8:

/home/uros/gcc-build/gcc/gfortran -B/home/uros/gcc-build/gcc -B/home/uros/gcc-build/x86_64-unknown-linux-gnu/libgfortran/  -B/home/uros/gcc-build/x86_64-unknown-linux-gnu/libquadmath/.libs -L/home/uros/gcc-build/x86_64-unknown-linux-gnu/libgfortran/.libs -L-L/home/uros/gcc-build/x86_64-unknown-linux-gnu/libquadmath/.libs -O2 -ffast-math -mtune=k8 channel.f90

LD_LIBRARY_PATH=/home/uros/gcc-build/gcc:/home/uros/gcc-build/x86_64-unknown-linux-gnu/libgfortran/.libs:/home/uros/gcc-build/x86_64-unknown-linux-gnu/libquadmath/.libs ./a.out

...
Program received signal SIGSEGV: Segmentation fault - invalid memory reference.

Backtrace for this error:
#0  0x2AC9058B3C27
#1  0x2AC9058B2E20
#2  0x31CDC3002F
#3  0x402AFB in ddx at channel.f90:247
Segmentation fault

(gdb) r
...
Program received signal SIGSEGV, Segmentation fault.
0x0000000000402afb in ddx (array=..., __result=...) at channel.f90:247
247         ddx(2:I-1,1:J) = array(3:I,1:J)-array(1:I-2,1:J)    ! interior points

(gdb) bt
#0  0x0000000000402afb in ddx (array=..., __result=...) at channel.f90:247
#1  sw () at channel.f90:148
#2  0x0000000000404cfd in main (argc=<optimized out>, argv=<optimized out>) at channel.f90:234
#3  0x00000031cdc1d9f4 in __libc_start_main () from /lib64/libc.so.6
#4  0x0000000000400909 in _start ()

(gdb) disass
   ...
   0x0000000000402acc <+7644>:  movaps %xmm4,-0x20(%rdx)
   0x0000000000402ad0 <+7648>:  movlpd -0x10(%rax),%xmm0
   0x0000000000402ad5 <+7653>:  movhpd -0x8(%rax),%xmm0
   0x0000000000402ada <+7658>:  movapd %xmm0,%xmm4
   0x0000000000402ade <+7662>:  subpd  %xmm3,%xmm4
   0x0000000000402ae2 <+7666>:  movaps %xmm4,-0x10(%rdx)
   0x0000000000402ae6 <+7670>:  cmp    $0xf4,%rdi
   0x0000000000402aed <+7677>:  jne    0x402a71 <sw+7553>
   0x0000000000402aef <+7679>:  lea    0x20(%rcx),%rdi
   0x0000000000402af3 <+7683>:  add    $0x20,%rax
   0x0000000000402af7 <+7687>:  add    $0x20,%rdx
=> 0x0000000000402afb <+7691>:  movapd -0x20(%rax),%xmm3
   0x0000000000402b00 <+7696>:  mov    $0xf6,%r11d
   0x0000000000402b06 <+7702>:  xor    %ecx,%ecx
   0x0000000000402b08 <+7704>:  movapd %xmm3,%xmm4
   0x0000000000402b0c <+7708>:  subpd  %xmm0,%xmm4
   ...

(gdb) p/x $rax
$3 = 0x8c5a38

The failure can also be seen on SUSE gcc tester [1].

[1] http://gcc.opensuse.org/c++bench/polyhedron/polyhedron-performance-latest
Comment 1 Uroš Bizjak 2015-03-17 09:37:58 UTC
Created attachment 35042 [details]
Testcase from Polyhedron testsuite
Comment 2 Uroš Bizjak 2015-03-17 10:00:43 UTC
The problematic instruction (insn 1717) is generated from:

;; vect__1095.501_3524 = MEM[base: vectp.499_3571, offset: 0B];

(insn 1717 1716 0 (set (reg:V2DF 1511 [ vect__1095.501 ])
        (mem:V2DF (reg/f:DI 638 [ vectp.499 ]) [7 MEM[base: vectp.499_3571, offset: 0B]+0 S16 A256])) channel.f90:247 -1
     (nil))
Comment 3 Dominique d'Humieres 2015-03-17 10:14:48 UTC
Confirmed between r220156 (2015-01-27, OK) and r220302 (2015-01-31, segfault). I am not sure this is a fortran problem (no segfault if the code is compiled with '-O3 -fno-tree-vectorize -mtune=k8').
Comment 4 Uroš Bizjak 2015-03-17 10:22:02 UTC
(In reply to Uroš Bizjak from comment #0)
> The compiler generates unaligned access for Polyhedron channel.f90 test when
> compiled with -O2 -mtune=k8:

Whoops, this should read "-O3 -mtune=k8".
> 
> /home/uros/gcc-build/gcc/gfortran -B/home/uros/gcc-build/gcc
> -B/home/uros/gcc-build/x86_64-unknown-linux-gnu/libgfortran/ 
> -B/home/uros/gcc-build/x86_64-unknown-linux-gnu/libquadmath/.libs
> -L/home/uros/gcc-build/x86_64-unknown-linux-gnu/libgfortran/.libs
> -L-L/home/uros/gcc-build/x86_64-unknown-linux-gnu/libquadmath/.libs -O2
> -ffast-math -mtune=k8 channel.f90

And here. Correct flags are "-O3 -mtune=k8".  Everything reported is compiled with these two flags only.
Comment 5 Dominique d'Humieres 2015-03-17 10:25:37 UTC
Confirmed: see comment 3.
Comment 6 Jakub Jelinek 2015-03-17 10:39:20 UTC
Most likely r220244, but have to verify that.
Comment 7 Markus Trippelsdorf 2015-03-17 12:41:56 UTC
Also crashes with -mtune=amdfam10. But in this case it even
crashes when compiled with 4.9.2.
Comment 8 Markus Trippelsdorf 2015-03-17 12:44:37 UTC
Works fine with -fwrapv...
Comment 9 Dominique d'Humieres 2015-03-17 12:57:10 UTC
> Also crashes with -mtune=amdfam10. But in this case it even
> crashes when compiled with 4.9.2.

Revision r204000 (2013-10-24) is OK, r204945 (2013-11-18) is not.

> Works fine with -fwrapv...

Confirmed.
Comment 10 Jakub Jelinek 2015-03-17 13:24:49 UTC
(In reply to Dominique d'Humieres from comment #9)
> > Also crashes with -mtune=amdfam10. But in this case it even
> > crashes when compiled with 4.9.2.
> 
> Revision r204000 (2013-10-24) is OK, r204945 (2013-11-18) is not.
> 
> > Works fine with -fwrapv...
> 
> Confirmed.

r204257 in particular.
Comment 11 Uroš Bizjak 2015-03-17 14:15:38 UTC
(In reply to Dominique d'Humieres from comment #9)
> > Also crashes with -mtune=amdfam10. But in this case it even
> > crashes when compiled with 4.9.2.
> 
> Revision r204000 (2013-10-24) is OK, r204945 (2013-11-18) is not.

Probably the cause of failure for "channel" testcase at SUSE's amdfam10 tester [1], but nobody noticed.

[1] http://gcc.opensuse.org/c++bench-frescobaldi/polyhedron/polyhedron-summary.txt-2-0.html
Comment 12 Jakub Jelinek 2015-03-17 18:06:02 UTC
So, I believe it is incorrect ALIGN info as can be seen in the -fdump-tree-all-alias dumps.
Seems with current trunk on x86_64-linux and
-g -quiet -mtune=amdfam10 -O3 pr65450.f90
the problematic memory load which should have been movupd and is movapd instead is using _3571:
  # ALIGN = 32, MISALIGN = 0
  vectp.499_3571 = vectp.499_1481 + 64;
which has been created by aprefetch pass from:
  # ALIGN = 32, MISALIGN = 0
  vectp.499_135 = vectp.499_1480 + 16;
which has been created by pcom pass from:
  # ALIGN = 32, MISALIGN = 0
  vectp.499_1480 = vectp.499_1481 + 16;
which has been created during vectorization by vect_create_data_ref_ptr -> create_iv -> make_ssa_name for dr:
#(Data Ref: 
#  bb: 43 
#  stmt: _1095 = MEM[(real(kind=8)[0:D.3649] *)_558][_1094];
#  ref: MEM[(real(kind=8)[0:D.3649] *)_558][_1094];
#  base_object: MEM[(real(kind=8)[0:D.3649] *)_558];
#  Access function 0: {_1086 + 3, +, 1}_41
#)
- _1480 is indx_after_incr.
The base_object is indeed 32 byte aligned:
  # RANGE [-64424509440000, 60000] NONZERO 18446744073709551600
  _557 = _556 * 30000;
  # PT = { D.3692 } (nonlocal)
  # ALIGN = 32, MISALIGN = 0
  _558 = &u[_557];
- u is a common aligned to 32 bytes:
static real(kind=8) u[90000];
and 30000 is divisible by 16, and it is ARRAY_REF, so the offset from &u[0] is
a multiple of 128 bytes.  But that doesn't tell anything about what values _1094 can have.
I see vect_create_addr_base_for_vector_ref already always overrides the align/misalign info after duplicating DR_PTR_INFO, and so does bump_vector_ptr, but vect_create_data_ref_ptr trusts DR_PTR_INFO.  But from what I can understand, it is just the points to info, and alignment info in there is solely for the base address, but not necessarily for the whole DR.
Richard, do you agree?  Now the question is what we can do here, if in all the spots in vect_create_data_ref_ptr we should just set it to unknown alignment, or if we can do better (and how).
Comment 13 Jakub Jelinek 2015-03-17 21:29:52 UTC
Created attachment 35047 [details]
gcc5-pr65450.patch

Fix that passed bootstrap/regtest on x86_64-linux and i686-linux.  It is I think conservatively correct, on the other side we don't have any ccp pass after vectorization that could recompute that info, only vrp2 which doesn't do zero bits computation, so perhaps we should try harder.
Comment 14 Jakub Jelinek 2015-03-18 09:00:29 UTC
Created attachment 35053 [details]
gcc5-pr65450.patch

Updated (but this time untested except on the testcase) patch, that tries to preserve the alignment info in DR_PTR_INFO from DR_MISALIGNMENT.
Will see if it is possible to reduce the testcase somewhat.
Comment 15 Jakub Jelinek 2015-03-18 10:49:02 UTC
Created attachment 35055 [details]
gcc5-pr65450.patch

And now with the reduced testcase, going to bootstrap/regtest it now.
Comment 16 Richard Biener 2015-03-18 11:33:32 UTC
I think DR_PTR_INFO is indeed for the base object and was supposed to preserve
points-to-info only.  Patch looks ok.
Comment 17 Jakub Jelinek 2015-03-18 13:54:43 UTC
Author: jakub
Date: Wed Mar 18 13:54:12 2015
New Revision: 221490

URL: https://gcc.gnu.org/viewcvs?rev=221490&root=gcc&view=rev
Log:
	PR tree-optimization/65450
	* tree-vect-data-refs.c (vect_duplicate_ssa_name_ptr_info): New
	function.
	(vect_create_addr_base_for_vector_ref, vect_create_data_ref_ptr): Use
	it instead of duplicate_ssa_name_ptr_info.

	* gfortran.dg/pr65450.f90: New test.

Added:
    trunk/gcc/testsuite/gfortran.dg/pr65450.f90
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/testsuite/ChangeLog
    trunk/gcc/tree-vect-data-refs.c
Comment 18 Jakub Jelinek 2015-03-18 13:56:33 UTC
Fixed on the trunk so far.
Comment 19 Jakub Jelinek 2015-06-03 15:28:15 UTC
Author: jakub
Date: Wed Jun  3 15:27:43 2015
New Revision: 224087

URL: https://gcc.gnu.org/viewcvs?rev=224087&root=gcc&view=rev
Log:
	Backported from mainline
	2015-03-18  Jakub Jelinek  <jakub@redhat.com>

	PR tree-optimization/65450
	* tree-vect-data-refs.c (vect_duplicate_ssa_name_ptr_info): New
	function.
	(vect_create_addr_base_for_vector_ref, vect_create_data_ref_ptr): Use
	it instead of duplicate_ssa_name_ptr_info.

	* gfortran.dg/pr65450.f90: New test.

Added:
    branches/gcc-4_9-branch/gcc/testsuite/gfortran.dg/pr65450.f90
Modified:
    branches/gcc-4_9-branch/gcc/ChangeLog
    branches/gcc-4_9-branch/gcc/testsuite/ChangeLog
    branches/gcc-4_9-branch/gcc/tree-vect-data-refs.c
Comment 20 Jakub Jelinek 2015-06-03 21:43:17 UTC
Fixed for 4.9+.