Bug 52574 - [4.6 Regression] gcc tree optimizer generates incorrect vector load instructions for x86_64, app crashes
Status: RESOLVED INVALID
Alias: None
Product: gcc
Classification: Unclassified
Component: tree-optimization
Version: 4.6.3
Importance: P3 normal
Target Milestone: ---
Assignee: Not yet assigned to anyone
URL:
Keywords: wrong-code
Depends on:
Blocks:
 
Reported: 2012-03-13 01:19 UTC by Matthias Klose
Modified: 2012-03-13 19:18 UTC

See Also:
Host:
Target: x86_64-linux-gnu
Build:
Known to work: 4.5.3, 4.7.0
Known to fail: 4.6.3
Last reconfirmed:


Description Matthias Klose 2012-03-13 01:19:27 UTC
[forwarded from http://bugs.debian.org/663654]

The following versions of gcc:
 Debian gcc-4.6.3-1,
 Debian gcc-4.4.6-14,
 Debian gcc-4.6.2-14,
 Debian gcc-4.4.6-15,
 Ubuntu 4.4.3-4ubuntu5
generate *wrong* code: aligned vector loads instead of unaligned vector loads
on the x86_64 architecture. This causes the compiled code to crash with
SIGSEGV (General Protection Fault).

The bug is *not* present on trunk or in gcc-4.5.3-12.

Consider the following program:

        void foo(int* __restrict ia, int n){
          int i;
          for(i=0;i<n;i++){
            ia[i]=ia[i]*ia[i];
          }
        }

        int main(){
          int a[9];
          int sum=0,i;
          for(i=0;i<9;i++){
            a[i]=(i*i)%128;
          }

          foo((int*)((char*)a+2), 8);

          for(i=0;i<9;i++){
            sum+=a[i];
          }
          return sum;
        }

On x86 and x86_64, unaligned word accesses are valid:
  - *((int*)<unaligned memory address>)
But x86_64 SSE has two kinds of vector move instructions:
  - aligned vector move (movdqa)
  - unaligned vector move (movdqu)
Using an aligned vector move with an unaligned address
causes the application to crash.
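
The distinction between the two move instructions can be sketched at the source level with SSE2 intrinsics. This is only an illustration and not part of the original report: the helper names is_aligned16 and double4 are hypothetical, and the sketch assumes an x86 target with SSE2 and <emmintrin.h> available. _mm_loadu_si128 and _mm_storeu_si128 are the unaligned forms (typically emitted as movdqu); _mm_load_si128 and _mm_store_si128 are the aligned forms (typically movdqa) and require a 16-byte-aligned address.

```c
#include <emmintrin.h> /* SSE2 intrinsics; assumes an x86 target */
#include <stdint.h>

/* Hypothetical helper: returns 1 if p is 16-byte aligned, i.e. an
   aligned vector move (movdqa) would be safe for this address. */
int is_aligned16(const void *p) {
    return ((uintptr_t)p & 15u) == 0;
}

/* Hypothetical helper: doubles the four 32-bit lanes stored at p.
   The unaligned load/store intrinsics accept any address, so this
   does not fault when p is misaligned. */
void double4(void *p) {
    __m128i v = _mm_loadu_si128((const __m128i *)p); /* unaligned load */
    v = _mm_add_epi32(v, v);                         /* lane-wise add, wraps on overflow */
    _mm_storeu_si128((__m128i *)p, v);               /* unaligned store */
}
```

Replacing the two intrinsics with _mm_load_si128 and _mm_store_si128 would make double4 valid only for addresses where is_aligned16 returns 1.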


When compiled with any of the following command lines:
  gcc -O3 foo.c
  g++ -O3 foo.c
  gcc -m64 -O2 -ftree-vectorize gcc_bug.c
  g++ -m64 -O2 -ftree-vectorize gcc_bug.c
gcc generates an aligned vector load
  movdqa  -54(%rsp,%rax), %xmm0
instead of unaligned vector load - movdqu.

This causes the above application to crash with
SIGSEGV (General Protection Fault).

gcc-4.7 correctly generates
    movdqu  -54(%rsp), %xmm0
Comment 1 Jakub Jelinek 2012-03-13 07:12:55 UTC
The testcase is invalid C. While x86_64/i?86 will do the expected thing and perform unaligned loads/stores silently, it won't do that in vectorized code or for atomic accesses.  You need to tell the compiler that ia isn't aligned, through the aligned attribute, e.g. typedef int T __attribute__((aligned (2)));
and use T *__restrict ia instead of int *__restrict ia.
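
A minimal sketch of that suggestion, assuming GCC (or a compiler that accepts the same aligned attribute on a typedef). The typedef T and the function foo follow the names used in this report; square_at_offset2 is a hypothetical driver added only to exercise foo on a 2-byte-aligned address, and it assumes the values are small enough that squaring does not overflow int.

```c
#include <stdlib.h>
#include <string.h>

/* int whose required alignment is lowered to 2 bytes, so the compiler
   may no longer assume 4-byte (or 16-byte) alignment for T objects. */
typedef int T __attribute__((aligned(2)));

void foo(T *__restrict ia, int n) {
    int i;
    for (i = 0; i < n; i++) {
        ia[i] = ia[i] * ia[i];
    }
}

/* Hypothetical driver: copies n ints to an address that is only
   2-byte aligned, squares them in place via foo, and copies the
   results to out. Returns 0 on success, -1 if allocation fails. */
int square_at_offset2(const int *in, int *out, int n) {
    size_t bytes = (size_t)n * sizeof(int);
    char *buf = malloc(bytes + 2);
    if (buf == NULL) {
        return -1;
    }
    memcpy(buf + 2, in, bytes);  /* buf + 2 is misaligned for plain int */
    foo((T *)(buf + 2), n);      /* valid: T only requires 2-byte alignment */
    memcpy(out, buf + 2, bytes);
    free(buf);
    return 0;
}
```

With this declaration the vectorizer has to assume ia may be misaligned for vector accesses, so it cannot rely on aligned vector moves for it.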
Comment 2 Deepak Ravi 2012-03-13 19:18:45 UTC
(In reply to comment #1)
> The testcase is invalid C, while x86_64/i?86 will do the expected thing of
> doing unaligned loads/stores silently, it won't do that in vectorized code or
> for atomic accesses. 

Shouldn't the compiler vectorize the code _conservatively_, either by generating code that checks whether the address is aligned or by emitting unaligned vector load instructions? Otherwise, code written for x86_64 that relies on unaligned accesses will break at -O3 with newer gcc.

Also note that this bug is triggered only when __restrict is used. If you remove __restrict, gcc generates proper code. It also works properly with gcc 4.7 (even with __restrict).
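
The kind of runtime alignment check asked about here can be sketched at the source level. This illustrates the idea only and is not what gcc emits: the function name square_any_alignment is hypothetical, the sketch assumes C11 (_Alignof), and it assumes values small enough that squaring does not overflow int.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: squares n ints stored at p, where p may or may
   not be suitably aligned for int. */
void square_any_alignment(void *p, int n) {
    int i;
    if (((uintptr_t)p % _Alignof(int)) == 0) {
        /* Aligned case: plain int accesses are valid here, and the
           compiler is free to vectorize this loop. */
        int *ia = p;
        for (i = 0; i < n; i++) {
            ia[i] = ia[i] * ia[i];
        }
    } else {
        /* Misaligned case: go through memcpy so that no int lvalue
           is ever formed at a misaligned address. */
        unsigned char *bytes = p;
        for (i = 0; i < n; i++) {
            int v;
            memcpy(&v, bytes + (size_t)i * sizeof v, sizeof v);
            v = v * v;
            memcpy(bytes + (size_t)i * sizeof v, &v, sizeof v);
        }
    }
}
```

The memcpy branch stays within what the C standard allows for misaligned data, at the cost of leaving the choice of load and store instructions entirely to the compiler.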