52574 – [4.6 Regression] gcc tree optimizer generates incorrect vector load instructions for x86_64, app crashes

Status:	RESOLVED INVALID

Alias:	None

Product:	gcc
Classification:	Unclassified
Component:	tree-optimization
Version:	4.6.3

Importance:	P3 normal
Target Milestone:	---
Assignee:	Not yet assigned to anyone

URL:
Keywords:	wrong-code

Depends on:
Blocks:

Reported:	2012-03-13 01:19 UTC by Matthias Klose
Modified:	2012-03-13 19:18 UTC
CC List:	1 user

See Also:
Host:
Target:	x86_64-linux-gnu
Build:
Known to work:	4.5.3, 4.7.0
Known to fail:	4.6.3
Last reconfirmed:

Description Matthias Klose 2012-03-13 01:19:27 UTC

[forwarded from http://bugs.debian.org/663654]

The following versions of gcc:
 Debian gcc-4.6.3-1,
 Debian gcc-4.4.6-14,
 Debian gcc-4.6.2-14,
 Debian gcc-4.4.6-15,
 Ubuntu 4.4.3-4ubuntu5
generate *wrong* code for the x86_64 architecture: aligned vector loads instead
of unaligned vector loads. This causes the compiled code to crash with
SIGSEGV (General Protection Fault).

The bug is *not* present on trunk or in gcc-4.5.3-12.

Consider the following program:

        void foo(int* __restrict ia, int n){
          int i;
          for(i=0;i<n;i++){
            ia[i]=ia[i]*ia[i];
          }
        }

        int main(){
          int a[9];
          int sum=0,i;
          for(i=0;i<9;i++){
            a[i]=(i*i)%128;
          }

          /* deliberately misaligned: ia points 2 bytes into a,
             so it is not even 4-byte aligned */
          foo((int*)((char*)a+2), 8);

          for(i=0;i<9;i++){
            sum+=a[i];
          }
          return sum;
        }

On x86 and x86_64, unaligned word accesses are valid:
  - *((int*)<unaligned memory address>)
But SSE, which x86_64 uses for vectorized code, has two kinds of vector move
instructions:
  - aligned vector move (movdqa)
  - unaligned vector move (movdqu)
Using an aligned vector move on an address that is not 16-byte aligned raises
a general protection fault, which crashes the application.


When compiled with any of the following command lines:
  gcc -O3 foo.c
  g++ -O3 foo.c
  gcc -m64 -O2 -ftree-vectorize gcc_bug.c
  g++ -m64 -O2 -ftree-vectorize gcc_bug.c
gcc generates an aligned vector load
  movdqa  -54(%rsp,%rax), %xmm0
instead of an unaligned vector load (movdqu).

This causes the application above to crash with
SIGSEGV (General Protection Fault).

gcc-4.7 correctly generates
    movdqu  -54(%rsp), %xmm0

Comment 1 Jakub Jelinek 2012-03-13 07:12:55 UTC

The testcase is invalid C. While x86_64/i?86 silently performs unaligned scalar loads/stores, it won't do that in vectorized code or for atomic accesses. You need to tell the compiler that ia isn't aligned, using the aligned attribute, e.g. typedef int T __attribute__((aligned (2)));
and then use T *__restrict ia instead of int *__restrict ia.

Comment 2 Deepak Ravi 2012-03-13 19:18:45 UTC

(In reply to comment #1)
> The testcase is invalid C, while x86_64/i?86 will do the expected thing of
> doing unaligned loads/stores silently, it won't do that in vectorized code or
> for atomic accesses. 

Shouldn't the compiler vectorize the code _conservatively_, either by generating code that checks whether the address is aligned or by generating unaligned vector load instructions? Otherwise, code written for x86_64 that relies on unaligned accesses will break at -O3 with newer gcc.

Also note that this bug is triggered only when __restrict is used. If you remove __restrict, gcc generates correct code. It also works correctly with gcc 4.7 (even with __restrict).