Running the program below compiled with "-mpreferred-stack-boundary=2" gets a "segmentation fault" because the variable "tmp" is not properly aligned on a 16-byte boundary (required for movaps), violating the aligned(16) request in the attribute. void f() { unsigned long tmp[4] __attribute__((aligned(16))); asm("movaps %%xmm0, (%0)" : : "r" (tmp) : "memory"); } int main() { f(); }
Confirmed, note that the inline-asm can be improved: asm("movaps %%xmm0, %0" : : "m" (*tmp) );
*** Bug 24691 has been marked as a duplicate of this bug. ***
Actually this is just a missed diagnostic. The compiler cannot align the stack variables where the alignment is greater than stack alignment that the compiler can give for the stack.
Huh? Why can't it?
(In reply to comment #3) > Actually this is just a missed diagnostic. The compiler cannot align the stack > variables where the alignment is greater than stack alignment that the compiler > can give for the stack. The least GCC could and should do then is warn about it... If the code is not very complex, the alignment appears to work, though. But as soon as the code becomes complex, GCC screwes the alignment and even accesses variables that don't even exist (I'll go into detail later). Basically code like this is affected (this is *NOT* a test case, I'm going to post a test case as soon as I get it to work): typedef struct _somestruct { int a; }; static void checkstruct (volatile struct _somestruct *palignedvar) { if ((size_t)palignedvar & 0xF) printf("structure misaligned!\n"); } void somefunc(int a, int b, int c) { __attribute__((aligned (16))) volatile struct _somestruct alignedvar; while (1) { /* [other code] */ if (a) { if (c) { /* [other code] */ alignedvar.a = c; checkstruct(&alignedvar); } else { /* [other code] */ break; } } else { if (b) { /* [other code] */ alignedvar.a = a; checkstruct(&alignedvar); } else { if (c) { break; } else { /* [other code] */ alignedvar.a = a; checkstruct(&alignedvar); } } } /* [other code] */ } } I analyzed the generated assembly code. GCC reserves an area big enough to hold the structure plus padding, so it can align the structure dynamically at runtime. It stores a pointer to the reserved area and a pointer to the structure within the area. As long as the code is simple, GCC uses the pointer to the structure to access the data. However, if the code is complex enough, GCC mistakenly uses the pointer to the reserved area - which of course is sometimes not properly aligned. As a result, also the data of the structure members are read/write incorrectly. the stack is organized like this (the order may not match as showed in this abstracted example): struct { void *reserved_area; /* this is the pointer GCC sometimes accidently grabs */ void *aligned_structure; /* this is the pointer GCC should always grab */ char reserved[sizeof(structure) + sizeof(padding)]; }; I encountered this bug with -O3, I don't know if GCC also generates broken code without optimizations. I tried to create a simple test case that triggers the problem, but I failed. I'm going to do that in the next few days.
I am implementing something for this.
What is the performance impact of http://gcc.gnu.org/ml/gcc-patches/2007-05/msg01167.html Intel compiler has a very efficient way to align the stack: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=28074 It saves stack pointer in frame pointer. Can we implement it for suitable cases/backends and properly handle 1. tail call optimization. 2. stack unwind. 3. nested functions.
Subject: Re: attribute((aligned)) doesn't work for variables on the stack for greater than required alignement On 3 Oct 2007 22:04:28 -0000, hjl at lucon dot org <gcc-bugzilla@gcc.gnu.org> wrote: > > > ------- Comment #7 from hjl at lucon dot org 2007-10-03 22:04 ------- > What is the performance impact of > > http://gcc.gnu.org/ml/gcc-patches/2007-05/msg01167.html The performance impact is non if the variables don't need aligned. Otherwise you get a small penality at the very begining for the alignment of the variable itself. Really this is only to be used with big alignments like 128byte alignment (for using with a DMA system like in the Cell). -- Pinski
(In reply to comment #8) > Subject: Re: attribute((aligned)) doesn't work for variables on the stack for > greater than required alignement > > On 3 Oct 2007 22:04:28 -0000, hjl at lucon dot org > <gcc-bugzilla@gcc.gnu.org> wrote: > > > > > > ------- Comment #7 from hjl at lucon dot org 2007-10-03 22:04 ------- > > What is the performance impact of > > > > http://gcc.gnu.org/ml/gcc-patches/2007-05/msg01167.html > > The performance impact is non if the variables don't need aligned. > Otherwise you get a small penality at the very begining for the > alignment of the variable itself. Really this is only to be used with > big alignments like 128byte alignment (for using with a DMA system > like in the Cell). > What is the performance if the stack alignment adjustment is required in all functions with floating point variables on stack?
For backend with frame pointer and working -fomit-frame-pointer -g, can we 1. Make -fomit-frame-pointer per function, instead of per file. 2. Enable -fomit-frame-pointer for functions which need stack alignment. 3. Mark frame-pointer as reserved and use frame pointer to save stack pointer. and make sure that 1. tail call optimization. 2. stack unwind. 3. nested functions. 4. inline functions work properly?
(In reply to comment #7) > It saves stack pointer in frame pointer. Can we implement it for suitable > cases/backends and properly handle This only helps x86 really. If you look at my patch, it already implements (correctly) handling large cases like 128byte alignment (which people use with the Cell). What you are proposing will cause more stack to be used than actually required and more complex for the normal case. If you look at my patch, you will see it handles 1-4 issues nicely without any problems (because the stack itself is not realigned). Oh on PPC, the stack pointer has to be correct so you cannot use frame pointer to be the old stack pointer.
(In reply to comment #11) > This only helps x86 really. If you look at my patch, it already implements > (correctly) handling large cases like 128byte alignment (which people use with > the Cell). What you are proposing will cause more stack to be used than > actually required and more complex for the normal case. If you look at my > patch, you will see it handles 1-4 issues nicely without any problems (because > the stack itself is not realigned). Oh on PPC, the stack pointer has to be > correct so you cannot use frame pointer to be the old stack pointer. Does your patch handle register spill which needs a larger alignment? What is the impact of your approach on performance when stack alignment is needed for local variable as well as register spill?
I am getting tried of pinging this patch, I guess if nobody wants to comment that is up to them.
(In reply to comment #0) > Running the program below compiled with "-mpreferred-stack-boundary=2" > gets a "segmentation fault" because the variable "tmp" > is not properly aligned on a 16-byte boundary (required for > movaps), violating the aligned(16) request in the attribute. > > void f() > { > unsigned long tmp[4] __attribute__((aligned(16))); > asm("movaps %%xmm0, (%0)" : : "r" (tmp) : "memory"); > } > > int main() > { > f(); > } This should work with gcc 4.4 revision 138335.
Subject: Re: attribute((aligned)) doesn't work for variables on the stack for greater than required alignement > This should work with gcc 4.4 revision 138335. Only on x86 and not on any other target ... -- Pinski
*** Bug 39373 has been marked as a duplicate of this bug. ***
I get same align problem on 68k amigaos Target.the rport and fix is old. its a middle end bug and i see the fix is not in the source i download (4.3.3) i can test this patch if you like, or have you something more new ? Here is mail i get last in gcc ML http://gcc.gnu.org/ml/gcc/2009-04/msg00395.html
This still doesn't work on ARM either (tested with 4.4.0). The EABI only mandates the stack be 8 byte aligned, and gcc silently clips any alignment request above 8 bytes to 8 (so even if the stack were 16-byte aligned by accident the variables still wouldn't be.) Even a simple sp -= sp & (align-1) for every function with variables needing more alignment would be faster than unaligned NEON loads/stores.
Fixed fully with http://gcc.gnu.org/ml/gcc-patches/2010-10/msg00060.html .
Andrew, if this was "fixed fully" with that patch, why is it still open?
(In reply to comment #20) > Andrew, if this was "fixed fully" with that patch, why is it still open? Because I pointed to that patch before it was applied. It was applied 8 days after I pointed to it and nobody has got around to closing the bug as being fixed.
(In reply to comment #21) Makes sense. Thanks for having a look! :)
Which release is the first to ship and was it backported? I experience this bug in an old 4.4.6 armeabi gcc.
Alright, looking at the revision history it appears to me that the GCC 4.6 series ship the fix and it was not backported to 4.4 (although 4.4.7 was released months after this patch hit svn trunk) or 4.5.
*** Bug 260998 has been marked as a duplicate of this bug. *** Seen from the domain http://volichat.com Marked for reference. Resolved as fixed @bugzilla.
Fixed as mentioned.