Bug 16660 - attribute((aligned)) doesn't work for variables on the stack for greater than required alignement
Summary: attribute((aligned)) doesn't work for variables on the stack for greater than...
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: middle-end (show other bugs)
Version: 3.4.1
: P2 normal
Target Milestone: 4.6.0
Assignee: Not yet assigned to anyone
URL: http://gcc.gnu.org/ml/gcc-patches/201...
Keywords: diagnostic
: 24691 39373 (view as bug list)
Depends on: 33721
Blocks:
  Show dependency treegraph
 
Reported: 2004-07-21 20:43 UTC by Jens Maurer
Modified: 2021-12-23 05:57 UTC (History)
14 users (show)

See Also:
Host:
Target:
Build:
Known to work:
Known to fail:
Last reconfirmed: 2007-07-09 08:24:13


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Jens Maurer 2004-07-21 20:43:49 UTC
Running the program below compiled with "-mpreferred-stack-boundary=2"
gets a "segmentation fault" because the variable "tmp"
is not properly aligned on a 16-byte boundary (required for
movaps), violating the aligned(16) request in the attribute.

void f()
{
  unsigned long tmp[4] __attribute__((aligned(16)));
  asm("movaps %%xmm0, (%0)" : : "r" (tmp) : "memory");
}

int main()
{
  f();
}
Comment 1 Andrew Pinski 2004-08-15 04:27:29 UTC
Confirmed, note that the inline-asm can be improved:
  asm("movaps %%xmm0, %0" : : "m" (*tmp) );
Comment 2 Andrew Pinski 2005-11-06 06:11:08 UTC
*** Bug 24691 has been marked as a duplicate of this bug. ***
Comment 3 Andrew Pinski 2006-03-18 17:43:49 UTC
Actually this is just a missed diagnostic.  The compiler cannot align the stack variables where the alignment is greater than stack alignment that the compiler can give for the stack.
Comment 4 Chris Lattner 2006-03-18 23:43:14 UTC
Huh?  Why can't it?
Comment 5 Thomas Weidenmueller 2006-09-06 07:14:42 UTC
(In reply to comment #3)
> Actually this is just a missed diagnostic.  The compiler cannot align the stack
> variables where the alignment is greater than stack alignment that the compiler
> can give for the stack.

The least GCC could and should do then is warn about it...

If the code is not very complex, the alignment appears to work, though. But as soon as the code becomes complex, GCC screwes the alignment and even
accesses variables that don't even exist (I'll go into detail later).

Basically code like this is affected (this is *NOT* a test case, I'm going to
post a test case as soon as I get it to work):

typedef struct _somestruct {
    int a;
};

static void checkstruct (volatile struct _somestruct *palignedvar)
{
    if ((size_t)palignedvar & 0xF)
        printf("structure misaligned!\n");
}

void somefunc(int a, int b, int c) {
    __attribute__((aligned (16))) volatile struct _somestruct alignedvar;

    while (1)
    {
        /* [other code] */
        if (a) {
            if (c) {
                /* [other code] */
                alignedvar.a = c;
                checkstruct(&alignedvar);
            } else {
                /* [other code] */
                break;
            }
        } else {
            if (b) {
                /* [other code] */
                alignedvar.a = a;
                checkstruct(&alignedvar);
            } else {
                if (c) {
                    break;
                } else {
                    /* [other code] */
                    alignedvar.a = a;
                    checkstruct(&alignedvar);
                }
            }
        }
        /* [other code] */
    }
}

I analyzed the generated assembly code. GCC reserves an area big enough to hold
the structure plus padding, so it can align the structure dynamically at
runtime. It stores a pointer to the reserved area and a pointer to the
structure within the area. As long as the code is simple, GCC uses the pointer
to the structure to access the data. However, if the code is complex enough,
GCC mistakenly uses the pointer to the reserved area - which of course is
sometimes not properly aligned. As a result, also the data of the structure
members are read/write incorrectly.

the stack is organized like this (the order may not match as showed in this
abstracted example):

struct {
    void *reserved_area;     /* this is the pointer GCC sometimes accidently
grabs */
    void *aligned_structure; /* this is the pointer GCC should always grab */

    char reserved[sizeof(structure) + sizeof(padding)];
};

I encountered this bug with -O3, I don't know if GCC also generates broken code
without optimizations. I tried to create a simple test case that triggers the
problem, but I failed. I'm going to do that in the next few days.
Comment 6 Andrew Pinski 2007-01-13 01:58:22 UTC
I am implementing something for this.
Comment 7 H.J. Lu 2007-10-03 22:04:21 UTC
What is the performance impact of

http://gcc.gnu.org/ml/gcc-patches/2007-05/msg01167.html

Intel compiler has a very efficient way to align the stack:

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=28074

It saves stack pointer in frame pointer. Can we implement it for suitable
cases/backends and properly handle

1. tail call optimization.
2. stack unwind.
3. nested functions.


Comment 8 Andrew Pinski 2007-10-03 22:07:51 UTC
Subject: Re:  attribute((aligned)) doesn't work for variables on the stack for greater than required alignement

On 3 Oct 2007 22:04:28 -0000, hjl at lucon dot org
<gcc-bugzilla@gcc.gnu.org> wrote:
>
>
> ------- Comment #7 from hjl at lucon dot org  2007-10-03 22:04 -------
> What is the performance impact of
>
> http://gcc.gnu.org/ml/gcc-patches/2007-05/msg01167.html

The performance impact is non if the variables don't need aligned.
Otherwise you get a small penality at the very begining for the
alignment of the variable itself.  Really this is only to be used with
big alignments like 128byte alignment (for using with a DMA system
like in the Cell).

-- Pinski
Comment 9 H.J. Lu 2007-10-03 22:17:22 UTC
(In reply to comment #8)
> Subject: Re:  attribute((aligned)) doesn't work for variables on the stack for
> greater than required alignement
> 
> On 3 Oct 2007 22:04:28 -0000, hjl at lucon dot org
> <gcc-bugzilla@gcc.gnu.org> wrote:
> >
> >
> > ------- Comment #7 from hjl at lucon dot org  2007-10-03 22:04 -------
> > What is the performance impact of
> >
> > http://gcc.gnu.org/ml/gcc-patches/2007-05/msg01167.html
> 
> The performance impact is non if the variables don't need aligned.
> Otherwise you get a small penality at the very begining for the
> alignment of the variable itself.  Really this is only to be used with
> big alignments like 128byte alignment (for using with a DMA system
> like in the Cell).
> 

What is the performance if the stack alignment adjustment is required in
all functions with floating point variables on stack?

Comment 10 H.J. Lu 2007-10-03 22:27:39 UTC
For backend with frame pointer and working -fomit-frame-pointer -g, can
we

1. Make -fomit-frame-pointer per function, instead of per file.
2. Enable -fomit-frame-pointer for functions which need stack alignment.
3. Mark frame-pointer as reserved and use frame pointer to save stack
pointer.

and make sure that

1. tail call optimization.
2. stack unwind.
3. nested functions.
4. inline functions

work properly?
Comment 11 Andrew Pinski 2007-10-03 22:40:54 UTC
(In reply to comment #7)
> It saves stack pointer in frame pointer. Can we implement it for suitable
> cases/backends and properly handle

This only helps x86 really.  If you look at my patch, it already implements (correctly) handling large cases like 128byte alignment (which people use with the Cell).  What you are proposing will cause more stack to be used than actually required and more complex for the normal case.  If you look at my patch, you will see it handles 1-4 issues nicely without any problems (because the stack itself is not realigned).  Oh on PPC, the stack pointer has to be correct so you cannot use frame pointer to be the old stack pointer.
Comment 12 H.J. Lu 2007-10-04 00:01:56 UTC
(In reply to comment #11)

> This only helps x86 really.  If you look at my patch, it already implements
> (correctly) handling large cases like 128byte alignment (which people use with
> the Cell).  What you are proposing will cause more stack to be used than
> actually required and more complex for the normal case.  If you look at my
> patch, you will see it handles 1-4 issues nicely without any problems (because
> the stack itself is not realigned).  Oh on PPC, the stack pointer has to be
> correct so you cannot use frame pointer to be the old stack pointer.

Does your patch handle register spill which needs a larger alignment? What
is the impact of your approach on performance when stack alignment is needed
for local variable as well as register spill?

Comment 13 Andrew Pinski 2008-01-03 17:47:41 UTC
I am getting tried of pinging this patch, I guess if nobody wants to comment that is up to them.
Comment 14 H.J. Lu 2008-07-31 01:04:30 UTC
(In reply to comment #0)
> Running the program below compiled with "-mpreferred-stack-boundary=2"
> gets a "segmentation fault" because the variable "tmp"
> is not properly aligned on a 16-byte boundary (required for
> movaps), violating the aligned(16) request in the attribute.
> 
> void f()
> {
>   unsigned long tmp[4] __attribute__((aligned(16)));
>   asm("movaps %%xmm0, (%0)" : : "r" (tmp) : "memory");
> }
> 
> int main()
> {
>   f();
> }

This should work with gcc 4.4 revision 138335.
Comment 15 Andrew Pinski 2008-07-31 01:05:47 UTC
Subject: Re:  attribute((aligned)) doesn't work for variables on the stack for greater than required alignement

> This should work with gcc 4.4 revision 138335.

Only on x86 and not on any other target ...

-- Pinski
Comment 16 H.J. Lu 2009-03-05 14:02:59 UTC
*** Bug 39373 has been marked as a duplicate of this bug. ***
Comment 17 bernd_afa 2009-04-16 09:22:06 UTC
I get same align problem on 68k amigaos Target.the rport and fix is old.

its a middle end bug and i see the fix is not in the source i download (4.3.3)
i can test this patch if you like, or have you something more new ?  

Here is mail i get last in gcc ML

http://gcc.gnu.org/ml/gcc/2009-04/msg00395.html
Comment 18 David Conrad 2009-08-20 08:49:08 UTC
This still doesn't work on ARM either (tested with 4.4.0). The EABI only mandates the stack be 8 byte aligned, and gcc silently clips any alignment request above 8 bytes to 8 (so even if the stack were 16-byte aligned by accident the variables still wouldn't be.)

Even a simple sp -= sp & (align-1) for every function with variables needing more alignment would be faster than unaligned NEON loads/stores.
Comment 19 Andrew Pinski 2010-10-01 23:03:15 UTC
Fixed fully with http://gcc.gnu.org/ml/gcc-patches/2010-10/msg00060.html .
Comment 20 Brooks Moses 2013-02-15 00:51:30 UTC
Andrew, if this was "fixed fully" with that patch, why is it still open?
Comment 21 Andrew Pinski 2013-02-15 01:06:40 UTC
(In reply to comment #20)
> Andrew, if this was "fixed fully" with that patch, why is it still open?

Because I pointed to that patch before it was applied.  It was applied 8 days after I pointed to it and nobody has got around to closing the bug as being fixed.
Comment 22 Brooks Moses 2013-02-15 01:11:29 UTC
(In reply to comment #21)

Makes sense.  Thanks for having a look!  :)
Comment 23 Thomas Martitz 2014-02-05 08:40:48 UTC
Which release is the first to ship and was it backported? I experience this bug in an old 4.4.6 armeabi gcc.
Comment 24 Thomas Martitz 2014-02-05 08:54:09 UTC
Alright, looking at the revision history it appears to me that the GCC 4.6 series ship the fix and it was not backported to 4.4 (although 4.4.7 was released months after this patch hit svn trunk) or 4.5.
Comment 25 Jackie Rosen 2014-02-16 10:02:50 UTC Comment hidden (spam)
Comment 26 Andrew Pinski 2021-12-23 05:57:39 UTC
Fixed as mentioned.