Random Linux kernel functions get 16-byte stack alignment at function entry. This alignment happens before the "push %ebp; mov %esp, %ebp" sequence and breaks the kernel function graph tracer, which needs to manipulate the return address. When the alignment happens, 4(%ebp) still contains the return address, but that is only a copy of the real stack slot which the ret instruction uses. So the tracer modifies the copy and not the real return address stack entry.

There are two problems:

1) why is gcc doing 16-byte stack alignment at all?

2) why does the stack alignment happen _before_ the "push %ebp; mov %esp, %ebp" sequence?
Created attachment 19057 [details] source code (kernel/time/timer_stat.c)
Created attachment 19058 [details] intermediate file timer_stats.i
Created attachment 19059 [details] compiler command line
Is this really a bug since you have:

struct entry {
...
} __attribute__((__aligned__((1 << (4)))));

...

void timer_stats_update_stats(void *timer, pid_t pid, void *startf,
			      void *timerf, char *comm,
			      unsigned int timer_flag)
{
	spinlock_t *lock;
	struct entry *entry, input;

Since input is required to be 16-byte aligned by the __aligned__ attribute on the struct.
(In reply to comment #4)
> Is this really a bug since you have:
> struct entry {
> ...
> } __attribute__((__aligned__((1 << (4)))));
> 
> ...
> 
> void timer_stats_update_stats(void *timer, pid_t pid, void *startf,
> 			       void *timerf, char *comm,
> 			       unsigned int timer_flag)
> {
> 	spinlock_t *lock;
> 	struct entry *entry, input;
> 
> Since input is required to be 16byte aligned by the __aligned__ attribute on
> the struct.

Yes, Andrew pointed that out in the LKML thread as well.

This still does not explain why the mcount magic

	push %ebp
	mov  %esp, %ebp

happens _after_ the alignment, so that the stack layout assumption of mcount:

	return address
	saved ebp

is satisfied only via a copy of the return address, instead of just keeping the "push %ebp; mov %esp, %ebp" sequence right at the beginning of the function. GCC 4.4.x silently changed this and we now need to figure out how to _NOT_ trip over that.
I changed the summary to match the real problem.

Further info: while testing various kernel configs we found out that the problem comes and goes. Finally I started to compare the gcc command line options, and after some fiddling it turned out that the following minimal deltas change the code generator behaviour:

Bad:  -march=pentium-mmx                -Wa,-mtune=generic32
Good: -march=i686        -mtune=generic -Wa,-mtune=generic32
Good: -march=pentium-mmx -mtune=generic -Wa,-mtune=generic32

The good ones produce:

 650:	55                   	push   %ebp
 651:	89 e5                	mov    %esp,%ebp
 653:	83 e4 f0             	and    $0xfffffff0,%esp

The bad one:

000005f0 <timer_stats_update_stats>:
 5f0:	57                   	push   %edi
 5f1:	8d 7c 24 08          	lea    0x8(%esp),%edi
 5f5:	83 e4 f0             	and    $0xfffffff0,%esp
 5f8:	ff 77 fc             	pushl  -0x4(%edi)
 5fb:	55                   	push   %ebp
 5fc:	89 e5                	mov    %esp,%ebp

It's worse code for no reason and breaks the kernel assumption of ebp + 4 pointing to the real return address on the stack.
(In reply to comment #6)
> The good ones produce:
> 
>  650:	55                   	push   %ebp
>  651:	89 e5                	mov    %esp,%ebp
>  653:	83 e4 f0             	and    $0xfffffff0,%esp
> 
> The bad one:
> 
> 000005f0 <timer_stats_update_stats>:
>  5f0:	57                   	push   %edi
>  5f1:	8d 7c 24 08          	lea    0x8(%esp),%edi
>  5f5:	83 e4 f0             	and    $0xfffffff0,%esp
>  5f8:	ff 77 fc             	pushl  -0x4(%edi)
>  5fb:	55                   	push   %ebp
>  5fc:	89 e5                	mov    %esp,%ebp
> 
> It's worse code for no reason and breaks the kernel assumption of ebp + 4
> pointing to the real return address on the stack.

I think the difference comes from DRAP:

  /* Nonzero if function being compiled needs dynamic realigned
     argument pointer (drap) if stack needs realigning.  */
  bool need_drap;

It may be triggered by -mno-accumulate-outgoing-args, alloca, long jump, ...