Bug 61904 - Incorrect stack red-zoning on x86-64 code generation
Incorrect stack red-zoning on x86-64 code generation
Status: RESOLVED DUPLICATE of bug 61801
Product: gcc
Classification: Unclassified
Component: target
4.9.0
: P3 normal
: ---
Assigned To: Not yet assigned to anyone
:
Depends on:
Blocks:
  Show dependency treegraph
 
Reported: 2014-07-25 08:09 UTC by Linus Torvalds
Modified: 2014-08-06 21:06 UTC (History)
6 users (show)

See Also:
Host:
Target:
Build:
Known to work:
Known to fail:
Last reconfirmed: 2014-07-25 00:00:00


Attachments
Buggy result of compilation (183.30 KB, application/octet-stream)
2014-07-25 08:11 UTC, Linus Torvalds
Details
gzipped pre-processed fair.i file (207.83 KB, application/gzip)
2014-07-25 08:18 UTC, Linus Torvalds
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Linus Torvalds 2014-07-25 08:09:04 UTC
gcc-4.9.0 in Debian seems to miscompile the linux kernel for x86-64 in certain configurations, creating accesses to below the stack pointer even though the kernel uses -mno-red-zone.

The kernel cannot use the x86-64 stack red-zoning, because the hardware only switches stacks on privilege transfers, so interrupts that happen in kernel mode will not honor the normal 128-byte stack red-zone.

Attached is the pre-processed C code of the current kernel file

   kernel/sched/fair.c

which apparently on gcc-4.9.0 will miscompile the function "load_balance()", creating code like this:

load_balance:
.LFB2408:
        .loc 2 6487 0
        .cfi_startproc
.LVL1355:
        pushq   %rbp    #
        .cfi_def_cfa_offset 16
        .cfi_offset 6, -16
        movq    %rsp, %rbp      #,
        .cfi_def_cfa_register 6
        pushq   %r15    #
        pushq   %r14    #
        pushq   %r13    #
        pushq   %r12    #
        .cfi_offset 15, -24
        .cfi_offset 14, -32
        .cfi_offset 13, -40
        .cfi_offset 12, -48
        movq    %rdx, %r12      # sd, sd
        pushq   %rbx    #
.LBB2877:
        .loc 2 6493 0
        movq    $load_balance_mask, -136(%rbp)  #, %sfp
.LBE2877:
        .loc 2 6487 0
        subq    $184, %rsp      #,
        .cfi_offset 3, -56
        .loc 2 6489 0
     ....


Note the "subq    $184, %rsp" *after* the compiler has already spilled to the stack (the spill is insane, btw, since it's spilling a constant value!)

The second attachement is the reported mis-compiled result. I don't personally have the affected gcc version, but you can see the options passed into the compiler in the resulting "fair.s" file. The "-Os" in particular seems to be important, with the bug not happening with "-O2".
Comment 1 Linus Torvalds 2014-07-25 08:11:31 UTC
Created attachment 33183 [details]
Buggy result of compilation

This was sent by the reporter of the kernel problem, Michel Dänzer <michel@daenzer.net>, at my request.

It's gzipped to fit the file size limit.
Comment 2 Markus Trippelsdorf 2014-07-25 08:17:11 UTC
The testcase is missing. Please attach fair.i.
Comment 3 Linus Torvalds 2014-07-25 08:18:08 UTC
Created attachment 33184 [details]
gzipped pre-processed fair.i file

Apparently attaching a file during the initial bug entry creation failed, probably because it failed the size check. So here's the fair.i file.
Comment 4 Linus Torvalds 2014-07-25 08:23:57 UTC
As I already mentioned to Jakub Jelinek separately in the original email thread about the kernel problem:

 "Note that I don't personally have a reproducer (my machine has
  gcc-4.8.3, and I don't see the same behavior), but I included the
  incorrect fair.s file that Michel sent me (which has all the command
  line options in it), and a pre-processed "fair.i" source file that I
  generated and that *should* match the configuration that was the
  source for that result. So there might be some version/configuration
  skew there between the two files, but I think they match.

  Holler if you cannot reproduce the problem based on that."

so if the attached *.i and resulting buggy *.s files don't match up perfectly, that's the explanation.
Comment 5 Richard Biener 2014-07-25 09:26:55 UTC
With both the FSF 4.9.0 and 4.9.1 releases and

gcc -S t.i -mno-80387 -mno-fp-ret-in-387 -mpreferred-stack-boundary=3 -mtune=generic -mno-red-zone -mcmodel=kernel -maccumulate-outgoing-args -mno-sse -mno-mmx -mno-sse2 -mno-3dnow -mno-avx -march=x86-64 -g -Os -fno-strict-aliasing -fno-common -fno-asynchronous-unwind-tables -fno-omit-frame-pointer

(too many options only to guess from the .s file - can you please specify
explicitely passed options?)


load_balance:
.LFB2409:
        .loc 2 6487 0
.LVL1260:
        pushq   %rbp
.LCFI442:
        movq    %rsp, %rbp
.LCFI443:
        pushq   %r15
        pushq   %r14
        pushq   %r13
        pushq   %r12
.LCFI444:
        movq    %rdx, %r12
        pushq   %rbx
.LBB2812:
        .loc 2 6493 0
        movq    $load_balance_mask, -376(%rbp)
.LBE2812:
        .loc 2 6487 0
        subq    $416, %rsp
.LCFI445:
        .loc 2 6489 0
        movq    (%rdx), %rax
        .loc 2 6487 0
        movl    %edi, -392(%rbp)
        movl    %ecx, -388(%rbp)
        movq    %r8, -456(%rbp)
        .loc 2 6489 0
        movq    %rax, -424(%rbp)
.LVL1261:
.LBB2813:
        .loc 2 6493 0
        movq    -376(%rbp), %rax
.LVL1262:
#APP
# 6493 "kernel/sched/fair.c" 1
        add %gs:this_cpu_off, %rax
# 0 "" 2
#NO_APP


which I guess is equally bad (even if not matching the output from Michel).
It seems to be triggered by -g or -fvar-tracking and is fixed on the 4.9
branch - so it might be a duplicate of PR61801.
Comment 6 Richard Biener 2014-07-25 09:33:50 UTC
Reduced set of options to reproduce with 4.9.1 are

-mno-red-zone -mcmodel=kernel -maccumulate-outgoing-args -mno-sse -g  -Os -fno-omit-frame-pointer

if you use -fno-var-tracking the bug goes away, similar if you update to
the 4.9 branch head.

If you use -fno-schedule-insns2 you see a similar pattern to the mentioned PR:

        subq    $424, %rsp
.LCFI427:
        movl    %edi, -400(%rbp)
        movq    %rdx, %rbx
        movl    %ecx, -396(%rbp)
        movq    %r8, -464(%rbp)
.LVL1306:
        .loc 2 6489 0
        movq    (%rdx), %rax
        movq    %rax, -440(%rbp)
.LVL1307:
.LBB2616:
        .loc 2 6493 0
        movq    $load_balance_mask, -384(%rbp)
        movq    -384(%rbp), %rax
.LVL1308:
#APP
# 6493 "kernel/sched/fair.c" 1
        add %gs:this_cpu_off, %rax

so that spill is right before the asm and we immediately re-load it into
a register required by an asm constraint (stupid register allocator).

*** This bug has been marked as a duplicate of bug 61801 ***
Comment 7 Linus Torvalds 2014-07-25 18:12:36 UTC
Just for completeness for the original bug report, here's the actual compiler command line that the kernel uses to generate the *.s file.

  gcc -Wp,-MD,kernel/sched/.fair.s.d  -nostdinc -isystem /usr/lib/gcc/x86_64-redhat-linux/4.8.3/include -I./arch/x86/include -Iarch/x86/include/generated  -Iinclude -I./arch/x86/include/uapi -Iarch/x86/include/generated/uapi -I./include/uapi -Iinclude/generated/uapi -include ./include/linux/kconfig.h -D__KERNEL__ -Wall -Wundef -Wstrict-prototypes -Wno-trigraphs -fno-strict-aliasing -fno-common -Werror-implicit-function-declaration -Wno-format-security -m64 -mno-mmx -mno-sse -mno-80387 -mno-fp-ret-in-387 -mpreferred-stack-boundary=3 -mtune=generic -mno-red-zone -mcmodel=kernel -funit-at-a-time -maccumulate-outgoing-args -DCONFIG_AS_CFI=1 -DCONFIG_AS_CFI_SIGNAL_FRAME=1 -DCONFIG_AS_CFI_SECTIONS=1 -DCONFIG_AS_FXSAVEQ=1 -DCONFIG_AS_CRC32=1 -DCONFIG_AS_AVX=1 -DCONFIG_AS_AVX2=1 -pipe -Wno-sign-compare -fno-asynchronous-unwind-tables -mno-sse -mno-mmx -mno-sse2 -mno-3dnow -mno-avx -fno-delete-null-pointer-checks -Os -Wno-maybe-uninitialized -Wframe-larger-than=2048 -fno-stack-protector -Wno-unused-but-set-variable -fno-omit-frame-pointer -fno-optimize-sibling-calls -g -Wdeclaration-after-statement -Wno-pointer-sign -fno-strict-overflow -fconserve-stack -Werror=implicit-int -Werror=strict-prototypes -DCC_HAVE_ASM_GOTO    -D"KBUILD_STR(s)=#s" -D"KBUILD_BASENAME=KBUILD_STR(fair)"  -D"KBUILD_MODNAME=KBUILD_STR(fair)"  -fverbose-asm -S -o kernel/sched/fair.s kernel/sched/fair.c

although looking at the generated .s file in the attachment, I really think it's all there too in the comments at the top of the file.
Comment 8 Linus Torvalds 2014-07-25 18:19:42 UTC
Oh, and this is marked as a duplicate of 61801, but that one is marked to be in gcc-4.8.3

The particular problem we see in kernel code generation seems to *not* happen in 4.8.3 (current fedora 20), and happens with Debian 4.9.0.

So now I worry about

 (a) whether the duplicate bug is really true

 (b) or perhaps 4.8.3 really does have the same problem and we might get bitten there too, and it just happens to trigger on 4.9.0 for some otherwise unrelated reason.

I'd like to have some way to tell known-bad compilers, so that we know to avoid them..
Comment 9 Andrew Pinski 2014-07-25 18:42:07 UTC
(In reply to Linus Torvalds from comment #8)
> Oh, and this is marked as a duplicate of 61801, but that one is marked to be
> in gcc-4.8.3
> 
> The particular problem we see in kernel code generation seems to *not*
> happen in 4.8.3 (current fedora 20), and happens with Debian 4.9.0.

Different optimization happen before the scheduler between 4.8.3 and 4.9.0 which causes the difference.

> 
> So now I worry about
> 
>  (a) whether the duplicate bug is really true

It is as the true issue was fixed there.

> 
>  (b) or perhaps 4.8.3 really does have the same problem and we might get
> bitten there too, and it just happens to trigger on 4.9.0 for some otherwise
> unrelated reason.

Correct.

> 
> I'd like to have some way to tell known-bad compilers, so that we know to
> avoid them..

The bad compiler versions are 4.5.0 (when debug_insn came in) to 4.8.3 and 4.9.0 and 4.9.1.
Comment 10 Linus Torvalds 2014-07-25 19:01:36 UTC
(In reply to Andrew Pinski from comment #9)
> 
> The bad compiler versions are 4.5.0 (when debug_insn came in) to 4.8.3 and
> 4.9.0 and 4.9.1.

Ok, so we have no reasonable way of avoiding the problem compiler version. I had hoped that we could just do a warning if people use 4.9.0/1, since they aren't very common yet.

Ugh. We had one suggestion of having some post-compile checking pass, but that one was (so far) just handwaving ("objdump + perl-script"). It doesn't sound very pleasant.

The problem is that these things are a bitch to debug - they turn into these completely impossible kernel oopses or corruption, and we were just very lucky that this one case happened to be repeatable and pinpoint for two people. Are there others? We have no way of knowing..

Anyway, thanks for the quick resolution, even if I'm now rather nervous about existing compilers..
Comment 11 Markus Trippelsdorf 2014-07-26 10:02:14 UTC
Here's a small (auto-)reduced testcase for this specific issue:

markus@x4 ~ % cat fair.i
int a, b, d;
enum cpu_idle_type
{
  CPU_MAX_IDLE_TYPES
};
struct sched_domain
{
  int balance_interval;
  int lb_count[0];
  int lb_nobusyg[];
} c,*g ;
__typeof__(int[0]) f;
int *h;
static struct sched_domain *fn1 (int *p1, int *p2)
{
  struct sched_domain *e;
  int i;
  for (; i = fn2 (i);)
    {
      e = &*(
               {

                 __asm__("" : "=r"(d) : "0"(0));
                 (typeof(&c))0 + a;

               });
      fn3 ();
      fn4 (i);
    }
  return e;
}

fn5 (int p1, struct sched_domain *p2, enum cpu_idle_type p3)
{
  int *j = *(
               {
                 long k;
                 asm volatile("" : "=r"(k) : ""(b), "0"(&f));
                 (typeof(*&f) *)k;
               });
  if (p3)
    h = 0;
  int l = *j;
  p2->lb_count[p3]++;
redo:
  p2->lb_nobusyg[p3]++;
  g = fn1 (0, 0);
  if (fn6 (j))
    goto redo;
  p2->balance_interval *= 2;
}

markus@x4 ~ % gcc -c -mcmodel=kernel -Os -fno-omit-frame-pointer -fcompare-debug fair.i
gcc: error: fair.i: -fcompare-debug failure (length)
markus@x4 ~ % gcc -c -mcmodel=kernel -O2 -fno-omit-frame-pointer -fcompare-debug fair.i
markus@x4 ~ %