Bug 24644

Summary: [4.1 Regression] gcc-4.1 compiled ppc64 kernels do not boot
Product: gcc Reporter: olh
Component: cAssignee: Andrew Pinski <pinskia>
Status: RESOLVED FIXED    
Severity: critical CC: bergner, dje, gcc-bugs, hp, ian, pinskia, rguenth
Priority: P3 Keywords: alias, patch, wrong-code
Version: 4.1.0   
Target Milestone: 4.1.0   
URL: http://gcc.gnu.org/ml/gcc-patches/2005-11/msg00353.html
Host: Target:
Build: Known to work:
Known to fail: Last reconfirmed: 2005-11-04 05:48:27
Attachments: PR24644.tar.bz2
PR24644-2.tar.bz2
PR24644-3.tar.bz2
PR24644-4.tar.bz2
Patch to try

Description olh 2005-11-02 20:40:49 UTC
I do not have a reduced testcase, nor do I know what the real cuplprit is.
kernel compiled with gcc-4_0-branch boot ok, but gcc-mainline fails to boot.
Linux kernel 2.6.14, arch/ppc64/defconfig, do not boot on POWER5.
They hang early in SMP init because a spinlock was unlocked twice.
Enabling CONFIG_DEBUG_SPINLOCK and adding a few extra printk will show this:


....
Linux version 2.6.14 (olaf@pomegranate) (gcc version 4.1.0 20051102 (experimental)) #61 SMP Wed Nov 2 19:47:05 UTC 2005
....
Memory: 495732k/524288k available (5004k kernel code, 27528k reserved, 1340k data, 2394k bss, 380k init)
374.78 BogoMIPS (lpj=749568)
Mount-cache hash table entries: 256
softlockup thread 0 started up.
Processor 1 found.
Brought up 2 CPUs
BUG: spinlock already unlocked on CPU#1, migration/1/5
c0 .schedule+0x3fc/0x870c1 .kthread+0xf0/0x170 lock: c0000000142a6100, .magic: dead4ead, .owner: swapper/1,c0000000004dbc0c,c00000000007b9c0 .owner_cpu: -1
Call Trace:
[c000000000e33c30] [c00000000021d2ec] .spin_bug+0xcc/0x150 (unreliable)
[c000000000e33cc0] [c00000000021d444] ._raw_spin_unlock+0xd4/0xe0
[c000000000e33d50] [c0000000004ddfa4] ._spin_unlock_irq+0xa4/0xe0
[c000000000e33de0] [c0000000004dbc0c] .schedule+0x3fc/0x870
[c000000000e33ee0] [c00000000007b9c0] .kthread+0xf0/0x170
[c000000000e33f90] [c000000000010948] .kernel_thread+0x4c/0x68
kernel BUG in spin_bug at /home/olaf/kernel/olh/gcc41/linux-2.6.14-olh/lib/spinlock_debug.c:34!
cpu 0x1: Vector: 700 (Program Check) at [c000000000e339b0]
pc: c00000000021d2f0: .spin_bug+0xd0/0x150
lr: c00000000021d2ec: .spin_bug+0xcc/0x150
sp: c000000000e33c30
msr: 8000000000021032
current = 0xc00000000ffd2040
paca    = 0xc000000000619000
pid   = 5, comm = migration/1
kernel BUG in spin_bug at /home/olaf/kernel/olh/gcc41/linux-2.6.14-olh/lib/spinlock_debug.c:34!
enter ? for help
1:mon>


In this case, spinlock at c0000000142a6100 is used correctly in a schedule_tail() and schedule()
call. Later, schedule() starts to use a spinlock at ..142ae100, then suddenly calls the unlock
with ...142a6100.
I currently do not know if the code in kernel/schedule.c is miscompiled, or if the breakage is
somewhere else.
If I replace the schedule.o with a gcc40 compiled version, things get further, but die later
during kernel init due to wrong pointer dereferencing. Perhaps the object files generated
with gcc-4_0-branch and gcc-mainline are somewhat incompatible.

<7>_spin_lock_irq(85) swapper(1):c0,j4294892318 l 13ff8f28
<7>_spin_unlock_irq(292) swapper(1):c0,j4294892318 l 13ff8f28
<7>_spin_lock_irq(85) swapper(1):c0,j4294892318 l 13ff8e90
<7>_spin_unlock_irq(292) swapper(1):c0,j4294892318 l 13ff8e90
<7>_spin_lock_irq(85) swapper(1):c0,j4294892318 l 00e17be8
<7>_spin_unlock_irq(292) swapper(1):c0,j4294892318 l 00e17be8
<7>_spin_lock_irq(85) swapper(1):c0,j4294892318 l 142a6100
<7>schedule_tail(1552) swapper(6):c0,j4294892318 c0000000142a6100
<7>_spin_unlock_irq(292) swapper(6):c0,j4294892318 l 142a6100
<7>_spin_lock_irq(85) swapper(6):c0,j4294892318 l 00e2c8d0
<7>_spin_unlock_irq(292) swapper(6):c0,j4294892318 l 00e2c8d0
<7>_spin_lock_irq(85) swapper(6):c0,j4294892318 l 142a6100
<7>schedule(3019) swapper(1):c0,j4294892318 c0000000142a6100
<7>_spin_unlock_irq(292) swapper(1):c0,j4294892318 l 142a6100
<7>_spin_lock_irq(85) swapper(1):c0,j4294892318 l 00e17be8
<7>_spin_unlock_irq(292) swapper(1):c0,j4294892318 l 00e17be8
<7>_spin_lock_irq(85) swapper(1):c0,j4294892318 l 00e17bf8
<7>_spin_unlock_irq(292) swapper(1):c0,j4294892318 l 00e17bf8
<7>_spin_lock_irq(85) swapper(1):c0,j4294892318 l 142a6100
<7>schedule_tail(1552) swapper(7):c0,j4294892318 c0000000142a6100
<7>_spin_unlock_irq(292) swapper(7):c0,j4294892318 l 142a6100
<7>_spin_lock_irq(85) swapper(7):c0,j4294892318 l 00e2f9e0
<7>_spin_unlock_irq(292) swapper(7):c0,j4294892318 l 00e2f9e0
<7>_spin_lock_irq(85) swapper(7):c0,j4294892318 l 142a6100
<7>schedule(3019) swapper(1):c0,j4294892318 c0000000142a6100
<7>_spin_unlock_irq(292) swapper(1):c0,j4294892318 l 142a6100
<7>_spin_lock_irq(85) swapper(1):c0,j4294892318 l 00e17bf8
<7>_spin_unlock_irq(292) swapper(1):c0,j4294892318 l 00e17bf8
<4>Processor 1 found.
<7>_spin_lock_irq(85) swapper(0):c1,j4294892318 l 142ae100
<6>Brought up 2 CPUs
<7>schedule(3019) swapper(1):c0,j4294892318 c0000000142a6100
<7>_spin_unlock_irq(292) migration/1(5):c1,j4294892318 l 142a6100
<4>BUG: spinlock already unlocked on CPU#1, migration/1/5
<4>c0 .schedule+0x3fc/0x870c1 .kthread+0xf0/0x170 lock: c0000000142a6100, .magic: dead4ead, .owner: swapper/1,c0000000004dbc0c,c00000000007b9c0 .owner_cpu: -1
<4>Call Trace:
<4>[c000000000e33c30] [c00000000021d2ec] .spin_bug+0xcc/0x150 (unreliable)
<4>[c000000000e33cc0] [c00000000021d444] ._raw_spin_unlock+0xd4/0xe0
<4>[c000000000e33d50] [c0000000004ddfa4] ._spin_unlock_irq+0xa4/0xe0
<4>[c000000000e33de0] [c0000000004dbc0c] .schedule+0x3fc/0x870
<4>[c000000000e33ee0] [c00000000007b9c0] .kthread+0xf0/0x170
<4>[c000000000e33f90] [c000000000010948] .kernel_thread+0x4c/0x68
<2>kernel BUG in spin_bug at /home/olaf/kernel/olh/gcc41/linux-2.6.14-olh/lib/spinlock_debug.c:34!
1:mon>
Comment 1 olh 2005-11-02 20:49:39 UTC
Created attachment 10115 [details]
PR24644.tar.bz2

buildscripts, preprocessed files, kernel .config and debug patches, bootlog.

building with -O1 does not fix it.
Comment 2 Mark Mitchell 2005-11-03 06:46:13 UTC
Until we have a testcase that actually shows some evidence that the toolchain is at fault, there is nothing for us to investigate; it could equally well be that the kernel has a bug.
Comment 3 olh 2005-11-03 08:24:28 UTC
copying the schedule.o from gcc41 tree to gcc40 tree doesnt help.
Comment 4 olh 2005-11-03 12:58:52 UTC
What I have found so far with even more debugging:

<4>Processor 1 found.
<7>schedule(2883) swapper(0):c1,j4294892318 c0000000142ae100
<6>Brought up 2 CPUs
<7>_spin_lock_irq(85) swapper(0):c1,j4294892318 l c0000000142ae100 c c0000000004db98c
<7>schedule(2996) swapper(0):c1,j4294892318 p c00000000ffd7040 n c00000000ffd2040 r c0000000142ae100
<7>schedule(3002) swapper(0):c1,j4294892318 p c00000000ffd7040 n c00000000ffd2040 r c0000000142ae100
<7>schedule(3029) swapper(1):c0,j4294892318 p c00000000ffd7040 n c00000000ffd7810 r c0000000142a6100 t c0000000142a6100
<7>finish_task_switch(1539) swapper(1):c0,j4294892318 r c0000000142a6100 p c00000000ffd7040
<7>finish_lock_switch(297) swapper(1):c0,j4294892318
<7>_spin_unlock_irq(292) migration/1(5):c1,j4294892318 l c0000000142a6100 c c0000000004dbd24
<4>BUG: spinlock already unlocked on CPU#1, migration/1/5

schedule() line 2883 grabs the rq at c0000000142ae100 on cpu1, and calls spinlock.
later in line 3029, it switches to cpu0 and rq becomes c0000000142a6100, 
this is after context_switch(), barrier(), finish_task_switch(), finish_lock_switch()
For some reason, the _spin_unlock_irq() call from finish_lock_switch() happens on cpu1?
But this should be no real problem, the real thing is, why changed rq 
from c0000000142ae100 to c0000000142a6100 between lines 3002 and 3029.
Comment 5 olh 2005-11-03 13:05:32 UTC
Created attachment 10123 [details]
PR24644-2.tar.bz2

rq does change between the context switch.

<7>schedule(3025) swapper(0):c1,j4294892318 p c00000000ffd7040 n c00000000ffd2040 r c0000000142ae100
<7>schedule(3027) swapper(1):c0,j4294892318 p c00000000ffd7040 n c00000000ffd7810 r c0000000142a6100
Comment 6 olh 2005-11-03 13:31:58 UTC
c0000000142ae100 is for cpu1, while c0000000142a6100 is for cpu0.
cpu1 was never active before, so all task switches were ok.
Comment 7 olh 2005-11-03 21:30:51 UTC
Created attachment 10135 [details]
PR24644-3.tar.bz2

revision 102096 works, revision 102098 doesnt.
copying the sched.o from the 102098 tree into the 102096 tree doesnt work either.

diff -purN gcc-mainline-music-r102096/ChangeLog gcc-mainline-music-r102098/ChangeLog
--- gcc-mainline-music-r102096/ChangeLog        2005-07-16 02:30:53.000000000 +0000
+++ gcc-mainline-music-r102098/ChangeLog        2005-07-16 18:56:53.000000000 +0000
@@ -1,3 +1,53 @@
+2005-07-16 Danny Berlin <dberlin@dberlin.org>
+          Kenneth Zadeck <zadeck@naturalbridge.com>
+
+       * Makefile.in: Added rules for ipa-pure-const.c, ipa-reference.c,
+       ipa-reference.h, ipa-utils.c, ipa-utils.h, ipa-type-escape.c,
+       ipa-type-escape.h, tree-promote-statics.c
...
Comment 8 Peter Bergner 2005-11-03 21:48:25 UTC
Adding myself to the CC list.
Comment 9 olh 2005-11-03 22:05:02 UTC
Created attachment 10136 [details]
PR24644-4.tar.bz2

object files and extracted asm functions for easier comparsion
Comment 10 Ian Lance Taylor 2005-11-04 05:48:27 UTC
From IRC, the problem seems to be that gcc is copying the value out of a global register variable, which it was not doing before.  This is not acceptable, as the global register may be changed on any context switch, and it is important to always use the current value each time it is dereferenced.

Declaring the global register variable as volatile may fix the problem, although one then gets an annoying unconditional warning every time you run the compiler ("volatile register variables don't work as you might wish").  The volatile qualifier is recorded at the tree level, but it is not recorded at the RTL level.  At the RTL level global registers are sometimes treated as volatile (e.g., hash_rtx in cse.c) and sometimes not (e.g., they are not noticed by volatile_refs_p, and thus are permitted in the input instructions in combine).

So some fixes that come to mind are, in order of increasing difficulty:
1) Change the source code to declare the variable volatile, and provide an option to disable the warning.
2) Change global registers to be volatile by default in the C/C++ frontends, and move the warning to there.
3) Change the source code to declare the variable volatile, and fix the RTL level to handle volatile register variables correctly.
Comment 11 Peter Bergner 2005-11-04 05:52:36 UTC
I've determined why we're dying, but not sure who is at fault yet.  While scanning through Olaf's assembly diff's, I noticed some code that normally wouldn't be a problem in user code, but is problamatic in PPC64 kernel code:

New code:
    mr r24, r13
    [snip]
    ld r9,360(r24)

Old code:
    [snip]
    ld r9, 360(r13)

The problem here is that r13 is a reserved register for the kernel (local_paca pointer) and can change values on a context switch if we migrate to another processor. so copying the value out of r13 is a no no.  Given what Olaf was seeing, this seems to be the case here.

The problamatic source line is from linux-2.6.X/include/asm-ppc64/paca.h:

    register struct paca_struct *local_paca asm ("r13");

Changing this to:

    register struct paca_struct * volatile local_paca asm ("r13");

gave this warning message:
    warning: volatile register variables don't work as you might wish

It's only a warning and looking at the disassembly, the code looks good.  Olaf, can you add the volatile above an try booting again?  Talking with Paulus, he isn't too trilled about the warning message which will occur for every file that uses current (lots!) so we're still not sure if there is a different kernel fix/tweek for this to whether gcc needs to change or both.

Comment 12 olh 2005-11-04 07:18:13 UTC
Yes, adding the volatile fixes it for me.
Comment 13 olh 2005-11-04 07:45:59 UTC
Why does gcc make copies of r13 anyway, when it could just use it where needed?
Comment 14 Ian Lance Taylor 2005-11-04 16:09:05 UTC
gcc is free to copy non-volatile variables.  It does this in many places and for many reasons, particularly in the new SSA optimizers in 4.0 and later.  I don't know the details of where it decided to copy the register in this test case, but presumably it seemed profitable to some heuristic.  I expect that it could happen if the usage is in a loop, or used on both branches of an if conditional, etc.
Comment 15 Peter Bergner 2005-11-04 17:38:01 UTC
For completeness, here is a minimal test case that shows the problem we are having:

bergner@vervain:~/olaf/PR24644-4> cat bar.c 
register struct paca_struct * local_paca asm("r13");
struct paca_struct { volatile long state; };
extern void foo(void);

int migration_thread(void)
{
 local_paca->state = 1;
 foo();
 local_paca->state = 0;
 return 0;
}
bergner@vervain:~/olaf/PR24644-4> ~/gcc/dataflow-base/bin/gcc -m64 -O1 -S bar.c 

The code we get looks like: 
.migration_thread:
        mflr 0
        std 29,-24(1)
        std 0,16(1)
        stdu 1,-144(1)
        mr 29,13
        li 0,1
        std 0,0(13)
        bl .foo
        nop
        li 0,0
        std 0,0(29)
        li 3,0
        addi 1,1,144
        ld 0,16(1)
        mtlr 0
        ld 29,-24(1)
        blr

This looks like gcc thinks r13 will be trashed by the call to foo() so it copies it to a non volatile register for use after the call.  Strange that this didn't happen before the patch found by Olaf.
Comment 16 Andrew Pinski 2005-11-04 18:00:08 UTC
(In reply to comment #15)
> This looks like gcc thinks r13 will be trashed by the call to foo() so it
> copies it to a non volatile register for use after the call.  Strange that this
> didn't happen before the patch found by Olaf.
Actually it is the opposite way.  GCC thinks that r13 will not trash (clobber) it, so it copies it.  I wonder why this happens as I think it should be marked as call clobber.
Comment 17 Andrew Pinski 2005-11-04 18:06:02 UTC
Variable: local_paca, UID 1270, struct paca_struct *, is global, call clobbered, default def: local_paca_1

But we get:
  #   TMT.6_8 = V_MAY_DEF <TMT.6_7>; 
  #   VUSE <local_paca_1>;
  foo ();

I don't understand why because to me r13 should be able to change in foo so we should get V_MAY_DEF there.
Comment 18 Andrew Pinski 2005-11-04 18:07:26 UTC
That testcase shows that we have a big problem,  as local_paca (aka r13) should be marked as call clobbered.
Comment 19 Andrew Pinski 2005-11-04 18:21:26 UTC
The problem is that ipa-static is looking at the variable because the front-end marked it as a static variable.  Daniel Berlin has sugggested a fix, and I will test it with the reduced test case and then if you could test it with the full kernel that would be nice.
Comment 20 Andrew Pinski 2005-11-04 18:33:33 UTC
Created attachment 10147 [details]
Patch to try

This patch should fix the issue in the reduced testcase but I don't know if it fixes it fully, Can one of you two test it with the kernel?
Comment 21 olh 2005-11-04 19:15:47 UTC
yes, this patch helps. tested with r106499.
Comment 22 Andrew Pinski 2005-11-04 21:21:32 UTC
Hmm, my patch is not fully correct but the C++ front-end gets this correct.
Comment 23 olh 2005-11-05 20:31:02 UTC
this patch works, tested with r106530
Comment 24 Andrew Pinski 2005-11-09 12:34:05 UTC
Fixed.
Comment 25 Andrew Pinski 2005-11-09 12:34:06 UTC
Subject: Bug 24644

Author: pinskia
Date: Wed Nov  9 12:33:59 2005
New Revision: 106693

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=106693
Log:
2005-11-09  Andrew Pinski  <pinskia@physics.uc.edu>

        PR c/24644
        * dwarf2-out.c (add_name_and_src_coords_attributes): Don't add
        a linkage name for a variable if it a register variable.
        * c-decl.c (grokdeclarator): Global register variables
        should be set as PUBLIC.


Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/c-decl.c
    trunk/gcc/dwarf2out.c

Comment 26 David Edelsohn 2005-11-11 17:59:59 UTC
Subject: Bug 24644

Author: dje
Date: Fri Nov 11 17:59:54 2005
New Revision: 106791

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=106791
Log:
        PR 24644
        * common.opt (Wvolatile-register-var): New.
        * varasm.c (make_decl_rtl): Only emit warning when option
        specified.  Clarify warning message.
        * doc/invoke.texi (Wvolatile-register-var): Document new option.

        * doc/md.texi (copysign): Document standard named pattern.

Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/common.opt
    trunk/gcc/doc/invoke.texi
    trunk/gcc/doc/md.texi
    trunk/gcc/varasm.c