While trying to boot Linux 2.6.34 built with gcc from the latest gcc-4_5-branch, the kernel would not boot. The same kernel works fine with gcc 4.4.4. I have been reducing the problem to individual optimization phases. So far I have seen that the kernel crashes when compiled with -O2, but works when I change the flags to -O2 -fno-ipa-sra. I don't have a small test case yet. I am trying to reduce it, but that might take some time. Meanwhile I wanted to open the bug so that anyone with information on a similar issue can chime in.
Created attachment 20759: preprocessed testcase

OK, so here is one file. When this file is compiled with -O2 -fno-ipa-sra and the rest of the kernel with -O2, the kernel works. So something is going wrong in this file.
Here is the diff of the two assembly outputs:

$ diff copypage-v4wb.s copypage-v4wb.no-ipa-sra.S -u
--- copypage-v4wb.s 2010-05-27 00:11:03.130607878 -0700
+++ copypage-v4wb.no-ipa-sra.S 2010-05-27 00:10:54.790615578 -0700
@@ -120,19 +120,30 @@
 v4wb_copy_user_highpage:
        @ args = 0, pretend = 0, frame = 0
        @ frame_needed = 0, uses_anonymous_args = 0
-       stmfd sp!, {r4, lr}
-       mov r1, sp
-       bic r4, r1, #8128
+       stmfd sp!, {r4, r5, r6, lr}
+       mov ip, sp
+       bic r4, ip, #8128
        bic r4, r4, #63
-       ldr r1, [r4, #4]
-       add r1, r1, #1
-       str r1, [r4, #4]
-       ldr r1, [r4, #4]
-       add r1, r1, #1
-       str r1, [r4, #4]
-       ldr r1, [r3, #0]
-       ldr r1, [r1, #332]
-       tst r1, #1
+       ldr ip, [r4, #4]
+       add ip, ip, #1
+       str ip, [r4, #4]
+       ldr ip, .L8
+       ldr lr, [r4, #4]
+       ldr r6, [ip, #0]
+       add lr, lr, #1
+       rsb r6, r6, r0
+       mov r6, r6, asr #5
+       mov r6, r6, asl #12
+       add r6, r6, #-1073741824
+       str lr, [r4, #4]
+       ldr r5, [ip, #0]
+       ldr r0, [r3, #0]
+       rsb r5, r5, r1
+       ldr r0, [r0, #332]
+       mov r5, r5, asr #5
+       mov r5, r5, asl #12
+       tst r0, #1
+       add r5, r5, #-1073741824
        beq .L6
        bic r2, r2, #4080
        bic r0, r2, #15
@@ -140,6 +151,8 @@
        ldr r2, [r3, #20]
        bl arm926_flush_user_cache_range
 .L6:
+       mov r0, r6
+       mov r1, r5
        bl v4wb_copy_user_page
        ldr r3, [r4, #4]
        sub r3, r3, #1
@@ -147,7 +160,11 @@
        ldr r3, [r4, #4]
        sub r3, r3, #1
        str r3, [r4, #4]
-       ldmfd sp!, {r4, pc}
+       ldmfd sp!, {r4, r5, r6, pc}
+.L9:
+       .align 2
+.L8:
+       .word mem_map
        .size v4wb_copy_user_highpage, .-v4wb_copy_user_highpage
        .global v4wb_user_fns
        .section .init.data,"aw",%progbits
Can you paste the output of gcc when adding -v to the commandline used for compiling copypage-v4wb.i in the failing case?
$ /home/kraj/work/cross/arm-none-linux-uclibcgnueabi/tools/bin/arm-none-linux-uclibcgnueabi-gcc -O2 -fno-ipa-sra -S -v copypage-v4wb.i
Using built-in specs.
COLLECT_GCC=/home/kraj/work/cross/arm-none-linux-uclibcgnueabi/tools/bin/arm-none-linux-uclibcgnueabi-gcc
COLLECT_LTO_WRAPPER=/home/kraj/work/cross/arm-none-linux-uclibcgnueabi/tools/libexec/gcc/arm-none-linux-uclibcgnueabi/4.5.1/lto-wrapper
Target: arm-none-linux-uclibcgnueabi
Configured with: /home/kraj/work/cross/arm-none-linux-uclibcgnueabi/../../gcc-4.5-20100520/configure --target=arm-none-linux-uclibcgnueabi --prefix=/home/kraj/work/cross/arm-none-linux-uclibcgnueabi/tools --with-sysroot=/home/kraj/work/cross/arm-none-linux-uclibcgnueabi/sysroot --enable-__cxa_atexit --disable-libssp --disable-libgomp --disable-libmudflap --enable-languages=c,c++
Thread model: posix
gcc version 4.5.1 20100520 (prerelease) (GCC)
COLLECT_GCC_OPTIONS='-O2' '-fno-ipa-sra' '-S' '-v'
 /home/kraj/work/cross/arm-none-linux-uclibcgnueabi/tools/libexec/gcc/arm-none-linux-uclibcgnueabi/4.5.1/cc1 -fpreprocessed copypage-v4wb.i -quiet -dumpbase copypage-v4wb.i -auxbase copypage-v4wb -O2 -version -fno-ipa-sra -o copypage-v4wb.s
GNU C (GCC) version 4.5.1 20100520 (prerelease) (arm-none-linux-uclibcgnueabi)
        compiled by GNU C version 4.4.3, GMP version 4.3.2, MPFR version 2.4.2-p1, MPC version 0.8.1
GGC heuristics: --param ggc-min-expand=100 --param ggc-min-heapsize=131072
GNU C (GCC) version 4.5.1 20100520 (prerelease) (arm-none-linux-uclibcgnueabi)
        compiled by GNU C version 4.4.3, GMP version 4.3.2, MPFR version 2.4.2-p1, MPC version 0.8.1
GGC heuristics: --param ggc-min-expand=100 --param ggc-min-heapsize=131072
Compiler executable checksum: acf86197000a593407366115824f5d00
COMPILER_PATH=/home/kraj/work/cross/arm-none-linux-uclibcgnueabi/tools/libexec/gcc/arm-none-linux-uclibcgnueabi/4.5.1/:/home/kraj/work/cross/arm-none-linux-uclibcgnueabi/tools/libexec/gcc/arm-none-linux-uclibcgnueabi/4.5.1/:/home/kraj/work/cross/arm-none-linux-uclibcgnueabi/tools/libexec/gcc/arm-none-linux-uclibcgnueabi/:/home/kraj/work/cross/arm-none-linux-uclibcgnueabi/tools/lib/gcc/arm-none-linux-uclibcgnueabi/4.5.1/:/home/kraj/work/cross/arm-none-linux-uclibcgnueabi/tools/lib/gcc/arm-none-linux-uclibcgnueabi/:/home/kraj/work/cross/arm-none-linux-uclibcgnueabi/tools/lib/gcc/arm-none-linux-uclibcgnueabi/4.5.1/../../../../arm-none-linux-uclibcgnueabi/bin/
LIBRARY_PATH=/home/kraj/work/cross/arm-none-linux-uclibcgnueabi/tools/lib/gcc/arm-none-linux-uclibcgnueabi/4.5.1/:/home/kraj/work/cross/arm-none-linux-uclibcgnueabi/tools/lib/gcc/arm-none-linux-uclibcgnueabi/4.5.1/../../../../arm-none-linux-uclibcgnueabi/lib/:/home/kraj/work/cross/arm-none-linux-uclibcgnueabi/sysroot/lib/:/home/kraj/work/cross/arm-none-linux-uclibcgnueabi/sysroot/usr/lib/
COLLECT_GCC_OPTIONS='-O2' '-fno-ipa-sra' '-S' '-v'
Oops, that output was for the good case. Just remove -fno-ipa-sra from the command line to get the failing case :)
Confirmed. A linux-2.6.34 kernel configured for ARM and compiled with gcc-4.5-20100520 crashes during boot with a NULL pointer dereference in its copy_user_highpage(), exactly at the point where it tries to start /sbin/init. Whether HIGHMEM is enabled makes no difference. The same kernel compiled with gcc-4.4.4 boots fine. Both compilers were configured for armv5tel-unknown-linux-gnu --with-arch=armv5te --with-tune=xscale. The kernels were built for mach-iop32x/n2100 (XScale IOP80219).

I note that copypage-xscale.c:xscale_mc_copy_user_highpage() calls a __naked function to do the bulk copy. Converting that to a plain inline function (changing 'pc' to 'lr' in the final instruction that restores the scratch regs) does not prevent the crash, so I suspect a plain C code miscompilation. I'll try to bisect it.
Bisection identified r148981 as the cause of this regression:

Author: rth
Date: Fri Jun 26 18:23:32 2009
New Revision: 148981

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=148981

Log:
    * function.h (struct function): Add cannot_be_copied_reason,
    and cannot_be_copied_set.
    * tree-inline.c (has_label_address_in_static_1): Rename from
    inline_forbidden_p_2; don't set inline_forbidden_reason here.
    (cannot_copy_type_1): Rename from inline_forbidden_p_op; likewise
    don't set inline_forbidden_reason.
    (copy_forbidden): New function, split out of inline_forbidden_p.
    (inline_forbidden_p_stmt): Don't check for nonlocal labels here.
    (inline_forbidden_p): Use copy_forbidden.
    (tree_versionable_function_p): Likewise.
    (inlinable_function_p): Merge into tree_inlinable_function_p.
    (tree_function_versioning): Remap cfun->nonlocal_goto_save_area.
    * ipa-cp.c (ipcp_versionable_function_p): New function.
    (ipcp_cloning_candidate_p): Use it.
    (ipcp_node_modifiable_p): Likewise.

I'll try to extract a smaller test case tomorrow.
(In reply to comment #6)
> I note that copypage-xscale.c:xscale_mc_copy_user_highpage() calls a __naked
> function to do the bulk copy. Converting that to a plain inline function
> (changing 'pc' to 'lr' in the final instruction that restores the scratch regs),
> does not prevent the crash. So I suspect a plain C code miscompilation.

Actually, that conversion away from __naked may have been flawed. What I'm seeing is that r148981 causes gcc to clone the __naked function and change its calling conventions in ways that don't match the proper function call ABI. This breaks the body of the __naked function, which is just a big asm() statement.
(In reply to comment #8)
> (In reply to comment #6)
> > I note that copypage-xscale.c:xscale_mc_copy_user_highpage() calls a __naked
> > function to do the bulk copy. Converting that to a plain inline function
> > (changing 'pc' to 'lr' in the final instruction that restores the scratch regs),
> > does not prevent the crash. So I suspect a plain C code miscompilation.
>
> Actually that conversion away from __naked may have been flawed. What I'm
> seeing is that r148981 causes gcc to clone the __naked function and change its
> calling conventions in ways that don't match the proper function call ABI.
> This breaks the body of the __naked function which is just a big asm()
> statement.

Well. The ARM backend needs to mark the function as "used in non-visible ways" then. Thus this is a target bug.
Or rather, if you have

void __attribute__((naked)) foo (int i)
{
  asm("use i");
}

without any inputs referring to i, that is invalid. This is what I see in the attached preprocessed source:

static void __attribute__((naked)) __attribute__((no_instrument_function))
v4wb_copy_user_page(void *kto, const void *kfrom)
{
 asm(" stmfd sp!, {r4, lr} @ 2\n mov r2, %0 @ 1\n ldmia r1!, {r3, r4, ip, lr} @ 4\n1: mcr p15, 0, r0, c7, c6, 1 @ 1 invalidate D line\n stmia r0!, {r3, r4, ip, lr} @ 4\n ldmia r1!, {r3, r4, ip, lr} @ 4+1\n stmia r0!, {r3, r4, ip, lr} @ 4\n ldmia r1!, {r3, r4, ip, lr} @ 4\n mcr p15, 0, r0, c7, c6, 1 @ 1 invalidate D line\n stmia r0!, {r3, r4, ip, lr} @ 4\n ldmia r1!, {r3, r4, ip, lr} @ 4\n subs r2, r2, #1 @ 1\n stmia r0!, {r3, r4, ip, lr} @ 4\n ldmneia r1!, {r3, r4, ip, lr} @ 4\n bne 1b @ 1\n mcr p15, 0, r1, c7, c10, 4 @ 1 drain WB\n ldmfd sp!, {r4, pc} @ 3"
# 46 "/home/kraj/work/linux-2.6.34/arch/arm/mm/copypage-v4wb.c"
 : : "I" (((1UL) << 12) / 64));
}

kto and kfrom are unused: the asm never names them as operands and only reaches them through the hardcoded registers r0 and r1.
(it seems quite stupid to have naked functions with only an asm inside in the first place - you can equally well use plain assembly)
Subject: Re: [4.5 Regression] arm linux kernel crashes when built with -fipa-sra, __naked attribute is broken

(In reply to comment #11, http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44290)
> (it seems quite stupid to have naked functions with only an asm inside in the
> first place - you can equally well use plain assembly)

The naked attribute should imply two things: noinline and noclone.
(In reply to comment #10)
> Or rather, if you have
>
> void __attribute__((naked)) foo (int i)
> {
>   asm("use i");
> }
>
> without any inputs refering to i that is invalid.

Not according to gcc/doc/extend.texi:

> @item naked
> @cindex function without a prologue/epilogue code
> Use this attribute on the ARM, AVR, IP2K, RX and SPU ports to indicate that
> the specified function does not need prologue/epilogue sequences generated by
> the compiler. It is up to the programmer to provide these sequences. The
> only statements that can be safely included in naked functions are
> @code{asm} statements that do not have operands.

Note: "do not have operands". Thus the only way such an asm() can refer to parameters is by assuming a standard function call sequence and hardcoding the corresponding register numbers or stack frame offsets.

However, even if the asm() refers to those parameters via "r"(...) inputs, gcc-4.5 changes the register assignment so that it no longer agrees with the standard call sequence. I'll attach a small test case showing that in a moment.
(In reply to comment #11)
> (it seems quite stupid to have naked functions with only an asm inside in the
> first place - you can equally well use plain assembly)

Except that with a plain asm() for an entire function definition you'd also have to include boring preamble/postamble stuff like .align/.type/.size if you want it to appear as a proper function, and you still have to declare a prototype.

And the reason for making it a separate function rather than an inline asm() is probably related to register assignment: a separate function can (or could) make assumptions about parameter registers and scratch registers. With inline asm() you have to be much more elaborate, especially if you have constraints that gcc cannot express, like even/odd register pairs on ARM.
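To illustrate the "plain assembly" alternative and its preamble/postamble cost, here is a minimal sketch of a function defined entirely in a top-level asm(). The name demo_add, the helper demo_add_check, and the instruction sequences are assumptions made up for illustration; nothing here is taken from the kernel. The ARM and x86-64 variants assume an ELF target and the standard call sequence, and other targets fall back to plain C.

```c
#include <assert.h>

/* The prototype is still needed so that C callers can see the function. */
int demo_add(int a, int b);

#if defined(__ELF__) && defined(__arm__)
/* 32-bit ARM: a arrives in r0, b in r1, result is returned in r0. */
__asm__(
    ".pushsection .text\n"
    ".p2align 2\n"
    ".globl demo_add\n"
    ".type demo_add, %function\n"
    "demo_add:\n"
    "    add r0, r0, r1\n"
    "    bx lr\n"
    ".size demo_add, .-demo_add\n"
    ".popsection\n");
#elif defined(__ELF__) && defined(__x86_64__)
/* x86-64 SysV: a arrives in edi, b in esi, result is returned in eax. */
__asm__(
    ".pushsection .text\n"
    ".p2align 4\n"
    ".globl demo_add\n"
    ".type demo_add, %function\n"
    "demo_add:\n"
    "    leal (%rdi,%rsi), %eax\n"
    "    ret\n"
    ".size demo_add, .-demo_add\n"
    ".popsection\n");
#else
/* Fallback for other targets: plain C, so the sketch still builds. */
int demo_add(int a, int b) { return a + b; }
#endif

/* Small self-check that calls the function through its prototype. */
int demo_add_check(void)
{
    assert(demo_add(2, 3) == 5);
    return demo_add(40, 2);
}
```

Because the whole function lives outside any C function body, the compiler cannot clone it or change how its arguments are passed, which is the property the naked version was relying on.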
(In reply to comment #13)
> (In reply to comment #10)
> > Or rather, if you have
> >
> > void __attribute__((naked)) foo (int i)
> > {
> >   asm("use i");
> > }
> >
> > without any inputs refering to i that is invalid.
>
> Not according to gcc/doc/extend.texi:
>
> > @item naked
> > @cindex function without a prologue/epilogue code
> > Use this attribute on the ARM, AVR, IP2K, RX and SPU ports to indicate that
> > the specified function does not need prologue/epilogue sequences generated by
> > the compiler. It is up to the programmer to provide these sequences. The
> > only statements that can be safely included in naked functions are
> > @code{asm} statements that do not have operands.
>
> Note: "do not have operands". Thus the only way such an asm() can refer to
> parameters is by assuming a standard function call sequence and hardcoding
> corresponding register numbers or stack frame offsets.

Then the target has to properly communicate this to the middle-end.

> However, even if the asm() refers to those parameters via "r"(...) inputs,
> gcc-4.5 changes the register assignment to not agree with the standard call
> sequence, I'll attach a small test case showing that in a moment.

I'd have required dummy inputs like "g" (kto), "g" (kfrom) that are not used by the actual assembly.

For now, re-open as a target bug.
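As a minimal sketch of what such dummy inputs look like syntactically, the following uses an empty asm template with two "g" operands in an ordinary (non-naked) function. The names touch_params and touch_params_check are hypothetical. This only shows the operand syntax; whether operands are reliable inside a naked function is exactly what the quoted documentation and this thread disagree about.

```c
#include <assert.h>

/*
 * Hypothetical helper: the asm template is empty, and the two "g" inputs
 * are dummy operands whose only job is to make kto and kfrom visibly used
 * by the asm statement.
 */
__attribute__((__noinline__))
int touch_params(void *kto, const void *kfrom)
{
    __asm__ volatile("" : : "g"(kto), "g"(kfrom) : "memory");
    return kto != kfrom;
}

/* Small self-check exercising both outcomes. */
int touch_params_check(void)
{
    char dst[4], src[4];
    assert(touch_params(dst, src) == 1);
    return touch_params(dst, dst);
}
```

The "g" constraint accepts a register, memory, or immediate operand, so it places no particular demand on where the compiler keeps the two pointers.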
And CC an ARM maintainer. There might be more target-specific attributes that need adjustment.
Created attachment 20772: test case from copypage-xscale.c

This is distilled from the kernel's copypage-xscale.c file and illustrates the issue.

With gcc-4.4 the __naked__ function foo() is called with the standard call sequence register assignment, so the asm() body of foo() works.

With gcc-4.5 foo() is cloned and gets its second parameter `to' in r0 (not r1 as expected), and the body of foo() is modified to set up the actual first parameter (&fie[0]) in r1 (not r0 as expected). Obviously the asm() then breaks.

Compiling with -fno-ipa-cp avoids this problem, as does adding __noclone__ and __noinline__ to foo()'s function definition.

I don't immediately see how to enforce __noclone__ and __noinline__ in the ARM backend when it sees __naked__. Any ideas?
Created attachment 20773: linux kernel workaround for attribute naked breakage

This patch makes the Linux kernel add noinline and noclone attributes to functions declared __naked. This allows gcc-4.5 to build a working 2.6.34 Linux kernel for my mach-iop32x/n2100 ARM box.

Khem: can you check if this kernel-side workaround fixes your problem?

Eventually I'd like the kernel to not use __naked, but that is non-trivial. This fix should work now and be easily backportable to older kernels.
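The attachment itself is not reproduced in this thread, so the following is only a rough sketch of the idea: a single macro that bundles noinline and noclone, so that every function using it gets both. The names DEMO_NAKED, DEMO_NOCLONE, demo_copy_page and demo_copy_check are hypothetical. The naked attribute itself is deliberately left out so that the body can be ordinary C and the sketch builds on any target.

```c
#include <assert.h>
#include <stddef.h>

/* noclone is GCC-specific; leave it out for compilers that lack it. */
#if defined(__GNUC__) && !defined(__clang__)
#define DEMO_NOCLONE __attribute__((__noclone__))
#else
#define DEMO_NOCLONE
#endif

/*
 * Hypothetical stand-in for the kernel's __naked macro. A real version
 * would also carry __attribute__((naked)); that part is omitted here so
 * the function body below can be plain C.
 */
#define DEMO_NAKED __attribute__((__noinline__)) DEMO_NOCLONE

/* Stand-in for a copy routine; the kernel version is a single asm(). */
static void DEMO_NAKED demo_copy_page(void *kto, const void *kfrom, size_t n)
{
    unsigned char *d = kto;
    const unsigned char *s = kfrom;
    while (n--)
        *d++ = *s++;
}

/* Small self-check: copy eight bytes and inspect the result. */
int demo_copy_check(void)
{
    unsigned char src[8] = {1, 2, 3, 4, 5, 6, 7, 8};
    unsigned char dst[8] = {0};
    demo_copy_page(dst, src, sizeof src);
    assert(dst[0] == 1 && dst[7] == 8);
    return dst[3];
}
```

Putting the attributes in the macro rather than on each function means existing users of the macro pick up the workaround without being edited one by one.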
(In reply to comment #18)
> Created an attachment (id=20773)
> linux kernel workaround for attribute naked breakage
>
> This patch makes the Linux kernel add noinline and noclone attributes to
> functions declared __naked. This allows gcc-4.5 to build a working 2.6.34
> Linux kernel for my mach-iop32x/n2100 ARM box.
>
> Khem: can you check if this kernel-side workaround fixes your problem?

I tried using __noclone__ a couple of days ago to work around the problem:

//static void __attribute__((__naked__, __noinline__, __noclone__, __no_instrument_function__))
static void __attribute__((__naked__, __no_instrument_function__))
v4wb_copy_user_page(void *kto, const void *kfrom)

But my gcc seems to ignore it: it generates the same code for both of the above declarations, with -Os as well as -O2. Hence it did not solve my issue. I am using a snapshot of the 4.5 branch from May 20.

> Eventually I'd like the kernel to not use __naked, but that is non-trivial.
> This fix should work now and be easily backportable to older kernels.
Created attachment 20777: includes fix for all arms

I see the problem after reading Richard's comment #10. Here are some more additions to the kernel workaround patch; with them it works for me too.
Subject: Bug 44290

Author: jiez
Date: Fri Jul 23 14:47:46 2010
New Revision: 162466

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=162466

Log:
    PR target/44290
    * attribs.c (decl_attributes): Insert "noinline" and "noclone"
    if "naked".
    * tree-sra.c (ipa_sra_preliminary_function_checks): Return
    false if ! tree_versionable_function_p.

    testsuite/
    PR target/44290
    * gcc.dg/pr44290-1.c: New test.
    * gcc.dg/pr44290-2.c: New test.

Added:
    trunk/gcc/testsuite/gcc.dg/pr44290-1.c
    trunk/gcc/testsuite/gcc.dg/pr44290-2.c
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/attribs.c
    trunk/gcc/testsuite/ChangeLog
    trunk/gcc/tree-sra.c
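The attribs.c entry in this log describes an implication: a declaration carrying "naked" also gets "noinline" and "noclone". The following toy model shows that rule on a flat list of attribute names. It is not GCC's code (GCC works on tree lists inside decl_attributes); struct attr_list, has_attr, add_attr, imply_naked_attrs and imply_naked_check are all made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MAX_ATTRS 8

/* Toy attribute list; GCC itself uses tree lists, not this struct. */
struct attr_list {
    const char *names[MAX_ATTRS];
    size_t count;
};

/* Return true if the list already contains the named attribute. */
bool has_attr(const struct attr_list *l, const char *name)
{
    for (size_t i = 0; i < l->count; i++)
        if (strcmp(l->names[i], name) == 0)
            return true;
    return false;
}

/* Add the attribute unless it is already present; false if the list is full. */
bool add_attr(struct attr_list *l, const char *name)
{
    if (has_attr(l, name))
        return true;
    if (l->count >= MAX_ATTRS)
        return false;
    l->names[l->count++] = name;
    return true;
}

/* If "naked" is present, also record "noinline" and "noclone". */
bool imply_naked_attrs(struct attr_list *l)
{
    if (!has_attr(l, "naked"))
        return true;
    return add_attr(l, "noinline") && add_attr(l, "noclone");
}

/* Small self-check: a list holding only "naked" grows to three entries. */
size_t imply_naked_check(void)
{
    struct attr_list l = { { "naked" }, 1 };
    assert(imply_naked_attrs(&l));
    assert(has_attr(&l, "noinline") && has_attr(&l, "noclone"));
    return l.count;
}
```

Doing this where attributes are first attached to a declaration means later passes only have to check for noinline and noclone and need no special knowledge of naked.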
Should be fixed on trunk now.
Set the status to FIXED.
If this is fixed only on trunk, the target milestone should be 4.6.0 and not 4.5.1. I thought this was a regression on the 4.5 branch. Given that the branch is now locked down for 4.5.1, the target milestone ought to be 4.5.2, and this patch should also be backported to the 4.5 branch.
Ramana, I will ask for permission to commit it on 4.5 branch when it's unfrozen.
Subject: Bug 44290

Author: jiez
Date: Tue Jul 27 17:33:30 2010
New Revision: 162579

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=162579

Log:
    PR target/44290
    Revert
    2010-07-23  Jie Zhang  <jie@codesourcery.com>
    * tree-sra.c (ipa_sra_preliminary_function_checks): Return
    false if ! tree_versionable_function_p.

Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/tree-sra.c
GCC 4.5.1 is being released, adjusting target milestone.
I presume this is fixed on trunk but still broken on the 4.5 branch. Is that correct?
Serge, yes. But the GCC 4.5 branch is frozen again now.
GCC 4.5.2 is being released, adjusting target milestone.
Patch for 4.5 was posted: http://gcc.gnu.org/ml/gcc-patches/2010-12/msg01351.html Waiting for approval.
Backporting regression fixes is generally fine and does not require explicit approval (given that the patches do not need significant changes).
GCC 4.5.3 is being released, adjusting target milestone.
Jie, are you still planning on backporting this?

Ramana
Fixed in 4.6.0, the 4.5 branch is being closed.