Bug 58854 - [4.8 regression] "sub sp, fp, #40" hoisted above frame accesses
Summary: [4.8 regression] "sub sp, fp, #40" hoisted above frame accesses
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: target (show other bugs)
Version: 4.8.1
: P3 normal
Target Milestone: 4.8.3
Assignee: Ramana Radhakrishnan
URL:
Keywords: wrong-code
Depends on:
Blocks:
 
Reported: 2013-10-23 17:40 UTC by bccheng
Modified: 2014-05-29 20:50 UTC (History)
4 users (show)

See Also:
Host:
Target: arm-linux-gnueabi
Build:
Known to work: 4.7.3
Known to fail: 4.8.0, 4.8.1, 4.9.0
Last reconfirmed: 2013-10-29 00:00:00


Attachments
stripped from kernel 3.4 fs/dcache.c (128.64 KB, text/plain)
2013-10-23 17:42 UTC, bccheng
Details
lightly tested patch. (315 bytes, patch)
2013-10-29 14:53 UTC, Ramana Radhakrishnan
Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description bccheng 2013-10-23 17:40:35 UTC
With GCC 4.8.1 we encountered a regression in linux 3.4 kernel code on ARM devices. The problematic instruction sequence is below:

    sub sp, fp, #40      << fp-48 is above sp now
    ldr r3, [fp, #-60]
    ldr r0, [fp, #-56]
    str r6, [r2]
    ldr r2, [fp, #-48]   << clobbered by handler
    str r3, [r2]         << trying to load from 0xffffffff
    ldmfd   sp, {r4, r5, r6, r7, r8, r9, r10, fp, sp, pc}

A read-only pointer value is passed to the function and stored at [fp-48]. But later it is found that the value has been clobbered and become 0xffffffff. If I manually move the "sub sp, fp, #40" instruction right before the ldmfd instruction, the kernel becomes stable again.

I can reproduce the regression on 4.8.1 and 4.8.2, but not on 4.7. The compilation command is

arm-eabi-gcc -O2  -marm -fno-omit-frame-pointer -mapcs  -march=armv7-a  -mabi=aapcs-linux -S test.c

and GCC is configured as
Target: arm-eabi
Configured with: /tmp/AOSP-toolchain/build/../gcc/gcc-4.8/configure --prefix=/tmp/toolchain-build-eabi/prefix --target=arm-eabi --host=x86_64-linux-gnu --build=x86_64-linux-gnu --with-gnu-as --with-gnu-ld --enable-languages=c,c++ --with-gmp=/tmp/toolchain-build-eabi/temp-install --with-mpfr=/tmp/toolchain-build-eabi/temp-install --with-mpc=/tmp/toolchain-build-eabi/temp-install --with-cloog=/tmp/toolchain-build-eabi/temp-install --with-isl=/tmp/toolchain-build-eabi/temp-install --with-ppl=/tmp/toolchain-build-eabi/temp-install --disable-ppl-version-check --disable-cloog-version-check --disable-isl-version-check --enable-cloog-backend=isl --with-host-libstdcxx='-static-libgcc -Wl,-Bstatic,-lstdc++,-Bdynamic -lm' --disable-libssp --enable-threads --disable-nls --disable-libmudflap --disable-libgomp --disable-libstdc__-v3 --disable-sjlj-exceptions --disable-shared --disable-tls --disable-libitm --with-float=soft --with-fpu=vfp --with-arch=armv5te --enable-target-optspace --with-abi=aapcs --enable-initfini-array --disable-nls --prefix=/tmp/toolchain-build-eabi/prefix --with-sysroot=/tmp/toolchain-build-eabi/prefix/sysroot --with-binutils-version=2.23 --with-mpfr-version=3.1.1 --with-mpc-version=1.0.1 --with-gmp-version=5.0.5 --with-gcc-version=4.8 --with-gdb-version=7.6 --with-gxx-include-dir=/tmp/toolchain-build-eabi/prefix/include/c++/4.8 --with-bugurl=http://source.android.com/source/report-bugs.html --disable-bootstrap --disable-libquadmath --enable-plugins --disable-libsanitizer --enable-gold --enable-graphite=yes --with-cloog-version=0.18.0 --with-isl-version=0.11.1 --enable-eh-frame-hdr-for-static --with-arch=armv5te --disable-gold --program-transform-name='s&^&arm-eabi-&'
Thread model: single
gcc version 4.8 (GCC)
Comment 1 bccheng 2013-10-23 17:42:00 UTC
Created attachment 31083 [details]
stripped from kernel 3.4 fs/dcache.c
Comment 2 Mikael Pettersson 2013-10-24 22:50:04 UTC
Started with r188742.  The code generation difference at that revision is:

@@ -115,16 +115,17 @@
        bne     .L6
 .L9:
        ldr     r3, [fp, #-56]
+       sub     sp, fp, #40
        ldr     r2, [fp, #-60]
+       mov     r0, sl
        str     r6, [r3, #0]
        ldr     r3, [fp, #-52]
        str     r2, [r3, #0]
-       b       .L8
+       ldmfd   sp, {r4, r5, r6, r7, r8, r9, sl, fp, sp, pc}
 .L2:
+       sub     sp, fp, #40
        mov     sl, #0
-.L8:
        mov     r0, sl
-       sub     sp, fp, #40
        ldmfd   sp, {r4, r5, r6, r7, r8, r9, sl, fp, sp, pc}
 .L28:
        b       .L23

Hoisting "sub sp, fp, #40" causes several loads, via FP minus an offset, to access locations below SP, and thus be vulnerable to clobbers from asynchronous calls (signal handles in user-space, exception handlers in the kernel as is the case here).
Comment 3 Ramana Radhakrishnan 2013-10-29 13:40:45 UTC
Confirmed. The problem appears to show up with the use of mapcs on the command line. Mine.
Comment 4 Ramana Radhakrishnan 2013-10-29 13:46:07 UTC
(In reply to Ramana Radhakrishnan from comment #3)
> Confirmed. The problem appears to show up with the use of mapcs on the
> command line. Mine.

sched2 moves this ahead - smells familiar. 


Ramana
Comment 5 Ramana Radhakrishnan 2013-10-29 14:53:21 UTC
Created attachment 31105 [details]
lightly tested patch.

completely untested but appears to fix the problem - Ben, can you please try this patch and see if fixes your issues ? 


regards
Ramana
Comment 6 bccheng 2013-10-29 22:33:20 UTC
Patch appears to be working:

c012c7ec:       e51b3034        ldr     r3, [fp, #-52]  ; 0x34
c012c7f0:       e51b203c        ldr     r2, [fp, #-60]  ; 0x3c
c012c7f4:       e51b0038        ldr     r0, [fp, #-56]  ; 0x38
c012c7f8:       e5836000        str     r6, [r3]
c012c7fc:       e51b3030        ldr     r3, [fp, #-48]  ; 0x30
c012c800:       e5832000        str     r2, [r3]
c012c804:       e24bd028        sub     sp, fp, #40     ; 0x28
c012c808:       e89daff0        ldm     sp, {r4, r5, r6, r7, r8, r9, sl, fp, sp, pc}

Thanks!
Comment 7 Ramana Radhakrishnan 2013-10-30 10:54:06 UTC
Author: ramana
Date: Wed Oct 30 10:54:04 2013
New Revision: 204203

URL: http://gcc.gnu.org/viewcvs?rev=204203&root=gcc&view=rev
Log:
Fix PR target/58854

2013-10-30  Ramana Radhakrishnan  <ramana.radhakrishnan@arm.com>

	PR target/58854
	* config/arm/arm.c (arm_expand_epilogue_apcs_frame): Emit blockage.

Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/config/arm/arm.c
Comment 8 Ramana Radhakrishnan 2013-11-11 09:38:17 UTC
Author: ramana
Date: Mon Nov 11 09:38:14 2013
New Revision: 204665

URL: http://gcc.gnu.org/viewcvs?rev=204665&root=gcc&view=rev
Log:
Backport fix for PR target/58854

2013-11-11  Ramana Radhakrishnan  <ramana.radhakrishnan@arm.com>

       Backported from mainline
        2013-10-30  Ramana Radhakrishnan  <ramana.radhakrishnan@arm.com>

       PR target/58854
       * config/arm/arm.c (arm_expand_epilogue_apcs_frame): Emit blockage


Modified:
    branches/gcc-4_8-branch/gcc/ChangeLog
    branches/gcc-4_8-branch/gcc/config/arm/arm.c
Comment 9 Ramana Radhakrishnan 2013-11-11 09:38:42 UTC
Now fixed.
Comment 10 minktee 2014-01-01 02:44:47 UTC
Comment on attachment 31105 [details]
lightly tested patch.

>diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
>index 212a4bc..23dfc0e 100644
>--- a/gcc/config/arm/arm.c
>+++ b/gcc/config/arm/arm.c
>@@ -26547,6 +26547,7 @@ arm_expand_epilogue_apcs_frame (bool really_return)
>   num_regs = bit_count (saved_regs_mask);
>   if ((offsets->outgoing_args != (1 + num_regs)) || cfun->calls_alloca)
>     {
>+      emit_insn (gen_blockage ());
>       /* Unwind the stack to just below the saved registers.  */
>       emit_insn (gen_addsi3 (stack_pointer_rtx,
>                              hard_frame_pointer_rtx,
Comment 11 minktee 2014-01-01 02:57:55 UTC
Comment on attachment 31083 [details]
stripped from kernel 3.4 fs/dcache.c

Created attachment 31083 [details]
stripped from kernel 3.4 fs/dcache.c
2013-10-23 17:42 UTC, bccheng@android.com
Comment 12 minktee 2014-01-01 03:59:50 UTC
Comment on attachment 31105 [details]
lightly tested patch.

>diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
>index 212a4bc..23dfc0e 100644
>--- a/gcc/config/arm/arm.c
>+++ b/gcc/config/arm/arm.c
>@@ -26547,6 +26547,7 @@ arm_expand_epilogue_apcs_frame (bool really_return)
>   num_regs = bit_count (saved_regs_mask);
>   if ((offsets->outgoing_args != (1 + num_regs)) || cfun->calls_alloca)
>     {
>+1    emit_insn (gen_blockage ());
>       /* Unwind the stack to just below the saved registers.  */
>       emit_insn (gen_addsi3 (stack_pointer_rtx,
>                              hard_frame_pointer_rtx,
Comment 13 Alexa 2014-02-03 01:40:25 UTC Comment hidden (spam)