Summary: | Optimization flag -O1 -fschedule-insns2 cause red zone to be used when there is none | ||
---|---|---|---|
Product: | gcc | Reporter: | chenkb <chenkb> |
Component: | target | Assignee: | Alan Modra <amodra> |
Status: | RESOLVED FIXED | ||
Severity: | normal | CC: | bergner, davem, fang, gcc-bugs, hp, matz, ralphs, rearnsha, reisinger, wilson |
Priority: | P3 | Keywords: | wrong-code |
Version: | 3.4.6 | ||
Target Milestone: | --- | ||
Host: | Target: | powerpc-linux-gnu powerpc-eabi | |
Build: | Known to work: | ||
Known to fail: | Last reconfirmed: | 2006-12-23 10:31:55 | |
Attachments: |
preprocessed source for arm-eabi testcase
Possible patch belt and braces fix for rs6000.c minimal rs6000.c fix |
Description
chenkb
2006-12-23 10:09:49 UTC
For 4.0 and above your sample code works but that is a different issue. Here is a testcase which fails for 4.0 and above: int find_num(int i) { int arr[5] = {0, 1, 2, 3, 4}; return arr[i]; } ---- The problem is the scheduler is moving the load past the update of r1 which is invalid for the SYSV ABI to do as there is no red zone. This has almost always been an issue since the rs6000 target was added and/or the scheduler was added. *** Bug 30736 has been marked as a duplicate of this bug. *** *** Bug 31898 has been marked as a duplicate of this bug. *** Created attachment 16992 [details]
preprocessed source for arm-eabi testcase
Comment on attachment 16992 [details] preprocessed source for arm-eabi testcase I've tripped over this bug on arm-eabi too, with this code, compiled as thumb. extern int doStreamReadBlock(int *, char *, int size, int); char readStream(int *s) { char c = 0; doStreamReadBlock(s, &c, 1, *s); return c; } arm-eabi davem$ arm-eabi-gcc -v -O2 -mthumb -c test.c Using built-in specs. Target: arm-eabi Configured with: ../../gcc-4.3.2/configure --enable-languages=c,c++ --with-cpu=arm7tdmi --enable-interwork --enable-multilib --with-gcc --with-gnu-ld --with-gnu-as --disable-shared --disable-threads --disable-win32-registry --disable-nls --disable-debug --disable-libmudflap --disable-libssp --disable-libgomp --disable-libstdcxx-pch --target=arm-eabi --with-newlib --prefix=/opt/devkitpro/devkitARM --with-bugurl=http://wiki.devkitpro.org/index.php/Bug_Reports --with-pkgversion='devkitARM release 24' Thread model: single gcc version 4.3.2 (devkitARM release 24) 00000000 <readStream>: 0: b510 push {r4, lr} 2: b082 sub sp, #8 4: 466c mov r4, sp 6: 3407 adds r4, #7 8: 2300 movs r3, #0 a: 7023 strb r3, [r4, #0] c: 1c21 adds r1, r4, #0 e: 6803 ldr r3, [r0, #0] 10: 2201 movs r2, #1 12: f7ff fffe bl 0 <doStreamReadBlock> 16: b002 add sp, #8 18: 7820 ldrb r0, [r4, #0] 1a: bc10 pop {r4} 1c: bc02 pop {r1} 1e: 4708 bx r1 The same thing happens with mainline /opt/devkitpro/devkitARM_mainline/bin/arm-eabi-gcc -v -O2 -mthumb -c test.c Using built-in specs. Target: arm-eabi Configured with: ../../../gcc_mainline/configure --disable-nls --target=arm-eabi --prefix=/opt/devkitPro/devkitARM --enable-languages=c,c++ --with-cpu=arm7tdmi --enable-interwork --enable-multilib --with-gcc --with-gnu-ld --with-gnu-as --disable-shared --disable-threads --disable-win32-registry --disable-nls --disable-debug --disable-libmudflap --disable-libssp --disable-libgomp --disable-libstdcxx-pch Thread model: single gcc version 4.4.0 20081223 (experimental) (GCC) 00000000 <readStream>: 0: b510 push {r4, lr} 2: b082 sub sp, #8 4: 466c mov r4, sp 6: 3407 adds r4, #7 8: 2300 movs r3, #0 a: 7023 strb r3, [r4, #0] c: 1c21 adds r1, r4, #0 e: 6803 ldr r3, [r0, #0] 10: 2201 movs r2, #1 12: f7ff fffe bl 0 <doStreamReadBlock> 16: b002 add sp, #8 18: 7820 ldrb r0, [r4, #0] 1a: bc10 pop {r4} 1c: bc02 pop {r1} 1e: 4708 bx r1 (In reply to comment #5) > (From update of attachment 16992 [details] [edit]) > I've tripped over this bug on arm-eabi too, with this code, compiled as thumb. Except this bug is about ppc sysv ABI and not really about ARM-eabi, please open a new bug report for that. *** Bug 42953 has been marked as a duplicate of this bug. *** Created attachment 19803 [details]
Possible patch
I've been testing the attached patch on ARM (well, thumb) where there's a similar issue. It's perhaps a bit more aggressive than it needs to be, but it should solve the problem generically rather than requiring each back-end to implement hacks for what is really a MI issue.
Could someone test this on PPC?
Subject: Re: Optimization flag -O1 -fschedule-insns2 cause red zone to be used when there is none Sent from my iPhone On Feb 4, 2010, at 2:48 AM, "rearnsha at gcc dot gnu dot org" <gcc-bugzilla@gcc.gnu.org > wrote: > > > ------- Comment #8 from rearnsha at gcc dot gnu dot org 2010-02-04 > 10:48 ------- > Created an attachment (id=19803) > --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=19803&action=view) > Possible patch > > I've been testing the attached patch on ARM (well, thumb) where > there's a > similar issue. It's perhaps a bit more aggressive than it needs to > be, but it > should solve the problem generically rather than requiring each back- > end to > implement hacks for what is really a MI issue. > > Could someone test this on PPC? Well powerpc64 it is valid to move across the stack pointer if the stack is less than a specific size so this can cause regressions there. And will also cause a performance regressions on x86_64 also for the same reason. > > > -- > > > http://gcc.gnu.org/bugzilla/show_bug.cgi?id=30282 > Subject: Re: Optimization flag -O1 -fschedule-insns2
cause red zone to be used when there is none
On Thu, 2010-02-04 at 16:36 +0000, pinskia at gmail dot com wrote:
> Well powerpc64 it is valid to move across the stack pointer if the
> stack is less than a specific size so this can cause regressions
> there. And will also cause a performance regressions on x86_64 also
> for the same reason.
We could perhaps make it a target hook. For powerpc, the hook returns
true if this is the sysv ABI, and returns false if this is the AIX ABI.
That might also be useful for ARM, if you need it to be true for thumb
and false for the arm port.
I'm not sure at the moment if this is necessary though. The testcase in
this bug report is a little different than the one I have. First thing
I notice is that the powerpc port is not emitting the stack_tie insn
here because there is no frame pointer. But I think it is always
necessary for the sysv ABI. If I modify rs6000_emit_stack_reset to
check DEFAULT_ABI == ABI_V4 in the rs6000_emit_stack_tie check, then I
get correct code for this testcase. I still need to modify the
stack_tie pattern to use (MEM:BLK (scratch)) to get correct code for my
testcase. These are two fairly simple powerpc backend changes though.
I'll have to look at the arm/thumb port next to see what is going on
there.
Jim
Subject: Re: Optimization flag -O1 -fschedule-insns2 cause red zone to be used when there is none On Thu, 2010-02-04 at 10:48 +0000, rearnsha at gcc dot gnu dot org wrote: > I've been testing the attached patch on ARM (well, thumb) where there's a > similar issue. > Could someone test this on PPC? I tried it on PPC for the two testcases in PR 30282 and PR 42953 and it works for both of them. However, as Andrew Pinski mentioned, this is wrong for some targets, so we would have to make this a target hook to be usable. Meanwhile, I also looked at the thumb testcase, and it looks like a simple matter of emitting a stack_tie insn in thumb1_expand_epilogue before the SP sets are emitted. The ARM port currently only emits the stack_tie insn in the prologue. So we have two possible solutions here: 1) Add a new target hook to control whether sched makes stack pointer sets depend on all MEMs. 2) Emit stack_tie in epilogue always for Thumb and rs6000 ABI_V4, and modify rs6000 stack_tie to use (mem (scratch)) like ARM. I don't feel strongly either way, but considering that the prologue and epilogue code already contain a lot of target dependent magic, I don't see the need for adding more target hooks when all we need is a few small ARM and rs6000 port changes. The second change also makes the RTL self-descriptive. With the first change, we have to make sure that any optimization pass that might move instructions around knows that stack pointers sets are special, and conflict with all MEMs. Jim Yes, this could be fixed in the Thumb back-end by doing it the same way as the ARM back-end does. However, I still think that is papering over a subtle problem in the scheduler. This is an insidious problem that can affect a number of ports (since it quietly leads to hard-to-detect wrong-code problems) and I feel it's poor design in the compiler to leave it up to each port maintainer to find the problem (bugs like this are often not found by users during testing because it requires an asynchronous event to occur at exactly the right moment to expose them). I strongly believe the scheduler should have along the lines of the one I've proposed, and if there is a hook, then the default behaviour should be to block scheduling across a stack adjustment. Ports which are known to have a stack red-zone can then disable the effect and gain an advantage -- that has to be better than leaving subtle bugs in user's code. I seem to be getting this bug on arm thumb also USB_INT16U ReadLE16U ( volatile USB_INT08U *pmem ) { USB_INT16U val; USB_INT08U *bytes = (USB_INT08U *)&val; bytes[0] = pmem[0]; bytes[1] = pmem[1]; return val; } B580 push {r7, lr} B081 sub sp, #4 7802 ldrb r2, [r0, #0] AF00 add r7, sp, #0 1CBB adds r3, r7, #2 701A strb r2, [r3, #0] 46BD mov sp, r7 B001 add sp, #4 7842 ldrb r2, [r0, #1] 8818 ldrh r0, [r3, #0] BC80 pop {r7} BC02 pop {r1} 4708 bx r1 This happens on both 4.4.4 and 4.5.1 (In reply to comment #13) > I seem to be getting this bug on arm thumb also That is a different bug, see PR 38644. This bug records the PowerPC specific bug. Created attachment 24922 [details]
belt and braces fix for rs6000.c
This fix goes overboard to guard against possible future compiler changes. I use Jakub's frame_tie in rs6000_emit_stack_reset when frame_reg_rtx is not sp, as he found necessary (alias analysis decided fp and sp accesses couldn't alias?), just in case alias analysis further loses the plot. I also make the frame_tie mem constraint +m for sp as well as fp, to make it explicit that reads via sp or regs set from sp, shouldn't cross the stack adjust. This is definitely belts and braces stuff, but I don't believe restricts the scheduler any more than a minimal fix.
Created attachment 24923 [details]
minimal rs6000.c fix
This fix just ensures we have a stack tie at the ABI_V4 stack reset. Note that only for ABI_V4 do we currently have frame_reg_rtx != sp_reg_rtx in rs6000_emit_stack_reset.
Alan, what is the status of your patch in comment #16? Do you plan on submitting that or are you thinking we need a different fix on ppc? Author: amodra Date: Mon Nov 7 01:14:33 2011 New Revision: 181056 URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=181056 Log: PR target/30282 * config/rs6000/rs6000.c (rs6000_emit_stack_reset): Always emit blockage for ABI_V4. Modified: trunk/gcc/ChangeLog trunk/gcc/config/rs6000/rs6000.c Author: amodra Date: Mon Nov 7 01:15:08 2011 New Revision: 181057 URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=181057 Log: PR target/30282 * config/rs6000/rs6000.c (rs6000_emit_stack_reset): Always emit blockage for ABI_V4. Modified: branches/gcc-4_6-branch/gcc/ChangeLog branches/gcc-4_6-branch/gcc/config/rs6000/rs6000.c Author: amodra Date: Mon Nov 7 01:15:35 2011 New Revision: 181058 URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=181058 Log: PR target/30282 * config/rs6000/rs6000.c (rs6000_emit_stack_reset): Always emit blockage for ABI_V4. Modified: branches/gcc-4_5-branch/gcc/ChangeLog branches/gcc-4_5-branch/gcc/config/rs6000/rs6000.c Author: amodra Date: Mon Nov 7 01:16:01 2011 New Revision: 181059 URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=181059 Log: PR target/30282 * config/rs6000/rs6000.c (rs6000_emit_stack_reset): Always emit blockage for ABI_V4. Modified: branches/gcc-4_4-branch/gcc/ChangeLog branches/gcc-4_4-branch/gcc/config/rs6000/rs6000.c Author: amodra Date: Mon Nov 7 01:39:08 2011 New Revision: 181060 URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=181060 Log: PR target/30282 * config/rs6000/rs6000.c (rs6000_emit_stack_reset): Always emit blockage for ABI_V4. Modified: branches/ibm/gcc-4_6-branch/gcc/ChangeLog.ibm branches/ibm/gcc-4_6-branch/gcc/config/rs6000/rs6000.c Author: amodra Date: Mon Nov 7 01:39:22 2011 New Revision: 181061 URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=181061 Log: PR target/30282 * config/rs6000/rs6000.c (rs6000_emit_stack_reset): Always emit blockage for ABI_V4. Modified: branches/ibm/gcc-4_5-branch/gcc/ChangeLog.ibm branches/ibm/gcc-4_5-branch/gcc/config/rs6000/rs6000.c Fixed Actually, the testcase in #1 still fails on gcc-4.4. Looks like gcc-4.4 (and earlier) need to use a bigger blockage. |