| Summary: | [4.6 Regression] Optimization flag -O1 -fschedule-insns2 causes wrong code | ||
|---|---|---|---|
| Product: | gcc | Reporter: | Dave Murphy <davem> |
| Component: | rtl-optimization | Assignee: | Not yet assigned to anyone <unassigned> |
| Status: | RESOLVED FIXED | ||
| Severity: | normal | CC: | amonakov, cwang, davem, gcc-bugs, heavy, hp, jeffreyalaw, jiangning.liu, joel, joey.ye, lingyouzeng, mikpelinux, ralf.corsepius, ralphs, ramana.r, rearnsha, rguenth, sebastian.huber, siarhei.siamashka, sjames, terry.guo |
| Priority: | P2 | Keywords: | wrong-code |
| Version: | 4.3.2 | ||
| Target Milestone: | 4.6.0 | ||
| See Also: | https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107248 | ||
| Host: | Target: | arm-none-eabi | |
| Build: | Known to work: | 4.2.4, 4.7.0 | |
| Known to fail: | 4.3.2, 4.4.2, 4.5.0, 4.6.0 | Last reconfirmed: | 2010-08-12 10:08:35 |
| Attachments: |
preprocessed source for arm-eabi testcase
proposed 4.6 fix for PR38644 |
||
|
Description
Dave Murphy
2008-12-27 22:09:46 UTC
Created attachment 16993 [details]
preprocessed source for arm-eabi testcase
Confirmed, this is a nasty bug that might silently bite users after a long period of apparently correct operation. *** Bug 42155 has been marked as a duplicate of this bug. *** *** Bug 42452 has been marked as a duplicate of this bug. *** IMO this is a generic bug in the scheduler. The code in sched-deps.c should note that STACK_POINTER_RTX is being changed and insert a memory barrier that prevents migration of stack-related memory accesses across the change. Of course, determining what memory accesses are stack-related is quite hard, and it may be that all memory accesses in the same address space as the stack will need to be restricted. If this is a generic bug, why are all dups of this for ARM targets? (In reply to comment #6) > If this is a generic bug, why are all dups of this for ARM targets? > Just because it's only been reported against ARM doesn't mean it's not a generic problem. No, it likely means other backends insert memory barriers where needed themselves. I've looked at several backends and certainly not all do (sparc for example). I think they get away with it because the stack pointer is valid in all addressing constructs -- that's not true for Thumb where SP can only be used for 32-bit loads. However, that doesn't mean there isn't an underlying bug. Re. #7 I'm not implying there is no generic bug. I just noticed the pattern (all reports for ARM) and thought that maybe the solution can be found in at least one of the other back ends. My friend also found it with powerpc. (In reply to comment #11) > My friend also found it with powerpc. Well some (most?) PPC ABIs there is a red zone which allows this to valid. *** Bug 44091 has been marked as a duplicate of this bug. *** This bug is present since r130052 which is related to: http://gcc.gnu.org/ml/gcc-patches/2007-10/msg01814.html If this is a generic bug or not is quite irrelevant since this is a very serious bug on ARM/Thumb. I am a bit irritated why this bug survived the 4.4.0 and 4.5.0 release. In a multi threaded environment it is pretty hard to find these kind of problems. The time frame in which an interrupt can corrupt the stack frame lies between two instructions. Is -fno-schedule-insns2 a workaround for this problem? Created attachment 20749 [details] proposed 4.6 fix for PR38644 PR38644 and its dupes are very similar to PowerPC PR44199 and PR30282. PR44199 was recently fixed by conditionally emitting stack ties in epilogues. I first intended to simply clone that approach for Thumb1, but it turns out there's already a conditional barrier in thumb1_expand_epilogue (). So for now I simply extended that condition to also trigger whenever there's stack pointer adjustment in the epilogue. I've confirmed that this fixes the test cases in PR38644, PR42155, and PR44091. I know that Richard Earnshaw has stated that he considers this a middle-end rather than a back-end bug, and I agree with that. However, given that this wrong-code bug has been known for so long with no middle-end fix in sight, I think solving it in the back-end is appropriate for now, at least for 4.4/4.5. The current patch is a little too heavy in that it also blocks non memory accesses from being scheduled past the stack pointer adjustment -- I saw an example of that in the large PR44091 test case. Using a stack tie instead of a full barrier should hopefully fix that. So far only tested with 4.4/4.5/4.6 crosses to armv5tel-unknown-linux-gnueabi. I'll try this in a 4.5 native bootstrap soonish (4.6 bootstraps are currently broken on ARM, see PR44255). Patch posted: http://gcc.gnu.org/ml/gcc-patches/2010-06/msg00481.html I tried to use the existing stack tie instead of a full barrier, but it had no effect at all, causing the mis-schedules to reappear. I also tried to port over the PowerPC version of the stack tie, but that ICEd the compiler; I'm not yet good enough at .md hackery to resolve that one. So I went back to the initial patch, and bootstrapped and regtested it in native builds of 4.6, 4.5, and 4.4 on armv5tel-linux-gnueabi. Thank you for your investigations. A special case fix is better than nothing. I am not a GCC expert but it seems that this kind of bug appears quite regularly on different platforms and all use special case code to avoid the evil consequences. If it is a middle-end bug it should draw more attention by the middle-end developers. You cannot detect this bug through simple test cases like a flow control or math problem. With normal unit tests you will likely not find this bug in your application. You need at least two threads of execution and you have to do certain things in between a wee bit of machine instructions. This bug is still present in GCC 4.6.0 20100807 (arm-eabi-gcc -O1 -fschedule-insns2 -mthumb):
readStream:
push {r4, lr}
sub sp, sp, #8
mov r4, sp
mov r3, #0
strb r3, [r4, #7]
add r4, r4, #7
ldr r3, [r0]
mov r1, r4
mov r2, #1
bl doStreamReadBlock
add sp, sp, #8
ldrb r0, [r4]
@ sp needed for prologue
pop {r4}
pop {r1}
bx r1
.size readStream, .-readStream
.ident "GCC: (GNU) 4.6.0 20100807 (experimental)"
According to comment#14, a patch from Alexander Monakov introduced this bug, therefore: 1. this is a regression on a primary platform => priority should be set P1 ...and 2. Add richi and amonakov to CC: Re. comment #14 "I am a bit irritated why this bug survived the 4.4.0 and 4.5.0 release.": Yes, well, ARM maintainers have been in the CC-list for this bug since the beginning, and apparently it was even too much trouble for them to see if this is a regression or not... :-( Anyway, many thanks to Sebastian Huber for identifying the revision that introduced (or exposed) this bug. It looks like patch from comment #16 should fix the problem, but was not reviewed and/or applied. The patch from comment #16 only fixes the symptom, and only on ARM. It is not a proper fix for the generic problem that is apparently also visible on POWER. It is not visible on POWER, because it has been fixed there. (In reply to comment #23) > The patch from comment #16 only fixes the symptom, and only on ARM. It is not a > proper fix for the generic problem that is apparently also visible on POWER. PR30282 audit trail contains more discussion of this problem. Jim Wilson argues that this problem should be addressed by emitting stack ties in epilogues for targets that suffer from this problem (other targets apparently do not thanks to red zone). POWER was fixed that way (PR44199). (In reply to comment #19) > According to comment#14, a patch from Alexander Monakov introduced this bug, > therefore: > > 1. this is a regression on a primary platform => priority should be set P1 It's not P1 because P1 is reserved for serious bugs that were never in any release which isn't true here. P1 _block_ a release, it is unreasonable to do so in general if a previous release shipped with that bug. (In reply to comment #21) > Re. comment #14 "I am a bit irritated why this bug survived the 4.4.0 > and 4.5.0 release.": Yes, well, ARM maintainers have been in the CC-list for > this bug since the beginning, and apparently it was even too much trouble for > them to see if this is a regression or not... :-( > > Anyway, many thanks to Sebastian Huber for identifying the revision that > introduced (or exposed) this bug. > So this ARM maintainer, proposed a fix for the problem (a generic bug, not a back-end bug). But because it seems that generating correct code on all targets isn't a priority, it was rejected. The compiler shouldn't be generating unsafe code by default; back-ends shouldn't need to paper over bugs in the MI code. The problem is that stuff like red-zone presence and size isn't known to the middle-end, all that stuff is backend private, so I think the right way is to handle this in the backends and most of the backends managed to handle it. (In reply to comment #28) > The problem is that stuff like red-zone presence and size isn't known to the > middle-end, all that stuff is backend private, so I think the right way is to > handle this in the backends and most of the backends managed to handle it. > No, the middle end code must fail safe. If targets don't need that, then they should have the ability to turn it off; not the other way around. This is critical because it leads to silent failures otherwise. A regression but no known-to-work version? (In reply to comment #30) > A regression but no known-to-work version? 4.2.4 is known to work. See http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44091#c5 Which target milestone do you intend for a fix? It is still present in 4.6.0 20100925. Any chance this can get some attention before 4.6 branches? Please. Typically the right thing to do is to block all memory motions across a change in the stack pointer. It's somewhat overly pessimistic, but in reality the few motions lost aren't going to be performance critical. In the past each backend has emitted the blockage/barrier and it typically happened soon after the port was converted to use RTL prologues/epilogues... That's probably the main reason why this was never fixed in the scheduler itself -- the first couple ports emitted a blockage and after that it became normal practice. I would support both emitting the suitable blockage insn in the ARM backend or adding a dependency between the stack pointer adjustment insn and all memory insns in the scheduler. Either is IMHO acceptable given history. The first would be slightly preferred during this late stage of development with the latter being more appropriate in early stage development. I verified that two patches in #38644 (back end) and #30282 (middle end) both work for attached cases. Here comes my two cents, 1) The concept of red zone is to control whether instructions can write memory below current stack frame or not, and it is only being supported by ABIs for some particular ISAs, so it shouldn't be enabled in middle end by default for all targets. At this point, middle end should be fixed to avoid doing things unwanted in general for all targets. 2) Red zone is an orthogonal concept to prologue/epilogue, so it is not good to fix this issue in prologue/epilogue back end code. At this point, we shouldn't simply fix it in back end by adding barrier to implicitly disable red zone. Instead, some hooks should be abstracted in middle end to support it in scheduling dependence (middle end code). Back end like X86-64 should enable it through hooks by itself. The key here is red zone should be a clean feature to be supported in middle end. Exposing this kind of stuff to back end through hooks can improve code quality for middle end and avoid bringing the bugs to back-end. This bug has long history, and it is being now or has ever been exposed on ARM, POWER and X86(with some options combination). Fixing it in middle end is not only a bug fix, but a simple infrastructure improvement. Due to the long duration and the extensive impact for different targets, I don't see good reason of not fixing it in mainline ASAP. 4.3 branch is being closed, moving to 4.4.7 target. Is this problem also related to this bug (GCC 4.6.1 20110627) with comments inside:
objdump -d -C /opt/rtems-4.11/lib/gcc/arm-rtems4.11/4.6.1/thumb/vfp/libstdc++.a
| grep -A 94 new_opnt.o
new_opnt.o: file format elf32-littlearm
Disassembly of section .text._ZnwmRKSt9nothrow_t:
00000000 <operator new(unsigned long, std::nothrow_t const&)>:
0: b5f0 push {r4, r5, r6, r7, lr}
2: 465f mov r7, fp
4: 4656 mov r6, sl
6: 464d mov r5, r9
8: 4644 mov r4, r8
a: b4f0 push {r4, r5, r6, r7}
c: b090 sub sp, #64 ; 0x40
e: 4b2a ldr r3, [pc, #168] ; (b8 <operator new(unsigned
long, std::nothrow_t const&)+0xb8>)
10: af00 add r7, sp, #0
12: 627b str r3, [r7, #36] ; 0x24
14: 4b29 ldr r3, [pc, #164] ; (bc <operator new(unsigned
long, std::nothrow_t const&)+0xbc>)
16: 62bb str r3, [r7, #40] ; 0x28
18: 4b29 ldr r3, [pc, #164] ; (c0 <operator new(unsigned
long, std::nothrow_t const&)+0xc0>)
1a: 6078 str r0, [r7, #4]
1c: 2240 movs r2, #64 ; 0x40
1e: 1c38 adds r0, r7, #0
20: 19d2 adds r2, r2, r7
22: 633b str r3, [r7, #48] ; 0x30
24: 300c adds r0, #12
26: 466b mov r3, sp
28: 62fa str r2, [r7, #44] ; 0x2c
2a: 637b str r3, [r7, #52] ; 0x34
2c: f7ff fffe bl 0 <_Unwind_SjLj_Register>
30: 687a ldr r2, [r7, #4]
32: 2a00 cmp r2, #0
34: d101 bne.n 3a <operator new(unsigned long, std::nothrow_t
const&)+0x3a>
36: 2301 movs r3, #1
38: 607b str r3, [r7, #4]
3a: 6878 ldr r0, [r7, #4]
3c: f7ff fffe bl 0 <malloc>
40: 6038 str r0, [r7, #0]
42: 2800 cmp r0, #0
44: d123 bne.n 8e <operator new(unsigned long, std::nothrow_t
const&)+0x8e>
46: 4a1f ldr r2, [pc, #124] ; (c4 <operator new(unsigned
long, std::nothrow_t const&)+0xc4>)
48: 6813 ldr r3, [r2, #0]
4a: 2b00 cmp r3, #0
4c: d104 bne.n 58 <operator new(unsigned long, std::nothrow_t
const&)+0x58>
4e: e021 b.n 94 <operator new(unsigned long, std::nothrow_t
const&)+0x94>
50: 4a1c ldr r2, [pc, #112] ; (c4 <operator new(unsigned
long, std::nothrow_t const&)+0xc4>)
52: 6813 ldr r3, [r2, #0]
54: 2b00 cmp r3, #0
56: d009 beq.n 6c <operator new(unsigned long, std::nothrow_t
const&)+0x6c>
58: 2201 movs r2, #1
5a: 613a str r2, [r7, #16]
5c: f000 f834 bl c8 <operator new(unsigned long, std::nothrow_t
const&)+0xc8>
60: 6878 ldr r0, [r7, #4]
62: f7ff fffe bl 0 <malloc>
66: 60b8 str r0, [r7, #8]
68: 2800 cmp r0, #0
6a: d0f1 beq.n 50 <operator new(unsigned long, std::nothrow_t
const&)+0x50>
6c: 1c38 adds r0, r7, #0
6e: 300c adds r0, #12
70: f7ff fffe bl 0 <_Unwind_SjLj_Unregister>
BAD CODE BEGIN
74: 46bd mov sp, r7
r7 is now the current stack pointer.
76: b010 add sp, #64 ; 0x40
Current stack frame is free now, r7 points to obsolete stack frame.
78: 68b8 ldr r0, [r7, #8]
Here we read from the stack frame freed previously. This is a disaster in
multi-threaded environments, because the exception code will use the stack of
an interrupted thread.
BAD CODE END
7a: bc3c pop {r2, r3, r4, r5}
7c: 4690 mov r8, r2
7e: 4699 mov r9, r3
80: 46a2 mov sl, r4
82: 46ab mov fp, r5
84: bdf0 pop {r4, r5, r6, r7, pc}
86: f7ff fffe bl 0 <__cxa_begin_catch>
8a: f7ff fffe bl 0 <__cxa_end_catch>
8e: 683b ldr r3, [r7, #0]
90: 60bb str r3, [r7, #8]
92: e7eb b.n 6c <operator new(unsigned long, std::nothrow_t
const&)+0x6c>
94: 2200 movs r2, #0
96: 60ba str r2, [r7, #8]
98: e7e8 b.n 6c <operator new(unsigned long, std::nothrow_t
const&)+0x6c>
9a: 3f40 subs r7, #64 ; 0x40
9c: 69bb ldr r3, [r7, #24]
9e: 6978 ldr r0, [r7, #20]
a0: 2b01 cmp r3, #1
a2: d0f0 beq.n 86 <operator new(unsigned long, std::nothrow_t
const&)+0x86>
a4: 1c5a adds r2, r3, #1
a6: d004 beq.n b2 <operator new(unsigned long, std::nothrow_t
const&)+0xb2>
a8: 2301 movs r3, #1
aa: 425b negs r3, r3
ac: 613b str r3, [r7, #16]
ae: f7ff fffe bl 0 <_Unwind_SjLj_Resume>
b2: 613b str r3, [r7, #16]
b4: f7ff fffe bl 0 <__cxa_call_unexpected>
...
c0: 0000009a .word 0x0000009a
c4: 00000000 .word 0x00000000
c8: 4718 bx r3
ca: 46c0 nop ; (mov r8, r8)
Hi Sebastian, I'm fixing this bug, and will send out a formal patch to community soon. Is it possible you send me your source code exposing the bug first, so I can verify your problem can really be fixed with my local fix? Appreciate your help in advance! BTW, you may refer to http://gcc.gnu.org/ml/gcc/2011-08/msg00000.html for discussion details. -Jiangning > -----Original Message----- > From: sebastian.huber@embedded-brains.de [mailto:gcc- > bugzilla@gcc.gnu.org] > Sent: Thursday, August 04, 2011 8:32 PM > To: Jiangning Liu > Subject: [Bug rtl-optimization/38644] [4.4/4.5/4.6/4.7 Regression] > Optimization flag -O1 -fschedule-insns2 causes wrong code > > http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38644 > > --- Comment #37 from Sebastian Huber <sebastian.huber@embedded- > brains.de> 2011-08-04 12:30:29 UTC --- > Is this problem also related to this bug (GCC 4.6.1 20110627) with > comments > inside: > > objdump -d -C /opt/rtems-4.11/lib/gcc/arm- > rtems4.11/4.6.1/thumb/vfp/libstdc++.a > | grep -A 94 new_opnt.o > > new_opnt.o: file format elf32-littlearm > > > Disassembly of section .text._ZnwmRKSt9nothrow_t: > > 00000000 <operator new(unsigned long, std::nothrow_t const&)>: > 0: b5f0 push {r4, r5, r6, r7, lr} > 2: 465f mov r7, fp > 4: 4656 mov r6, sl > 6: 464d mov r5, r9 > 8: 4644 mov r4, r8 > a: b4f0 push {r4, r5, r6, r7} > c: b090 sub sp, #64 ; 0x40 > e: 4b2a ldr r3, [pc, #168] ; (b8 <operator > new(unsigned > long, std::nothrow_t const&)+0xb8>) > 10: af00 add r7, sp, #0 > 12: 627b str r3, [r7, #36] ; 0x24 > 14: 4b29 ldr r3, [pc, #164] ; (bc <operator > new(unsigned > long, std::nothrow_t const&)+0xbc>) > 16: 62bb str r3, [r7, #40] ; 0x28 > 18: 4b29 ldr r3, [pc, #164] ; (c0 <operator > new(unsigned > long, std::nothrow_t const&)+0xc0>) > 1a: 6078 str r0, [r7, #4] > 1c: 2240 movs r2, #64 ; 0x40 > 1e: 1c38 adds r0, r7, #0 > 20: 19d2 adds r2, r2, r7 > 22: 633b str r3, [r7, #48] ; 0x30 > 24: 300c adds r0, #12 > 26: 466b mov r3, sp > 28: 62fa str r2, [r7, #44] ; 0x2c > 2a: 637b str r3, [r7, #52] ; 0x34 > 2c: f7ff fffe bl 0 <_Unwind_SjLj_Register> > 30: 687a ldr r2, [r7, #4] > 32: 2a00 cmp r2, #0 > 34: d101 bne.n 3a <operator new(unsigned long, > std::nothrow_t > const&)+0x3a> > 36: 2301 movs r3, #1 > 38: 607b str r3, [r7, #4] > 3a: 6878 ldr r0, [r7, #4] > 3c: f7ff fffe bl 0 <malloc> > 40: 6038 str r0, [r7, #0] > 42: 2800 cmp r0, #0 > 44: d123 bne.n 8e <operator new(unsigned long, > std::nothrow_t > const&)+0x8e> > 46: 4a1f ldr r2, [pc, #124] ; (c4 <operator > new(unsigned > long, std::nothrow_t const&)+0xc4>) > 48: 6813 ldr r3, [r2, #0] > 4a: 2b00 cmp r3, #0 > 4c: d104 bne.n 58 <operator new(unsigned long, > std::nothrow_t > const&)+0x58> > 4e: e021 b.n 94 <operator new(unsigned long, > std::nothrow_t > const&)+0x94> > 50: 4a1c ldr r2, [pc, #112] ; (c4 <operator > new(unsigned > long, std::nothrow_t const&)+0xc4>) > 52: 6813 ldr r3, [r2, #0] > 54: 2b00 cmp r3, #0 > 56: d009 beq.n 6c <operator new(unsigned long, > std::nothrow_t > const&)+0x6c> > 58: 2201 movs r2, #1 > 5a: 613a str r2, [r7, #16] > 5c: f000 f834 bl c8 <operator new(unsigned long, > std::nothrow_t > const&)+0xc8> > 60: 6878 ldr r0, [r7, #4] > 62: f7ff fffe bl 0 <malloc> > 66: 60b8 str r0, [r7, #8] > 68: 2800 cmp r0, #0 > 6a: d0f1 beq.n 50 <operator new(unsigned long, > std::nothrow_t > const&)+0x50> > 6c: 1c38 adds r0, r7, #0 > 6e: 300c adds r0, #12 > 70: f7ff fffe bl 0 <_Unwind_SjLj_Unregister> > > BAD CODE BEGIN > > 74: 46bd mov sp, r7 > > r7 is now the current stack pointer. > > 76: b010 add sp, #64 ; 0x40 > > Current stack frame is free now, r7 points to obsolete stack frame. > > 78: 68b8 ldr r0, [r7, #8] > > Here we read from the stack frame freed previously. This is a disaster > in > multi-threaded environments, because the exception code will use the > stack of > an interrupted thread. > > BAD CODE END > > 7a: bc3c pop {r2, r3, r4, r5} > 7c: 4690 mov r8, r2 > 7e: 4699 mov r9, r3 > 80: 46a2 mov sl, r4 > 82: 46ab mov fp, r5 > 84: bdf0 pop {r4, r5, r6, r7, pc} > 86: f7ff fffe bl 0 <__cxa_begin_catch> > 8a: f7ff fffe bl 0 <__cxa_end_catch> > 8e: 683b ldr r3, [r7, #0] > 90: 60bb str r3, [r7, #8] > 92: e7eb b.n 6c <operator new(unsigned long, > std::nothrow_t > const&)+0x6c> > 94: 2200 movs r2, #0 > 96: 60ba str r2, [r7, #8] > 98: e7e8 b.n 6c <operator new(unsigned long, > std::nothrow_t > const&)+0x6c> > 9a: 3f40 subs r7, #64 ; 0x40 > 9c: 69bb ldr r3, [r7, #24] > 9e: 6978 ldr r0, [r7, #20] > a0: 2b01 cmp r3, #1 > a2: d0f0 beq.n 86 <operator new(unsigned long, > std::nothrow_t > const&)+0x86> > a4: 1c5a adds r2, r3, #1 > a6: d004 beq.n b2 <operator new(unsigned long, > std::nothrow_t > const&)+0xb2> > a8: 2301 movs r3, #1 > aa: 425b negs r3, r3 > ac: 613b str r3, [r7, #16] > ae: f7ff fffe bl 0 <_Unwind_SjLj_Resume> > b2: 613b str r3, [r7, #16] > b4: f7ff fffe bl 0 <__cxa_call_unexpected> > ... > c0: 0000009a .word 0x0000009a > c4: 00000000 .word 0x00000000 > c8: 4718 bx r3 > ca: 46c0 nop ; (mov r8, r8) > > -- > Configure bugmail: http://gcc.gnu.org/bugzilla/userprefs.cgi?tab=email > ------- You are receiving this mail because: ------- > You are on the CC list for the bug. -- IMPORTANT NOTICE: The contents of this email and any attachments are confidential and may also be privileged. If you are not the intended recipient, please notify the sender immediately and do not disclose the contents to any other person, use it for any purpose, or store or copy the information in any medium. Thank you. On Fri, Aug 5, 2011 at 2:33 AM, jiangning.liu at arm dot com <gcc-bugzilla@gcc.gnu.org> wrote: > http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38644 > > --- Comment #38 from Jiangning Liu <jiangning.liu at arm dot com> 2011-08-05 01:33:06 UTC --- > Hi Sebastian, > > I'm fixing this bug, and will send out a formal patch to community soon. > > Is it possible you send me your source code exposing the bug first, so I can > verify your problem can really be fixed with my local fix? I think this is just from the libstdc++ sources - look at that file from the build with --with-mode=thumb . Ramana (In reply to comment #39) > On Fri, Aug 5, 2011 at 2:33 AM, jiangning.liu at arm dot com [...] > > Is it possible you send me your source code exposing the bug first, so I can > > verify your problem can really be fixed with my local fix? > > I think this is just from the libstdc++ sources - look at that file > from the build with --with-mode=thumb . Yes, this is from the libstdc++ sources (4.6.1 20110627, libstdc++-v3/libsupc++/new_opnt.cc). You need a non-EABI ARM variant of GCC since this bug manifestation will only show up in the SJLJ version.
> Yes, this is from the libstdc++ sources (4.6.1 20110627,
> libstdc++-v3/libsupc++/new_opnt.cc). You need a non-EABI ARM variant of GCC
> since this bug manifestation will only show up in the SJLJ version.
I tried and my local patch works on this case. As you can see like below, it is fixed!
add r0, r0, #12
bl _Unwind_SjLj_Unregister
ldr r0, [r7, #8]
mov sp, r7
add sp, sp, #68
@ sp needed for prologue
pop {r2, r3, r4, r5}
Thanks,
-Jiangning
*** Bug 50081 has been marked as a duplicate of this bug. *** How long will this middle to back end ping pong last until this bug is finally fixed? It is now open since 2008. Ping.. any chance of getting the proposed 4.6 fix merged? Please. This is almost a 3 year old bug. It would be nice to get it closed. (In reply to comment #41) > I tried and my local patch works on this case. As you can see like below, it is > fixed! So, where is this local patch? And why not attach it here or post it to gcc-patches, instead of keeping it local? ;) I guess the local patch will look like this: http://gcc.gnu.org/ml/gcc/2011-07/msg00459.html On 09/12/2011 02:42 AM, sebastian.huber@embedded-brains.de wrote: > http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38644 > > --- Comment #46 from Sebastian Huber<sebastian.huber@embedded-brains.de> 2011-09-12 08:42:59 UTC --- > I guess the local patch will look like this: > > http://gcc.gnu.org/ml/gcc/2011-07/msg00459.html A much simpler way to fix this is to emit a barrier just prior to mucking around with stack pointer in the epilogue. That's how targets have dealt with this exact issue for a couple decades. JEff On 12/09/11 16:18, law at redhat dot com wrote:
> A much simpler way to fix this is to emit a barrier just prior to
> mucking around with stack pointer in the epilogue. That's how targets
> have dealt with this exact issue for a couple decades.
Simpler, but wrong. The compiler should not be generating unsafe code
by default. The problem is in the mid-end and expecting every port to
get this right in order to work-around a mid-end bug is just stupid
stupid stupid.
The mid end should not be scheduling around stack moves unless it has
been explicitly told it is safe to do this. I don't understand why
there is so much resistance to fixing the problem properly.
On 09/12/2011 09:31 AM, rearnsha at arm dot com wrote: > http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38644 > > --- Comment #48 from Richard Earnshaw<rearnsha at arm dot com> 2011-09-12 15:31:51 UTC --- > On 12/09/11 16:18, law at redhat dot com wrote: > >> A much simpler way to fix this is to emit a barrier just prior to >> mucking around with stack pointer in the epilogue. That's how targets >> have dealt with this exact issue for a couple decades. > Simpler, but wrong. The compiler should not be generating unsafe code > by default. The problem is in the mid-end and expecting every port to > get this right in order to work-around a mid-end bug is just stupid > stupid stupid. > > The mid end should not be scheduling around stack moves unless it has > been explicitly told it is safe to do this. I don't understand why > there is so much resistance to fixing the problem properly. I don't disagree with you Richard, but we're at, what, 3 years on this bug... That's absurd, particularly when there's a trivial, correct fix. jeff Perhaps someone can comment on, and test, Joern's patch here: http://gcc.gnu.org/ml/gcc/2011-07/msg00461.html @Richard E: That would be a middle-end fix, if it's correct, so perhaps you could do the honors? On Mon, 12 Sep 2011, rearnsha at arm dot com wrote:
> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38644
>
> --- Comment #48 from Richard Earnshaw <rearnsha at arm dot com> 2011-09-12 15:31:51 UTC ---
> On 12/09/11 16:18, law at redhat dot com wrote:
>
> > A much simpler way to fix this is to emit a barrier just prior to
> > mucking around with stack pointer in the epilogue. That's how targets
> > have dealt with this exact issue for a couple decades.
>
> Simpler, but wrong. The compiler should not be generating unsafe code
> by default. The problem is in the mid-end and expecting every port to
> get this right in order to work-around a mid-end bug is just stupid
> stupid stupid.
>
> The mid end should not be scheduling around stack moves unless it has
> been explicitly told it is safe to do this. I don't understand why
> there is so much resistance to fixing the problem properly.
The middle-end does not treat stack moves specially, they are just
memory accesses. Extra dependences have to be modeled accordingly.
It's a hack to treat stack moves specially, not a proper fix.
In GCC 4.6.2 20111014 (prerelease) the problem is still not fixed and "arm-eabi-gcc -march=armv5t -mthumb -O2" produces wrong code. Please don't let it slip through the next release. I tested the above patch proposed by Mikael Pettersson (from 2010-05-26, more than one year ago) with GCC 4.6 20111021. It still fixes the test case provided by Dave Murphy (from 2008-12-27, nearly three years ago). I run the GCC C and C++ testsuite with the arm-rtemseabi4.11 target and the results are identical with and without the patch. Even if we have no perfect solution yet, it would be very kind if someone can check in an intermediate fix. I tested with GCC 4.6.2 and the patch provided by Mikael Pettersson. It works for -march=armv4t and -march=armv5t, but not for -march=armv5te:
--- test-armv5te.s 2011-10-28 09:22:24.627388063 +0200
+++ test-armv5t.s 2011-10-28 09:22:19.923643155 +0200
@@ -1,4 +1,4 @@
- .arch armv5te
+ .arch armv5t
.fpu softvfp
.eabi_attribute 20, 1
.eabi_attribute 21, 1
@@ -27,8 +27,8 @@
mov r1, r4
mov r2, #1
bl doStreamReadBlock
- add sp, sp, #8
ldrb r0, [r4]
+ add sp, sp, #8
@ sp needed for prologue
pop {r4, pc}
.size readStream, .-readStream
Command line:
arm-eabi-gcc -O2 -march=armv5t -mthumb -S test.c -o test-armv5t.s
arm-eabi-gcc -O2 -march=armv5te -mthumb -S test.c -o test-armv5te.s
(In reply to comment #54) > I tested with GCC 4.6.2 and the patch provided by Mikael Pettersson. It works > for -march=armv4t and -march=armv5t, but not for -march=armv5te: For what it's worth I've been using Richard Earnshaw's patch from http://gcc.gnu.org/bugzilla/show_bug.cgi?id=30282#c8 with my own gcc builds and it's working fine for all -march values. There's also Joern's patch at http://gcc.gnu.org/ml/gcc/2011-07/msg00461.html which I haven't tested but looks like it should work. I still don't understand why there seems to be so much resistance to Richard's suggestion that targets with redzones should explicitly enable this behaviour. How can it be a hack to treat stack moves specially? Isn't the stack generally a special register? (In reply to comment #54) > I tested with GCC 4.6.2 and the patch provided by Mikael Pettersson. It works > for -march=armv4t and -march=armv5t, but not for -march=armv5te: > Sebastian, Actually you may try this, diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c index aed748c..8269c1a 100755 --- a/gcc/config/arm/arm.c +++ b/gcc/config/arm/arm.c @@ -22273,6 +22273,8 @@ thumb1_expand_epilogue (void) gcc_assert (amount >= 0); if (amount) { + emit_insn (gen_blockage ()); + if (amount < 512) emit_insn (gen_addsi3 (stack_pointer_rtx, stack_pointer_rtx, GEN_INT (amount))); Thanks, -Jiangning (In reply to comment #56) > (In reply to comment #54) > > I tested with GCC 4.6.2 and the patch provided by Mikael Pettersson. It works > > for -march=armv4t and -march=armv5t, but not for -march=armv5te: > > > > Sebastian, > > Actually you may try this, > > diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c > index aed748c..8269c1a 100755 > --- a/gcc/config/arm/arm.c > +++ b/gcc/config/arm/arm.c > @@ -22273,6 +22273,8 @@ thumb1_expand_epilogue (void) > gcc_assert (amount >= 0); > if (amount) > { > + emit_insn (gen_blockage ()); > + > if (amount < 512) > emit_insn (gen_addsi3 (stack_pointer_rtx, stack_pointer_rtx, > GEN_INT (amount))); > > Thanks, > -Jiangning Yeah, that should be even better than my patch. Thanks. I tested Jiangning Liu's latest patch. With it GCC 4.6.2 produces valid code for -march=armv4t, -march=armv5t, -march=armv5te, -march=armv6, and -march=armv7-m. GCC 4.6.2 produces valid code for -march-armv7-m also without the patch, for all other listed options the code is wrong. Testsuite results: http://gcc.gnu.org/ml/gcc-testresults/2011-10/msg03523.html Author: jye2 Date: Fri Nov 4 16:50:04 2011 New Revision: 180964 URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=180964 Log: 2011-11-04 Jiangning Liu <jiangning.liu@arm.com> PR rtl-optimization/38644 * config/arm/arm.c (thumb1_expand_epilogue): Add memory barrier for epilogue having stack adjustment. testcase: * gcc.target/arm/stack-red-zone.c: New. Added: trunk/gcc/testsuite/gcc.target/arm/stack-red-zone.c Modified: trunk/gcc/ChangeLog trunk/gcc/config/arm/arm.c trunk/gcc/testsuite/ChangeLog Jiangning Liu thank you very much for your update. The target milestone is currently 4.4.7. Are there plans to commit this fix the the 4.4, 4.5, and 4.6 branches? Author: liujiangning Date: Wed Nov 16 09:46:58 2011 New Revision: 181406 URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=181406 Log: 2011-11-16 Jiangning Liu <jiangning.liu@arm.com> Backport r180964 from mainline 2011-11-04 Jiangning Liu <jiangning.liu@arm.com> PR rtl-optimization/38644 * config/arm/arm.c (thumb1_expand_epilogue): Add memory barrier for epilogue having stack adjustment. testsuite: 2011-11-16 Jiangning Liu <jiangning.liu@arm.com> Backport r180964 from mainline 2011-11-04 Jiangning Liu <jiangning.liu@arm.com> PR rtl-optimization/38644 * gcc.target/arm/stack-red-zone.c: New. Added: branches/ARM/embedded-4_6-branch/gcc/testsuite/gcc.target/arm/stack-red-zone.c Modified: branches/ARM/embedded-4_6-branch/gcc/ChangeLog.arm branches/ARM/embedded-4_6-branch/gcc/config/arm/arm.c branches/ARM/embedded-4_6-branch/gcc/testsuite/ChangeLog.arm Author: ramana Date: Mon Jan 9 16:55:16 2012 New Revision: 183019 URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=183019 Log: 2012-01-09 Ramana Radhakrishnan <ramana.radhakrishnan@linaro.org> Backport from mainline 2011-11-04 Jiangning Liu <jiangning.liu@arm.com> PR rtl-optimization/38644 * config/arm/arm.c (thumb1_expand_epilogue): Add memory barrier for epilogue having stack adjustment. 2012-01-09 Ramana Radhakrishnan <ramana.radhakrishnan@linaro.org> Backport from mainline: 2011-11-04 Jiangning Liu <jiangning.liu@arm.com> PR rtl-optimization/38644 * gcc.target/arm/stack-red-zone.c: New. Added: branches/gcc-4_6-branch/gcc/testsuite/gcc.target/arm/stack-red-zone.c Modified: branches/gcc-4_6-branch/gcc/ChangeLog branches/gcc-4_6-branch/gcc/config/arm/arm.c branches/gcc-4_6-branch/gcc/testsuite/ChangeLog 4.4 branch is being closed, moving to 4.5.4 target. The 4.5 branch is being closed, adjusting target milestone. Also saw the same problem on MIPS Compiling using 'gcc version 4.5.2 (Sourcery CodeBench Lite 2011.09-86)' Produces the below assembly (mips16) 4c13 addiu a0,19 6478 restore 64,ra,s0-s1 f3a6 dd50 sw v0,13232(a1) e820 jr ra 8c40 lh v0,0(a0) 6500 nop You can see the 'lh' comes after 'restore' , so the problem exist on MIPS Sourcery as well. (In reply to comment #65) > Also saw the same problem on MIPS > Compiling using 'gcc version 4.5.2 (Sourcery CodeBench Lite 2011.09-86)' You should report that issue to Mentor because it was fixed with: 2011-10-02 Richard Sandiford <rdsandiford@googlemail.com> * config/mips/mips.c (mips_frame_barrier): New function. (mips_expand_prologue): Call it after allocating stack space. (mips_deallocate_stack): New function. (mips_expand_epilogue): Call mips_frame_barrier and mips_deallocate_stack. Just reported to Mentor. Thanks Is this problem resolved? The status is still set to "NEW" but "known to work" shows that it either has been resolved in v4.7.0 or that version is somehow not affected. I spent the best part of this week trying to track this problem down. It was VERY hard to reproduce but I eventually found that the -fschedule-insns2 compile option caused the problem. The code that was affected is almost exactly the same as the test case in comment #1. (In reply to comment #68) > Is this problem resolved? The status is still set to "NEW" but "known to work" > shows that it either has been resolved in v4.7.0 or that version is somehow not > affected. It was resolved with the patch in comment #59 and comment #62. In GCC 4.6.0. (In reply to comment #69) > (In reply to comment #68) > > Is this problem resolved? The status is still set to "NEW" but "known to work" > > shows that it either has been resolved in v4.7.0 or that version is somehow not > > affected. > > It was resolved with the patch in comment #59 and comment #62. In GCC 4.6.0. It was fixed in GCC 4.6.3 and above. *** Bug 260998 has been marked as a duplicate of this bug. *** Seen from the domain http://volichat.com Marked for reference. Resolved as fixed @bugzilla. This bug still exists in GCC 4.8.2 ARM. It can reproduced by adding one more argument in 'doStreamReadBlock' function in test case.
/x86_64-unknown-linux-gnu/bin/gcc -B /x86_64-unknown-linux-gnu/bin -msoft-float -marm -mcpu=cortex-a9 -march=armv7-a -mno-thumb-interwork -mlong-calls -mno-unaligned-access -O2 test.c
extern int doStreamReadBlock (int *, char *, int size, int, int);
char readStream (int *s)
{
char c = 0;
doStreamReadBlock (s, &c, 1, *s, 22);
return c;
}
00000000 <readStream>:
0: e1a0c00d mov ip, sp
4: e3a02000 mov r2, #0
8: e92dd800 push {fp, ip, lr, pc}
c: e24cb004 sub fp, ip, #4
10: e24dd008 sub sp, sp, #8
14: e24b100c sub r1, fp, #12
18: e3a0c016 mov ip, #22
1c: e5612001 strb r2, [r1, #-1]!
20: e3a02001 mov r2, #1
24: e5903000 ldr r3, [r0]
28: e58dc000 str ip, [sp]
2c: e59fc00c ldr ip, [pc, #12] ; 40 <readStream+0x40>
30: e12fff3c blx ip
34: e24bd00c sub sp, fp, #12 <---- Stack frame de-allocated
38: e55b000d ldrb r0, [fp, #-13] <---- Accessing stack.
3c: e89da800 ldm sp, {fp, sp, pc}
40: 00000000 andeq r0, r0, r0
(In reply to Satish M from comment #72) > This bug still exists in GCC 4.8.2 ARM. It can reproduced by adding one more > argument in 'doStreamReadBlock' function in test case. > > /x86_64-unknown-linux-gnu/bin/gcc -B /x86_64-unknown-linux-gnu/bin > -msoft-float -marm -mcpu=cortex-a9 -march=armv7-a -mno-thumb-interwork > -mlong-calls -mno-unaligned-access -O2 test.c > > extern int doStreamReadBlock (int *, char *, int size, int, int); > > char readStream (int *s) > { > char c = 0; > doStreamReadBlock (s, &c, 1, *s, 22); > return c; > } > > 00000000 <readStream>: > 0: e1a0c00d mov ip, sp > 4: e3a02000 mov r2, #0 > 8: e92dd800 push {fp, ip, lr, pc} > c: e24cb004 sub fp, ip, #4 > 10: e24dd008 sub sp, sp, #8 > 14: e24b100c sub r1, fp, #12 > 18: e3a0c016 mov ip, #22 > 1c: e5612001 strb r2, [r1, #-1]! > 20: e3a02001 mov r2, #1 > 24: e5903000 ldr r3, [r0] > 28: e58dc000 str ip, [sp] > 2c: e59fc00c ldr ip, [pc, #12] ; 40 <readStream+0x40> > 30: e12fff3c blx ip > 34: e24bd00c sub sp, fp, #12 <---- Stack frame de-allocated > 38: e55b000d ldrb r0, [fp, #-13] <---- Accessing stack. > 3c: e89da800 ldm sp, {fp, sp, pc} > 40: 00000000 andeq r0, r0, r0 Please open a new bugreport. The master branch has been updated by Eric Botcazou <ebotcazou@gcc.gnu.org>: https://gcc.gnu.org/g:e39b170695a161feba7401b7d21d824db9ee1f8f commit r13-3296-ge39b170695a161feba7401b7d21d824db9ee1f8f Author: Eric Botcazou <ebotcazou@adacore.com> Date: Fri Oct 14 11:52:04 2022 +0200 Fix PR target/107248 This is the infamous PR rtl-optimization/38644 rearing its ugly head for leaf functions on SPARC more than a decade later... Richard E.'s generic solution has never been implemented so let's do as other RISC back-ends did. gcc/ PR target/107248 * config/sparc/sparc.cc (sparc_expand_prologue): Emit a frame blockage for leaf functions. (sparc_flat_expand_prologue): Emit frame instead of full blockage. (sparc_expand_epilogue): Emit a frame blockage for leaf functions. (sparc_flat_expand_epilogue): Emit frame instead of full blockage. The releases/gcc-12 branch has been updated by Eric Botcazou <ebotcazou@gcc.gnu.org>: https://gcc.gnu.org/g:a5a6598d5b1d29741993371310c0bb8ca57e190c commit r12-8831-ga5a6598d5b1d29741993371310c0bb8ca57e190c Author: Eric Botcazou <ebotcazou@adacore.com> Date: Fri Oct 14 11:52:04 2022 +0200 Fix PR target/107248 This is the infamous PR rtl-optimization/38644 rearing its ugly head for leaf functions on SPARC more than a decade later... Richard E.'s generic solution has never been implemented so let's do as other RISC back-ends did. gcc/ PR target/107248 * config/sparc/sparc.cc (sparc_expand_prologue): Emit a frame blockage for leaf functions. (sparc_flat_expand_prologue): Emit frame instead of full blockage. (sparc_expand_epilogue): Emit a frame blockage for leaf functions. (sparc_flat_expand_epilogue): Emit frame instead of full blockage. The releases/gcc-11 branch has been updated by Eric Botcazou <ebotcazou@gcc.gnu.org>: https://gcc.gnu.org/g:3f4b65df625edae3e8829718af721ad2330b3f22 commit r11-10311-g3f4b65df625edae3e8829718af721ad2330b3f22 Author: Eric Botcazou <ebotcazou@adacore.com> Date: Fri Oct 14 11:52:04 2022 +0200 Fix PR target/107248 This is the infamous PR rtl-optimization/38644 rearing its ugly head for leaf functions on SPARC more than a decade later... Richard E.'s generic solution has never been implemented so let's do as other RISC back-ends did. gcc/ PR target/107248 * config/sparc/sparc.c (sparc_expand_prologue): Emit a frame blockage for leaf functions. (sparc_flat_expand_prologue): Emit frame instead of full blockage. (sparc_expand_epilogue): Emit a frame blockage for leaf functions. (sparc_flat_expand_epilogue): Emit frame instead of full blockage. The releases/gcc-10 branch has been updated by Eric Botcazou <ebotcazou@gcc.gnu.org>: https://gcc.gnu.org/g:d0ef37c35b7ff7324b4567652380f32079d46088 commit r10-11034-gd0ef37c35b7ff7324b4567652380f32079d46088 Author: Eric Botcazou <ebotcazou@adacore.com> Date: Fri Oct 14 11:52:04 2022 +0200 Fix PR target/107248 This is the infamous PR rtl-optimization/38644 rearing its ugly head for leaf functions on SPARC more than a decade later... Richard E.'s generic solution has never been implemented so let's do as other RISC back-ends did. gcc/ PR target/107248 * config/sparc/sparc.c (sparc_expand_prologue): Emit a frame blockage for leaf functions. (sparc_flat_expand_prologue): Emit frame instead of full blockage. (sparc_expand_epilogue): Emit a frame blockage for leaf functions. (sparc_flat_expand_epilogue): Emit frame instead of full blockage. |