Bug 30282 - Optimization flag -O1 -fschedule-insns2 cause red zone to be used when there is none
Optimization flag -O1 -fschedule-insns2 cause red zone to be used when there ...
Status: RESOLVED FIXED
Product: gcc
Classification: Unclassified
Component: target
3.4.6
: P3 normal
: ---
Assigned To: Alan Modra
: wrong-code
: 30736 31898 42953 (view as bug list)
Depends on:
Blocks:
  Show dependency treegraph
 
Reported: 2006-12-23 10:09 UTC by chenkb
Modified: 2011-11-08 05:21 UTC (History)
10 users (show)

See Also:
Host:
Target: powerpc-linux-gnu powerpc-eabi
Build:
Known to work:
Known to fail:
Last reconfirmed: 2006-12-23 10:31:55


Attachments
preprocessed source for arm-eabi testcase (220 bytes, application/octet-stream)
2008-12-27 21:46 UTC, Dave Murphy
Details
Possible patch (500 bytes, patch)
2010-02-04 10:48 UTC, Richard Earnshaw
Details | Diff
belt and braces fix for rs6000.c (2.47 KB, patch)
2011-08-05 16:23 UTC, Alan Modra
Details | Diff
minimal rs6000.c fix (553 bytes, patch)
2011-08-05 16:29 UTC, Alan Modra
Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description chenkb 2006-12-23 10:09:49 UTC
Stack operating code that generated by gcc have an error. In some operating system that saved the thread's context in stack when schedule, it will be have critical problem.

Enabling the compiler optimalisation (-O2 option) the error will be occuring, and don't occur when using option -O1 or -O0.

--- sample ---
array.c:
int find_num(int i)
{
    const int arr[5] = {0, 1, 2, 3, 4};
    return arr[i];
}
using command:
  ppc-eabi-gcc -c -O2 array.c -o array.o
  ppc-eabi-objdump -S array.o
the disassemble code is:
00000000 <find_num>:
   0:   3d 60 00 00     lis     r11,0
   4:   94 21 ff d0     stwu    r1,-48(r1)
   8:   39 2b 00 00     addi    r9,r11,0
   c:   81 0b 00 00     lwz     r8,0(r11)
  10:   80 e9 00 10     lwz     r7,16(r9)
  14:   54 63 10 3a     rlwinm  r3,r3,2,0,29
  18:   80 09 00 04     lwz     r0,4(r9)
  1c:   81 69 00 08     lwz     r11,8(r9)
  20:   81 49 00 0c     lwz     r10,12(r9)
  24:   7d 21 1a 14     add     r9,r1,r3
  28:   91 01 00 10     stw     r8,16(r1)
  2c:   90 01 00 14     stw     r0,20(r1)
  30:   91 61 00 18     stw     r11,24(r1)
  34:   91 41 00 1c     stw     r10,28(r1)
  38:   90 e1 00 20     stw     r7,32(r1)
  3c:   38 21 00 30     addi    r1,r1,48
  40:   80 69 00 10     lwz     r3,16(r9)
  44:   4e 80 00 20     blr

The instruction 3c let stack pointer back, and instruction 40 load data from the stack, this is error. The true code must be:
    lwz     r3,16(r9)
    addi    r1,r1,48
And when you using option -O1, the code generated is true.

Release:
GCC v3.4.6 ppc-eabi cross compiler in cygwin

Environment:
Reading specs from /cygdrive/d/Compiler/gcc3.4.6-ppc-eabi/bin/../lib/gcc/ppc-eab
i/3.4.6/specs
Configured with: ../configure --target=ppc-eabi --prefix=/usr/local/gcc3.4.6-ppc
-eabi --enable-languages=c
Thread model: single
gcc version 3.4.6

How-To-Repeat:
Using my sample.
Comment 1 Andrew Pinski 2006-12-23 10:31:55 UTC
For 4.0 and above your sample code works but that is a different issue.
Here is a testcase which fails for 4.0 and above:
int find_num(int i)
{
    int arr[5] = {0, 1, 2, 3, 4};
    return arr[i];
}
----
The problem is the scheduler is moving the load past the update of r1 which is invalid for the SYSV ABI to do as there is no red zone.

This has almost always been an issue since the rs6000 target was added and/or the scheduler was added.
Comment 2 Andrew Pinski 2007-02-08 19:17:15 UTC
*** Bug 30736 has been marked as a duplicate of this bug. ***
Comment 3 Richard Biener 2007-05-11 13:16:09 UTC
*** Bug 31898 has been marked as a duplicate of this bug. ***
Comment 4 Dave Murphy 2008-12-27 21:46:55 UTC
Created attachment 16992 [details]
preprocessed source for arm-eabi testcase
Comment 5 Dave Murphy 2008-12-27 21:48:36 UTC
Comment on attachment 16992 [details]
preprocessed source for arm-eabi testcase

I've tripped over this bug on arm-eabi too, with this code, compiled as thumb.

extern int doStreamReadBlock(int *, char *, int size, int);

char readStream(int *s)
{
    char c = 0;
    doStreamReadBlock(s, &c, 1, *s);
    return c;
}

arm-eabi davem$ arm-eabi-gcc -v -O2 -mthumb -c test.c 
Using built-in specs.
Target: arm-eabi
Configured with: ../../gcc-4.3.2/configure --enable-languages=c,c++ --with-cpu=arm7tdmi --enable-interwork --enable-multilib --with-gcc --with-gnu-ld --with-gnu-as --disable-shared --disable-threads --disable-win32-registry --disable-nls --disable-debug --disable-libmudflap --disable-libssp --disable-libgomp --disable-libstdcxx-pch --target=arm-eabi --with-newlib --prefix=/opt/devkitpro/devkitARM --with-bugurl=http://wiki.devkitpro.org/index.php/Bug_Reports --with-pkgversion='devkitARM release 24'
Thread model: single
gcc version 4.3.2 (devkitARM release 24) 

00000000 <readStream>:
   0:	b510      	push	{r4, lr}
   2:	b082      	sub	sp, #8
   4:	466c      	mov	r4, sp
   6:	3407      	adds	r4, #7
   8:	2300      	movs	r3, #0
   a:	7023      	strb	r3, [r4, #0]
   c:	1c21      	adds	r1, r4, #0
   e:	6803      	ldr	r3, [r0, #0]
  10:	2201      	movs	r2, #1
  12:	f7ff fffe 	bl	0 <doStreamReadBlock>
  16:	b002      	add	sp, #8
  18:	7820      	ldrb	r0, [r4, #0]
  1a:	bc10      	pop	{r4}
  1c:	bc02      	pop	{r1}
  1e:	4708      	bx	r1

The same thing happens with mainline

 /opt/devkitpro/devkitARM_mainline/bin/arm-eabi-gcc -v -O2 -mthumb -c test.c 
Using built-in specs.
Target: arm-eabi
Configured with: ../../../gcc_mainline/configure --disable-nls --target=arm-eabi --prefix=/opt/devkitPro/devkitARM --enable-languages=c,c++ --with-cpu=arm7tdmi --enable-interwork --enable-multilib --with-gcc --with-gnu-ld --with-gnu-as --disable-shared --disable-threads --disable-win32-registry --disable-nls --disable-debug --disable-libmudflap --disable-libssp --disable-libgomp --disable-libstdcxx-pch
Thread model: single
gcc version 4.4.0 20081223 (experimental) (GCC) 

00000000 <readStream>:
   0:	b510      	push	{r4, lr}
   2:	b082      	sub	sp, #8
   4:	466c      	mov	r4, sp
   6:	3407      	adds	r4, #7
   8:	2300      	movs	r3, #0
   a:	7023      	strb	r3, [r4, #0]
   c:	1c21      	adds	r1, r4, #0
   e:	6803      	ldr	r3, [r0, #0]
  10:	2201      	movs	r2, #1
  12:	f7ff fffe 	bl	0 <doStreamReadBlock>
  16:	b002      	add	sp, #8
  18:	7820      	ldrb	r0, [r4, #0]
  1a:	bc10      	pop	{r4}
  1c:	bc02      	pop	{r1}
  1e:	4708      	bx	r1
Comment 6 Andrew Pinski 2008-12-27 21:51:01 UTC
(In reply to comment #5)
> (From update of attachment 16992 [details] [edit])
> I've tripped over this bug on arm-eabi too, with this code, compiled as thumb.

Except this bug is about ppc sysv ABI and not really about ARM-eabi, please open a new bug report for that.
Comment 7 Andrew Pinski 2010-02-04 07:36:41 UTC
*** Bug 42953 has been marked as a duplicate of this bug. ***
Comment 8 Richard Earnshaw 2010-02-04 10:48:11 UTC
Created attachment 19803 [details]
Possible patch

I've been testing the attached patch on ARM (well, thumb) where there's a similar issue.  It's perhaps a bit more aggressive than it needs to be, but it should solve the problem generically rather than requiring each back-end to implement hacks for what is really a MI issue.

Could someone test this on PPC?
Comment 9 pinskia@gmail.com 2010-02-04 16:36:58 UTC
Subject: Re:  Optimization flag -O1 -fschedule-insns2 cause red zone to be used when there is none



Sent from my iPhone

On Feb 4, 2010, at 2:48 AM, "rearnsha at gcc dot gnu dot org" <gcc-bugzilla@gcc.gnu.org 
 > wrote:

>
>
> ------- Comment #8 from rearnsha at gcc dot gnu dot org  2010-02-04  
> 10:48 -------
> Created an attachment (id=19803)
> --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=19803&action=view)
> Possible patch
>
> I've been testing the attached patch on ARM (well, thumb) where  
> there's a
> similar issue.  It's perhaps a bit more aggressive than it needs to  
> be, but it
> should solve the problem generically rather than requiring each back- 
> end to
> implement hacks for what is really a MI issue.
>
> Could someone test this on PPC?

Well powerpc64 it is valid to move across the stack pointer if the  
stack is less than a specific size so this can cause regressions  
there. And will also cause a performance regressions on x86_64 also  
for the same reason.

>
>
> -- 
>
>
> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=30282
>
Comment 10 Jim Wilson 2010-02-04 22:16:08 UTC
Subject: Re:  Optimization flag -O1 -fschedule-insns2
 cause red zone to be used when there is none

On Thu, 2010-02-04 at 16:36 +0000, pinskia at gmail dot com wrote:
> Well powerpc64 it is valid to move across the stack pointer if the  
> stack is less than a specific size so this can cause regressions  
> there. And will also cause a performance regressions on x86_64 also  
> for the same reason.

We could perhaps make it a target hook.  For powerpc, the hook returns
true if this is the sysv ABI, and returns false if this is the AIX ABI.
That might also be useful for ARM, if you need it to be true for thumb
and false for the arm port.

I'm not sure at the moment if this is necessary though.  The testcase in
this bug report is a little different than the one I have.  First thing
I notice is that the powerpc port is not emitting the stack_tie insn
here because there is no frame pointer.  But I think it is always
necessary for the sysv ABI.  If I modify rs6000_emit_stack_reset to
check DEFAULT_ABI == ABI_V4 in the rs6000_emit_stack_tie check, then I
get correct code for this testcase.  I still need to modify the
stack_tie pattern to use (MEM:BLK (scratch)) to get correct code for my
testcase.  These are two fairly simple powerpc backend changes though.

I'll have to look at the arm/thumb port next to see what is going on
there.

Jim



Comment 11 wilson@codesourcery.com 2010-02-06 06:23:23 UTC
Subject: Re:  Optimization flag -O1 -fschedule-insns2
 cause red zone to be used when there is none

On Thu, 2010-02-04 at 10:48 +0000, rearnsha at gcc dot gnu dot org
wrote:
> I've been testing the attached patch on ARM (well, thumb) where there's a
> similar issue. 
> Could someone test this on PPC?

I tried it on PPC for the two testcases in PR 30282 and PR 42953 and it
works for both of them.  However, as Andrew Pinski mentioned, this is
wrong for some targets, so we would have to make this a target hook to
be usable.

Meanwhile, I also looked at the thumb testcase, and it looks like a
simple matter of emitting a stack_tie insn in thumb1_expand_epilogue
before the SP sets are emitted.  The ARM port currently only emits the
stack_tie insn in the prologue.

So we have two possible solutions here:
1) Add a new target hook to control whether sched makes stack pointer
sets depend on all MEMs.
2) Emit stack_tie in epilogue always for Thumb and rs6000 ABI_V4, and
modify rs6000 stack_tie to use (mem (scratch)) like ARM.

I don't feel strongly either way, but considering that the prologue and
epilogue code already contain a lot of target dependent magic, I don't
see the need for adding more target hooks when all we need is a few
small ARM and rs6000 port changes.  The second change also makes the RTL
self-descriptive.  With the first change, we have to make sure that any
optimization pass that might move instructions around knows that stack
pointers sets are special, and conflict with all MEMs.

Jim


Comment 12 Richard Earnshaw 2010-02-06 13:03:06 UTC
Yes, this could be fixed in the Thumb back-end by doing it the same way as the ARM back-end does.  However, I still think that is papering over a subtle problem in the scheduler.  This is an insidious problem that can affect a number of ports (since it quietly leads to hard-to-detect wrong-code problems) and I feel it's poor design in the compiler to leave it up to each port maintainer to find the problem (bugs like this are often not found by users during testing because it requires an asynchronous event to occur at exactly the right moment to expose them).

I strongly believe the scheduler should have along the lines of the one I've proposed, and if there is a hook, then the default behaviour should be to block scheduling across a stack adjustment.  Ports which are known to have a stack red-zone can then disable the effect and gain an advantage -- that has to be better than leaving subtle bugs in user's code.
Comment 13 Darren Jenkins 2010-09-22 00:20:08 UTC
I seem to be getting this bug on arm thumb also

USB_INT16U  ReadLE16U ( volatile  USB_INT08U  *pmem )
{
    USB_INT16U   val;
    USB_INT08U *bytes = (USB_INT08U *)&val;

    bytes[0] = pmem[0];
    bytes[1] = pmem[1];

    return val;
}

    B580        push {r7, lr}
    B081        sub sp, #4
    7802        ldrb r2, [r0, #0]
    AF00        add r7, sp, #0
    1CBB        adds r3, r7, #2
    701A        strb r2, [r3, #0]
    46BD        mov sp, r7
    B001        add sp, #4
    7842        ldrb r2, [r0, #1]
    8818        ldrh r0, [r3, #0]
    BC80        pop {r7}
    BC02        pop {r1}
    4708        bx r1

This happens on both 4.4.4 and 4.5.1
Comment 14 Andrew Pinski 2010-09-22 00:33:57 UTC
(In reply to comment #13)
> I seem to be getting this bug on arm thumb also

That is a different bug, see PR 38644.  This bug records the PowerPC specific bug.
Comment 15 Alan Modra 2011-08-05 16:23:04 UTC
Created attachment 24922 [details]
belt and braces fix for rs6000.c

This fix goes overboard to guard against possible future compiler changes.  I use Jakub's frame_tie in rs6000_emit_stack_reset when frame_reg_rtx is not sp, as he found necessary (alias analysis decided fp and sp accesses couldn't alias?), just in case alias analysis further loses the plot.  I also make the frame_tie mem constraint +m for sp as well as fp, to make it explicit that reads via sp or regs set from sp, shouldn't cross the stack adjust.  This is definitely belts and braces stuff, but I don't believe restricts the scheduler any more than a minimal fix.
Comment 16 Alan Modra 2011-08-05 16:29:02 UTC
Created attachment 24923 [details]
minimal rs6000.c fix

This fix just ensures we have a stack tie at the ABI_V4 stack reset.  Note that only for ABI_V4 do we currently have frame_reg_rtx != sp_reg_rtx in rs6000_emit_stack_reset.
Comment 17 Peter Bergner 2011-10-13 16:11:47 UTC
Alan, what is the status of your patch in comment #16?  Do you plan on submitting that or are you thinking we need a different fix on ppc?
Comment 18 Alan Modra 2011-11-07 01:14:37 UTC
Author: amodra
Date: Mon Nov  7 01:14:33 2011
New Revision: 181056

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=181056
Log:
	PR target/30282
	* config/rs6000/rs6000.c (rs6000_emit_stack_reset): Always emit
	blockage for ABI_V4.


Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/config/rs6000/rs6000.c
Comment 19 Alan Modra 2011-11-07 01:15:13 UTC
Author: amodra
Date: Mon Nov  7 01:15:08 2011
New Revision: 181057

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=181057
Log:
	PR target/30282
	* config/rs6000/rs6000.c (rs6000_emit_stack_reset): Always emit
	blockage for ABI_V4.


Modified:
    branches/gcc-4_6-branch/gcc/ChangeLog
    branches/gcc-4_6-branch/gcc/config/rs6000/rs6000.c
Comment 20 Alan Modra 2011-11-07 01:15:42 UTC
Author: amodra
Date: Mon Nov  7 01:15:35 2011
New Revision: 181058

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=181058
Log:
	PR target/30282
	* config/rs6000/rs6000.c (rs6000_emit_stack_reset): Always emit
	blockage for ABI_V4.


Modified:
    branches/gcc-4_5-branch/gcc/ChangeLog
    branches/gcc-4_5-branch/gcc/config/rs6000/rs6000.c
Comment 21 Alan Modra 2011-11-07 01:16:06 UTC
Author: amodra
Date: Mon Nov  7 01:16:01 2011
New Revision: 181059

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=181059
Log:
	PR target/30282
	* config/rs6000/rs6000.c (rs6000_emit_stack_reset): Always emit
	blockage for ABI_V4.


Modified:
    branches/gcc-4_4-branch/gcc/ChangeLog
    branches/gcc-4_4-branch/gcc/config/rs6000/rs6000.c
Comment 22 Alan Modra 2011-11-07 01:39:11 UTC
Author: amodra
Date: Mon Nov  7 01:39:08 2011
New Revision: 181060

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=181060
Log:
	PR target/30282
	* config/rs6000/rs6000.c (rs6000_emit_stack_reset): Always emit
	blockage for ABI_V4.


Modified:
    branches/ibm/gcc-4_6-branch/gcc/ChangeLog.ibm
    branches/ibm/gcc-4_6-branch/gcc/config/rs6000/rs6000.c
Comment 23 Alan Modra 2011-11-07 01:39:24 UTC
Author: amodra
Date: Mon Nov  7 01:39:22 2011
New Revision: 181061

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=181061
Log:
	PR target/30282
	* config/rs6000/rs6000.c (rs6000_emit_stack_reset): Always emit
	blockage for ABI_V4.


Modified:
    branches/ibm/gcc-4_5-branch/gcc/ChangeLog.ibm
    branches/ibm/gcc-4_5-branch/gcc/config/rs6000/rs6000.c
Comment 24 Alan Modra 2011-11-07 01:40:45 UTC
Fixed
Comment 25 Alan Modra 2011-11-08 05:21:57 UTC
Actually, the testcase in #1 still fails on gcc-4.4.  Looks like gcc-4.4 (and earlier) need to use a bigger blockage.