Bug 24675 - Stack corruption in ARM arch. if 64bit variable is passed to a function of which the low 32 use the register and the up 32 use the stack
Status: RESOLVED DUPLICATE of bug 23150
Alias: None
Product: gcc
Classification: Unclassified
Component: target
Version: 3.2.1
Importance: P3 normal
Target Milestone: ---
Assignee: Not yet assigned to anyone
URL:
Keywords: wrong-code
Depends on: 23150
Blocks:
 
Reported: 2005-11-04 15:05 UTC by Bill Thompson
Modified: 2005-11-10 01:25 UTC
CC List: 4 users

See Also:
Host:
Target: arm-elf
Build:
Known to work:
Known to fail:
Last reconfirmed:


Description Bill Thompson 2005-11-04 15:05:38 UTC
Reproducible: Very easy to reproduce with the sample code
provided. PLEASE COMPILE the code with the -O2 option.

Product: GCC for ARMV5L

Component: C

Version: 3.2.1. (Also reported in the ARM Linux community as
seen with gcc 3.3.4 and gcc 3.4.2.)

Host Platform: x86 Linux 2.4.x

Target Platform: Intel XScale 80315 (ARMV5L) running Linux 2.4.28.
The issue should be seen on most ARM targets.

Description:
Stack corruption is seen on the ARM architecture when several
arguments are passed to a function AND
one of the arguments is a long long AND
that long long is split so that its lower 32 bits are passed in a
register and its upper 32 bits on the stack (the case when r0, r1,
and r2 are already used by other arguments).

Values are not correctly passed to the function.
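
To make the register/stack split concrete, here is a minimal sketch of how the argument words are distributed. It is only a model: it assumes the legacy (pre-EABI) ARM convention in which arguments are packed as consecutive 32-bit words, the first four words go in r0-r3, and no 8-byte alignment is applied to 64-bit arguments. The names place_args and struct arg_slot are made up for illustration and are not part of gcc.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified model (an assumption, not gcc code): arguments are laid out as
 * consecutive 32-bit words; the first four words travel in r0-r3 and the
 * remaining words go on the stack. */
enum { NUM_ARG_REGS = 4, WORD_BYTES = 4 };

struct arg_slot {
    int first_reg;     /* first register used, or -1 if fully on the stack */
    int words_in_regs; /* number of 32-bit words passed in registers */
    int stack_bytes;   /* bytes of this argument passed on the stack */
};

/* sizes[i] is the size in bytes of argument i; out[i] receives its placement.
 * Returns the total number of argument bytes passed on the stack. */
static int place_args(const int *sizes, size_t n, struct arg_slot *out)
{
    int next_word = 0;   /* index of the next free argument word */
    int total_stack = 0;

    assert(sizes != NULL && out != NULL);
    for (size_t i = 0; i < n; i++) {
        int words = (sizes[i] + WORD_BYTES - 1) / WORD_BYTES;
        int in_regs = 0;

        if (next_word < NUM_ARG_REGS) {
            in_regs = NUM_ARG_REGS - next_word;
            if (in_regs > words)
                in_regs = words;
        }
        out[i].first_reg = in_regs > 0 ? next_word : -1;
        out[i].words_in_regs = in_regs;
        out[i].stack_bytes = (words - in_regs) * WORD_BYTES;
        total_stack += out[i].stack_bytes;
        next_word += words;
    }
    return total_stack;
}
```

Under these assumptions, the argument sizes of testfunction (4, 4, 1, 8) put one word of startsector in r3 and the other 4 bytes on the stack, while the sizes of calledfunction1 (4, 8, 4, 1, 4) need 8 bytes of stack for opcode and sign.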

Sample Code (PLEASE COMPILE with the -O2 option):
------------------------------------------------------------
#include <stdio.h>

typedef unsigned char u8;
typedef unsigned int u32;
typedef unsigned long long u64;

void testfunction (void *buffer1, void *buffer2, u8 count, u64 startsector);
void calledfunction1(void *buffer, u64 startsector, u32 count,
                     u8 opcode, u32 sign);

int main(void)
{
   testfunction (NULL, NULL, 8, 0x700ULL);
   return 0;
}

void testfunction (void *buffer1, void *buffer2, u8 count, u64 startsector)
{
    calledfunction1 (NULL, startsector, 0x55, 0x20, 0x3a3a3a3a);
}

void calledfunction1 (void *buffer, u64 startsector, u32 count, 
u8 opcode, u32 sign)
{
   if(opcode == 0x3a)
      printf( "opcode now is 0x3a!!!!\n");

   printf ("opcode: %x, ", opcode);
   printf( "sign:%x\n",sign);

   return;
}
------------------------------------------------------------

Output of the Sample Code:
--------------------------------
opcode now is 0x3a!!!!
opcode: 3a, sign:40039420
--------------------------------

Expected Output of the Sample Code:
--------------------------------
opcode: 20, sign:3a3a3a3a
--------------------------------
Comment 1 Andrew Pinski 2005-11-04 15:08:23 UTC
3.2.1 is an old compiler and the 3.2 series is no longer being updated. Can you try 3.3.6 (note that the 3.3 series is not being updated either), 3.4.4, or 4.0.2?
Comment 2 Andre 2005-11-09 02:11:50 UTC
Here's a slightly smaller test case:

------------------------------------------------------------
extern void foo (int f1, int f2, int f3, int f4, int f5, int f6);

void good (int g1, int g2, int g3, int g4, int g5)
{
	foo (0, 0, 0, 0, 0, 0);
}

void bad (int b1, int b2, int b3, long long b45)
{
	foo (0, 0, 0, 0, 0, 0);
}
------------------------------------------------------------

Compiled with gcc 4.0.1 (-Os), this gives:

	.file	"tst.c"
	.text
	.align	2
	.global	good
	.type	good, %function
good:
	@ args = 4, pretend = 0, frame = 0
	@ frame_needed = 0, uses_anonymous_args = 0
	str	lr, [sp, #-4]!
	mov	ip, #0
	sub	sp, sp, #8
	mov	r0, ip
	mov	r1, ip
	mov	r2, ip
	mov	r3, ip
	str	ip, [sp, #0]
	str	ip, [sp, #4]
	bl	foo
	add	sp, sp, #8
	ldmfd	sp!, {pc}
	.size	good, .-good
	.align	2
	.global	bad
	.type	bad, %function
bad:
	@ args = 8, pretend = 4, frame = 0
	@ frame_needed = 0, uses_anonymous_args = 0
	@ link register save eliminated.
	mov	ip, #0
	sub	sp, sp, #4
	str	r3, [sp, #0]
	mov	r0, ip
	mov	r1, ip
	mov	r2, ip
	mov	r3, ip
	@ lr needed for prologue
	str	ip, [sp, #4]
	str	ip, [sp, #8]	<-- BANG... caller's stack is overwritten !!
	add	sp, sp, #4
	b	foo
	.size	bad, .-bad
	.ident	"GCC: (GNU) 4.0.1"

------------------------------------------------------------

The bug is also present in 3.4.4
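
The stores in bad() can be replayed on a host machine to see which word ends up outside the area the callee owns. This is only a sketch: the stack is modelled as an array of 32-bit words where a lower index is a lower address, and every name in it (replay_bad, caller_frame_clobbered, CALLER_MARK) is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Host-side replay of the stores in bad() from the gcc 4.0.1 -Os listing.
 * stack[ENTRY_SP] is the one word the caller pushed for the stack half of
 * b45; everything above it belongs to the caller's own frame. */
enum { STACK_WORDS = 8, ENTRY_SP = 4 };

#define CALLER_MARK 0xCA11E500u  /* marker for words owned by the caller */

static void replay_bad(uint32_t stack[STACK_WORDS], uint32_t r3)
{
    int sp = ENTRY_SP;       /* sp on entry to bad() */
    uint32_t ip = 0;         /* mov ip, #0 */

    sp -= 1;                 /* sub sp, sp, #4 */
    stack[sp + 0] = r3;      /* str r3, [sp, #0]: slot for the register half */
    stack[sp + 1] = ip;      /* str ip, [sp, #4]: incoming stack-arg slot */
    stack[sp + 2] = ip;      /* str ip, [sp, #8]: one word past it */
    sp += 1;                 /* add sp, sp, #4 */
    assert(sp == ENTRY_SP);  /* b foo: sp is back at its entry value */
}

/* Returns 1 if the caller's word just above the incoming argument area
 * was overwritten by the replayed stores. */
static int caller_frame_clobbered(void)
{
    uint32_t stack[STACK_WORDS];

    for (int i = 0; i < STACK_WORDS; i++)
        stack[i] = CALLER_MARK;
    stack[ENTRY_SP] = 0;     /* stack half of b45, stored by the caller */
    replay_bad(stack, 0x700);
    return stack[ENTRY_SP + 1] != CALLER_MARK;
}
```

In this model caller_frame_clobbered() reports the overwrite: the store at [sp, #8] lands one word above the single argument word that the caller actually provided.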

Comment 3 Andre 2005-11-09 04:21:00 UTC
A few more results...

1) gcc 4.0.2 _is_ also buggy

2) The bug seems to be associated with -foptimize-sibling-calls,
i.e. the previous code compiled with:

arm-linux-gcc-4.0.2 -O1 -foptimize-sibling-calls

gives:

	.align	2
	.global	bad
	.type	bad, %function
bad:
	@ args = 8, pretend = 4, frame = 0
	@ frame_needed = 0, uses_anonymous_args = 0
	@ link register save eliminated.
	sub	sp, sp, #4
	@ lr needed for prologue
	str	r3, [sp, #0]
	mov	r3, #0
	str	r3, [sp, #4]
	str	r3, [sp, #8]	<-- BANG... caller's stack overwritten !!
	mov	r0, r3
	mov	r1, r3
	mov	r2, r3
	add	sp, sp, #4
	b	foo
	.size	bad, .-bad
	.ident	"GCC: (GNU) 4.0.2"


For reference, arm-linux-gcc-4.0.2 -O1 
gives:

	.align	2
	.global	bad
	.type	bad, %function
bad:
	@ args = 8, pretend = 4, frame = 0
	@ frame_needed = 0, uses_anonymous_args = 0
	sub	sp, sp, #4
	str	lr, [sp, #-4]!
	sub	sp, sp, #8
	str	r3, [sp, #12]
	mov	r3, #0
	str	r3, [sp, #0]
	str	r3, [sp, #4]
	mov	r0, r3
	mov	r1, r3
	mov	r2, r3
	bl	foo
	add	sp, sp, #8
	ldr	lr, [sp], #4
	add	sp, sp, #4
	bx	lr
	.size	bad, .-bad
	.ident	"GCC: (GNU) 4.0.2"

(i.e. not particularly optimal, but no stack corruption).


Comment 4 Andrew Pinski 2005-11-09 04:31:54 UTC
I think this is a dup of bug 23150 which was fixed in 4.1.0.
Comment 5 Andre 2005-11-09 06:21:22 UTC
(In reply to comment #4)
> I think this is a dup of bug 23150 which was fixed in 4.1.0.
> 

I don't think so.

I rebuilt 4.0.2 after applying the patch given for bug 23150.
The patched version of 4.0.2 gives the same result as the vanilla one for this test.

Also, the testcase for 23150 does not seem to fail with any of the arm-linux-gcc (i.e. not arm-eabi) versions I tried it with. Then again, the testcase for 23150 doesn't fail with 'gcc version 3.4.4 (release) (CodeSourcery ARM 2005q3-1)', which is the only arm-eabi compiler I have (but maybe new enough to have that bug fixed).
Comment 6 Mikael Pettersson 2005-11-09 22:04:09 UTC
Here's a standalone test case. This fails (returns 1 from main())
on armv5b-linux when compiled with gcc-3.4.4, 3.3.6, or 3.3.3,
at -O2 or -O1 -foptimize-sibling-calls. Disabling sibcall optimisation
hides the bug.

If the variable x in main() is removed (passing NULL not &x to
clobbers_callers_stack()), then the return address slot in main()'s
frame gets clobbered, causing main() to return to la-la land,
resulting in a seg fault.

#include <stdio.h>

void doit(void *p, unsigned long long ull, unsigned c, unsigned a, unsigned s)
{
    if (!(int)ull)
        printf("%p %016llx %x %x %x\n", p, ull, c, a, s);
}

void clobbers_callers_stack(void *p1, void *p2, unsigned c, unsigned long long ull)
{
    doit(NULL, ull, 0x55, 0x20, 0x3a3a3a3a);
}

int main(void)
{
    int x = 0;
    clobbers_callers_stack(&x, NULL, 8, 0x700ULL);
    if (x != 0) {
        printf("main: x == %#x\n", x);
        return 1;
    }
    return 0;
}
Comment 7 Andre 2005-11-09 23:32:57 UTC
(In reply to comment #4)
>
> I think this is a dup of bug 23150 which was fixed in 4.1.0.
> 

Something has certainly changed in 4.1 - the stack corruption is gone.
With -Os, the good() and bad() testcases compile to:

------------------------------------------------------------

bad:
	@ args = 8, pretend = 4, frame = 0
	@ frame_needed = 0, uses_anonymous_args = 0
	sub	sp, sp, #4
	str	lr, [sp, #-4]!
	mov	ip, #0
	sub	sp, sp, #8
	str	r3, [sp, #12]
	mov	r0, ip
	mov	r1, ip
	mov	r2, ip
	mov	r3, ip
	str	ip, [sp, #0]
	str	ip, [sp, #4]
	bl	foo
	add	sp, sp, #8
	ldr	lr, [sp], #4
	add	sp, sp, #4
	bx	lr

good:
	@ args = 4, pretend = 0, frame = 0
	@ frame_needed = 0, uses_anonymous_args = 0
	str	lr, [sp, #-4]!
	mov	ip, #0
	sub	sp, sp, #8
	mov	r0, ip
	mov	r1, ip
	mov	r2, ip
	mov	r3, ip
	str	ip, [sp, #0]
	str	ip, [sp, #4]
	bl	foo
	add	sp, sp, #8
	ldmfd	sp!, {pc}

	.ident	"GCC: (GNU) 4.1.0 20051105 (experimental)"

------------------------------------------------------------

The bug with previous compilers seems to be that the amount of stack space already allocated by the caller was overestimated by the callee (i.e. incorrectly calculated to be 8 bytes instead of 4, as if the entire long long param were passed on the stack when in fact only half of it is).
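
The arithmetic behind that guess can be sketched as follows. This is a simplified model and not gcc's actual code: it assumes a sibling call is allowed only when the outgoing stack arguments fit in the space the caller provided, and it takes the "args" and "pretend" values from the assembler annotations above. All function and type names are made up for illustration.

```c
#include <assert.h>

/* Values from the "@ args = N, pretend = M" annotation of a function. */
struct incoming_args {
    int args_bytes;    /* "args": total size of the incoming stack-arg area */
    int pretend_bytes; /* "pretend": register bytes the callee pushes itself */
};

/* Bytes of stack the caller really allocated for incoming arguments. */
static int caller_allocated_bytes(struct incoming_args in)
{
    assert(in.pretend_bytes >= 0 && in.pretend_bytes <= in.args_bytes);
    return in.args_bytes - in.pretend_bytes;
}

/* Suspected buggy estimate: treats the whole "args" size as caller-provided. */
static int sibcall_fits_buggy(struct incoming_args in, int outgoing_bytes)
{
    return outgoing_bytes <= in.args_bytes;
}

/* Estimate that discounts the bytes the callee pushed on its own. */
static int sibcall_fits_fixed(struct incoming_args in, int outgoing_bytes)
{
    return outgoing_bytes <= caller_allocated_bytes(in);
}
```

With bad() (args = 8, pretend = 4) and the 8 bytes of stack that the two extra int arguments of foo() need, the first estimate accepts the sibling call and the second rejects it. For good() (args = 4, pretend = 0) both reject it, which matches the listings, where good() always uses a normal bl foo.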

However, with 4.1, neither good() nor bad() makes any use of the 4 bytes of stack already allocated for them by their caller ??. They both assume they start off with 0 bytes allocated to them and then correctly allocate as required.

Therefore it seems that 4.1 is generating correct code because of a missing optimisation that was present in previous versions ??

------------------------------------------------------------

As an aside, if gcc were smart enough, I believe good() and bad() should compile to exactly the same assembler, so there are still some optimisation tweaks that could be done... ;-)

------------------------------------------------------------
Comment 8 Andrew Pinski 2005-11-09 23:41:56 UTC
(In reply to comment #7)
> (In reply to comment #4)
> >
> > I think this is a dup of bug 23150 which was fixed in 4.1.0.
> > 
> Something has certainly changed in 4.1 - the stack corruption is gone.

Yes then this is a dup of that bug then.  The problem is that the middle-end did not know what the target was doing so it rejected sib calling in this case.

*** This bug has been marked as a duplicate of 23150 ***
Comment 9 Andre 2005-11-10 01:25:08 UTC
(In reply to comment #8)
> 
> Yes then this is a dup of that bug then.  The problem is that the middle-end
> did not know what the target was doing so it rejected sib calling in this case.
> 

Any idea why the patch for bug 23150 doesn't fix 4.0.2 ??
(the patch applies OK without any manual editing).

What is the procedure for getting fixes in 3.4.x and 4.0.x ??
Or should users just wait until 4.1 is released ??
Comment 10 Bill Thompson 2005-11-11 00:56:27 UTC
Subject: Re:  Stack corruption in ARM arch. if 64bit variable is passed to a function of which the low 32 use the register and the up 32 use the stack

> What is the procedure for getting fixes in 3.4.x and 4.0.x ??
> Or should users just wait until 4.1 is released ??

I am new to the gcc bug fixing process. For getting this fixed in 3.4.4,
should I log this bug against gcc 3.4.4?