This is the mail archive of the mailing list for the GCC project.

[RFC] sibling-call optimization violates PPC ABI on darwin


Apple recently had a problem with the sibling call optimization
violating the Darwin PPC calling convention.

The Darwin calling convention for PPC is incompletely documented in
"Mach-O Runtime Conventions for PowerPC," and I believe this document
is installed on OS X with the Developer Tools.

On Darwin, the PPC stack is kept 16-byte aligned for AltiVec
compatibility. Curiously, the PPC "parameter area" is also padded to
be a multiple of 16 (see function.c:assign_parms(), near line 5169).
I don't understand why; it seems to me that the rounding logic in
rs6000.c:rs6000_stack_info() would be sufficient.

The odd result of this padding is that the PPC parameter area is often
larger than it needs to be, which in turn causes bad code generation
when the sibling-call optimization is enabled.

ralph (i1, i2, i3, i4, i5, i6, i7, i8, i9)
     int i1, i2, i3, i4, i5, i6, i7, i8, i9 ;
{
  /* Note tenth argument.  */
  velma (i1, i2, i3, i4, i5, i6, i7, i8, i9, 42);
}

main ()
{
  ralph (1, 2, 3, 4, 5, 6, 7, 8, 9);
}

In this example, main is obligated to allocate a 36-byte parameter
area. function.c:assign_parms() will round this up to 48 bytes.
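
To make the arithmetic concrete, here is a minimal sketch in C,
assuming 4-byte argument words on 32-bit PPC. The helper names
round_up_16 and param_area_bytes are made up for illustration and are
not GCC functions; the sketch also ignores any minimum parameter-area
size the ABI may impose.

```c
#include <stddef.h>

/* Hypothetical helper: round a byte count up to the next multiple
   of 16.  This only models the padding described above; it is not
   the code in function.c.  */
size_t round_up_16(size_t bytes)
{
    return (bytes + 15u) & ~(size_t)15u;
}

/* Unpadded parameter-area size for n_args word-sized arguments,
   assuming 4 bytes per argument word.  */
size_t param_area_bytes(size_t n_args)
{
    return n_args * 4u;
}
```

With these helpers, param_area_bytes(9) is 36 and round_up_16(36) is
48, matching the numbers for main() above.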

Inside the body of ralph(), the sibling-call optimizer will look at
the call to velma(), and calculate that it needs 40 bytes of parameter
area. Since main() allocated 48 bytes for us, the sibling call
optimization will proceed, a store of the tenth parameter value ("42")
into the stack will be generated, and everything works.

...Until somebody else compiles "main()". In the OS X environment,
many of our customers are using the Metrowerks CodeWarrior(tm)
compiler, and it doesn't pad the parameter area to a multiple of 16
bytes. If CodeWarrior compiled main(), the sibling-call optimized
body of ralph() will clobber the main() stackframe. (CodeWarrior
does keep the stack 16-byte aligned.)
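
The clobbering can be put in numbers. The sketch below is a
hypothetical model (overrun_bytes is an invented name, not anything in
GCC or CodeWarrior) of how far ralph()'s outgoing arguments run past
the parameter area its caller really allocated.

```c
#include <stddef.h>

/* Hypothetical model: number of bytes by which the outgoing
   arguments overrun the parameter area the caller actually
   allocated.  Returns 0 when the arguments fit.  */
size_t overrun_bytes(size_t needed, size_t allocated)
{
    return needed > allocated ? needed - allocated : 0;
}
```

For the example, overrun_bytes(40, 48) is 0 when GCC compiles main(),
and overrun_bytes(40, 36) is 4 when the caller allocates only the
unpadded 36 bytes: the one word holding the tenth argument is stored
outside the area main() set aside.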

This isn't always a problem. If the "42" above became "42.0", it
would be passed in a floating-point register with no store into the
stack. This violates the calling convention, but it will usually work
in practice.

For the record, GCC2 and GCC3 both pad the parameter area, but GCC2
apparently didn't have the sibling-call optimization. Accordingly,
GCC2 and GCC3 compiled code will interoperate without surprises.
CodeWarrior doesn't pad parameter areas, and it's complying with the
calling convention as I understand it.

How should we fix this? Possible fixes include:

1) Record the unrounded, true size of the parameter area, and use
this unrounded size in the sibling-call optimizer.

2) Stop rounding up the parameter area size.

3) Turn off the sibling-call optimizer. (Not really an option :-)
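
As a rough illustration of the difference between the current
behavior and fix #1, the following C sketch models the size check
only. It is not the patch and not GCC code; struct incoming_area and
the function names are hypothetical, and the real optimizer presumably
considers more than this one condition.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical record of the current function's incoming
   parameter area.  */
struct incoming_area {
    size_t true_bytes;    /* unrounded size the caller must provide */
    size_t padded_bytes;  /* true_bytes rounded up to a multiple of 16 */
};

struct incoming_area make_incoming_area(size_t true_bytes)
{
    struct incoming_area in;
    in.true_bytes = true_bytes;
    in.padded_bytes = (true_bytes + 15u) & ~(size_t)15u;
    return in;
}

/* Behavior as described in this message: trust the padded size.  */
bool sibcall_ok_padded(const struct incoming_area *in, size_t outgoing_bytes)
{
    return outgoing_bytes <= in->padded_bytes;
}

/* Fix #1: compare against the unrounded size instead.  */
bool sibcall_ok_true(const struct incoming_area *in, size_t outgoing_bytes)
{
    return outgoing_bytes <= in->true_bytes;
}
```

With an incoming area of 36 bytes (padded to 48), an outgoing call
needing 40 bytes passes the padded check but fails the unrounded one,
so under fix #1 ralph() would make an ordinary call to velma().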

Attached below is a patch that implements #1 above. I added this
to Apple's compiler late one night while under significant time pressure,
and I am *not* claiming this is the Right Solution. :-)

FWIW, Dale would prefer #2 above. Dale is on vacation right now, but
that needn't stop us from fixing this.

stuart hastings
Apple Computer
