This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.



Re: ix86 `double' alignment (was Re: egcs-1.1 release schedule)



  In message <199806221811.OAA07410@melange.gnu.org> you write:
  > For Fortran code, we can usually hand-wave that; this case would
  > only come up when the call tree has an *embedded* procedure
  > that doesn't maintain proper alignment, and since the big
  > computational problem with g77 performance is in code compiled
  > by g77, and such code is rarely called by C code, I don't think
  > this would represent a huge deficiency.
Right, but changing STACK_BOUNDARY is not an option because it does
affect C code.


  > >	  If the stack gets mis-aligned relative to STACK_BOUNDARY
  > >	  combine could end up removing a seemingly useless
  > >	  stack operation/address calculation.
  > 
  > I don't understand this, but presumably I need to look into it
  > further.
I explained it a little in a message to Toon.  Basically combine knows
how to remove a redundant "and" operation which just turns off some
low order bits in an address.  If the stack isn't aligned to
STACK_BOUNDARY, then combine could end up removing a mask operation
that wasn't redundant.
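
A minimal sketch of why the mask is only safe to drop when the alignment
assumption actually holds.  This is not combine's code; the function names
are made up for illustration, and the 8 byte boundary used in the usage note
is an assumption.

```c
#include <assert.h>
#include <stdint.h>

/* Turn off the low order bits of addr, rounding it down to a multiple
   of align (a power of two).  This is the "and" operation in question. */
uintptr_t mask_low_bits(uintptr_t addr, uintptr_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return addr & ~(align - 1);
}

/* Nonzero when the mask changes nothing, i.e. when removing it is safe. */
int mask_is_redundant(uintptr_t addr, uintptr_t align)
{
    return mask_low_bits(addr, align) == addr;
}
```

With an 8 byte boundary, mask_is_redundant (0x1000, 8) is true, but for a
stack address that is only 4 byte aligned, such as 0x1004, the mask still
does real work, and removing it changes the address.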



  > Okay, that makes sense to me.  We want to hit a majority of cases
  > anyway.  We don't care (for now) about cases where users are
  > combining multiple languages in weird ways, for example.
Well, we care about it from a correctness standpoint.  Things still
have to work if they're combining .o files from old compilers, or
getting callbacks from library routines like qsort, etc.  But we
aren't really worried about performance in those cases.

  > 
  > >	* The ABI is still going to mandate that some doubles in
  > >	  argument lists are going to be mis-aligned.  We'd have
  > >	  to arrange to copy them from the arglist into a suitable
  > >	  stack slot.  This may be more trouble than its worth.
  > 
  > I'm not sure how this can ever happen in the x86 architecture?
Sure it can.  Think about cases where the double in the arglist
isn't naturally aligned (think C, pass by value :-).

foo (double, int)

We push args back to front.

So we push the int on the stack, which means the double will be
at only a 4 byte aligned stack address if we assume our stack
was 8 byte aligned before we pushed the args.
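
The arithmetic can be sketched as follows.  arg_address is a hypothetical
helper, not anything in GCC; it assumes a downward-growing stack and
arguments pushed back to front with no padding between them.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Address of argument `index` once all `count` arguments have been
   pushed back to front, starting from stack pointer `sp`.
   sizes[] lists the argument sizes in declaration order. */
uintptr_t arg_address(uintptr_t sp, const size_t *sizes,
                      size_t count, size_t index)
{
    assert(index < count);
    /* Walk from the last argument down to the requested one. */
    for (size_t i = count; i-- > index; )
        sp -= sizes[i];
    return sp;
}
```

For foo (double, int) with sizes {8, 4} and an 8 byte aligned sp of 0x1000,
the int lands at 0xFFC and the double at 0xFF4, which is only 4 byte aligned.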

You might think we could compensate for this by pushing an extra 
dummy word before the first integer to ensure the double gets 
aligned.  But that loses if we have:

foo (int2, double, int1)

If we pushed an extra 4 byte hunk before int1, then the total 
size of the arglist would be 20 bytes -- not a multiple of 8.

And as I'll explain below, we must always make sure to allocate
in 8 byte lumps -- we can't depend on the callee to round the stack.


  > Is it reasonable
  > to just subtract an extra 8 bytes when creating the frame
  > pointer upon procedure entry and then NAND it with 7 to align
  > it?  Or would that make for problems with debugger, profiling,
  > and/or exception support, or is there no quick way to NAND the
  > frame pointer on the x86?
Nope.  Because you then don't have a constant offset to get to the
arguments that were passed to the function.  To make this work
you'd have to dedicate a hard register to serve as an argument
pointer, which would be horrible.

[ Think about it: how can you generate code to find an argument if
  at entry to the procedure you may adjust the stack by a varying
  value (0 or 4)? ]
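
A simplified sketch of the problem.  first_arg_offset is a made-up name, and
the model ignores the saved frame pointer and the extra 8 byte subtraction,
both of which only add a constant; it assumes the first argument sits just
above the 4 byte return pointer.

```c
#include <assert.h>
#include <stdint.h>

/* Distance from a frame pointer rounded down to 8 bytes up to the first
   argument, given the stack pointer at entry to the procedure. */
uintptr_t first_arg_offset(uintptr_t entry_sp)
{
    assert(entry_sp % 4 == 0);                 /* stack is at least word aligned */
    uintptr_t aligned_fp = entry_sp & ~(uintptr_t)7;
    uintptr_t first_arg = entry_sp + 4;        /* skip the return pointer */
    return first_arg - aligned_fp;
}
```

An entry sp of 0x1000 gives an offset of 4, while 0x1004 gives 8, so the
compiler cannot pick one constant offset at compile time.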

Instead we must make sure that we always allocate stacks in 8 byte
hunks in the prologue *and* that we push an extra dummy word on the stack
when performing function calls where the arg list + return pointer
are not a multiple of 8 bytes in size.

[ Remember, the call itself will push a 4 byte word on the stack
  too, so we have to account for it. ]
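
The caller-side rule might be sketched like this.  dummy_bytes_needed and
the two constants are illustrative names, not GCC macros, and the sketch
assumes every argument occupies a whole number of 4 byte words.

```c
#include <assert.h>
#include <stddef.h>

#define RETURN_POINTER_BYTES 4u   /* pushed by the call itself */
#define TARGET_ALIGN_BYTES   8u   /* alignment we want at callee entry */

/* Bytes of dummy padding the caller pushes so that arg list plus
   return pointer comes to a multiple of 8 bytes. */
size_t dummy_bytes_needed(size_t arg_list_bytes)
{
    assert(arg_list_bytes % 4 == 0);
    size_t total = arg_list_bytes + RETURN_POINTER_BYTES;
    return (TARGET_ALIGN_BYTES - total % TARGET_ALIGN_BYTES)
           % TARGET_ALIGN_BYTES;
}
```

A 12 byte arg list such as (double, int) needs no padding, since 12 + 4 is
16, while a 16 byte arg list needs one dummy word, since 16 + 4 is 20.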


  > It seems like everyone else thinks the right way to do this is
  > to try to always assure %sp is 64-bit aligned across calls by
  > modifying all the code that is in the procedure-call chain.
  > That probably means an extra dummy push before odd-number-of-args
  > calls, etc., right?
Close.  It's not the number of args, but the total size of the arg
list.  If the size of the arg list is a multiple of 8 bytes, then
the 4 byte return pointer pushed by the call would leave the stack
misaligned, so we have to push a dummy arg to make the stack 8 byte
aligned when we enter the callee.


jeff

