This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.



Re: ix86 `double' alignment (was Re: egcs-1.1 release schedule)


>  > >	  If the stack gets mis-aligned relative to STACK_BOUNDARY
>  > >	  combine could end up removing a seemingly useless
>  > >	  stack operation/address calculation.
>  > 
>  > I don't understand this, but presumably I need to look into it
>  > further.
>I explained it a little in a message to Toon.  Basically combine knows
>how to remove a redundant "and" operation which just turns off some
>low order bits in an address.  If the stack isn't aligned to
>STACK_BOUNDARY, then combine could end up removing a mask operation
>that wasn't redundant.

I'm a little curious, though, how such an operation comes to pass.
Is it only likely because the user code does something like
"&foo & 7", or are there internally-generated reasons?  It's okay
if you can't think of any examples; I agree with the overall
sentiment that we don't want to lie to the compiler in this area,
even if we can't come up with a reason it'd bite us right away!
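A minimal C sketch of why dropping such a mask is only safe when the
alignment assumption really holds.  The helper names are hypothetical,
and this models only the address arithmetic, not combine itself: if the
compiler believes an address is 8-byte aligned, "addr & ~7" is the
identity and looks redundant, but on a misaligned stack it is not.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: clear the low bits so addr is a multiple of
   boundary (a power of two), as a user-written "& ~7" would. */
static uintptr_t align_down(uintptr_t addr, uintptr_t boundary)
{
    assert(boundary != 0 && (boundary & (boundary - 1)) == 0);
    return addr & ~(boundary - 1);
}

/* The mask is redundant only if it leaves the address unchanged,
   i.e. only if addr really is boundary-aligned. */
static bool mask_is_redundant(uintptr_t addr, uintptr_t boundary)
{
    return align_down(addr, boundary) == addr;
}
```

With a correctly aligned slot such as 0x1000 the mask does nothing; if
the stack was entered 4 bytes off and the same slot is really at 0x1004,
removing the mask yields 0x1004 instead of 0x1000.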

>  > Okay, that makes sense to me.  We want to hit a majority of cases
>  > anyway.  We don't care (for now) about cases where users are
>  > combining multiple languages in weird ways, for example.
>Well, we care about it from a correctness standpoint.  Things still
>have to work if they're combining .o files from old compilers,
>callbacks from library routines like qsort, etc.  But we aren't really
>worried about performance in those cases.

I came up with worse examples.  If STACK_BOUNDARY (or anything that
might break the ABI) is adjusted based on whether the processor
is [56]86 vs. [34]86, then code/libraries that happen to be compiled
on different variants of the x86 architecture could be magically
incompatible, producing subtly wrong results.

The other bad one is if we made only the g77 compiler "break" the
ABI to get this performance the "easy way", then some poor user
believed the g77 docs about f2c compatibility and tried to link
f2c-and-gcc-compiled code with g77-compiled code.  Even on the
same machine with the same version of gcc/g77 (egcs 1.1, if we
went down this rat-hole :), the result would be a subtly broken
executable, because the g77-compiled code would lay out its
COMMON areas differently than the f2c-and-gcc-compiled code!
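The layout mismatch can be modeled with plain arithmetic.  This is a
hedged sketch, assuming a hypothetical record holding a 4-byte integer
followed by an 8-byte double and C-style padding rules; it does not
reproduce g77's or f2c's actual COMMON handling.  It only shows the kind
of difference a change in double alignment (such as -malign-double on
i386) introduces: the two sides disagree about where the double lives.

```c
#include <assert.h>
#include <stddef.h>

/* Round offset up to the next multiple of align (a power of two). */
static size_t align_up(size_t offset, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (offset + align - 1) & ~(align - 1);
}

/* Offset of a double that follows a 4-byte int at offset 0, given the
   alignment a particular compiler (or option) uses for doubles. */
static size_t double_offset_after_int(size_t double_align)
{
    return align_up(4, double_align);
}

/* Total record size, rounded up to the record's own alignment. */
static size_t record_size(size_t double_align)
{
    size_t end = double_offset_after_int(double_align) + 8;
    size_t rec_align = double_align > 4 ? double_align : 4;
    return align_up(end, rec_align);
}
```

Under 4-byte double alignment the double sits at offset 4 in a 12-byte
record; under 8-byte alignment it sits at offset 8 in a 16-byte record,
so code built one way reads garbage written by code built the other way.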

>Sure.  Think about cases where the double in the arglist isn't
>naturally aligned (think C, pass by value :-).
>
>foo (double, int)
>
>We push args back to front.
>
>So we push the int on the stack, which means the double will be
>at only a 4 byte aligned stack address if we assume our stack
>was 8 byte aligned before we pushed the args.
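The foo (double, int) arithmetic can be checked with a tiny simulation.
It assumes a hypothetical starting %esp of 0x1000 (8-byte aligned) and
models each push as a decrement of a plain integer; it is a model, not
real stack code.

```c
#include <assert.h>
#include <stdint.h>

/* Model of an i386-style stack pointer that grows downward. */
typedef struct {
    uint32_t esp;
} stack_model;

/* "Push" size bytes and return the address where the value lands. */
static uint32_t push(stack_model *s, uint32_t size)
{
    assert(size > 0 && s->esp >= size);
    s->esp -= size;
    return s->esp;
}

/* foo (double, int): arguments are pushed back to front, so the int
   goes first and the double ends up just below it. */
static uint32_t double_addr_for_foo_double_int(uint32_t start_esp)
{
    stack_model s = { start_esp };
    push(&s, 4);            /* int */
    return push(&s, 8);     /* double */
}
```

Starting from 0x1000, the int lands at 0xFFC and the double at 0xFF4,
which is 4 modulo 8: aligned to 4 bytes only, as described above.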

Note that, in practice, what g77 most often does is:

foo (double *, int *)

But, aside from that...

My question is, just why, *conceptually*, is it a problem on the
x86 architecture to try to align the argument list so the caller
frame is 64-bit aligned *and* at least some of the doubles in
the list are 64-bit aligned, but some aren't?

That is, is there a reason that x86 code *must* be generated either
to always assume doubles are 32-bit aligned *or* are always
64-bit aligned?  I can't think of any.

If that's the case, then IMO this whole problem is indeed, as I
thought, the result of gcc just not having a flexible-enough
architecture, that is, its "housekeeping staff" can't cope with
meeting these meetable requirements.

So can we at least come up with a short-term way to say "*try*
to align outgoing doubles to 64-bits, but don't assume incoming
doubles are 64-bit aligned", and in the long run make a better
overall architecture for representing alignments?  (Do I need
to write a "white paper" on what I mean by all this -- would
that help anyone understand what I'm talking about?  I've thought
it through quite a bit lately, so I could probably bang it out
with a few days' work.)
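On the callee side, "don't assume incoming doubles are 64-bit aligned"
can be expressed portably in C by copying bytes instead of dereferencing
a double pointer of uncertain alignment.  This is a generic C idiom
offered as a sketch, not something taken from the gcc sources; on x86 a
misaligned load is legal but may be slower, which is exactly the
performance trade-off under discussion.

```c
#include <assert.h>
#include <string.h>

/* Read a double from p without assuming p is 8-byte (or even 4-byte)
   aligned; memcpy places no alignment requirement on its arguments. */
static double load_double(const void *p)
{
    double d;
    assert(p != NULL);
    memcpy(&d, p, sizeof d);
    return d;
}

/* Store counterpart, equally alignment-agnostic. */
static void store_double(void *p, double d)
{
    assert(p != NULL);
    memcpy(p, &d, sizeof d);
}
```

For example, storing 2.5 at offset 4 of a byte buffer and loading it
back works whatever the buffer's own alignment is.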

>You might think we could compensate for this by pushing an extra 
>dummy word before the first integer to ensure the double gets 
>aligned.  But that loses if we have:
>
>foo (int2, double, int1)
>
>If we pushed an extra 4 byte hunk before int1, then the total 
>size of the arglist would be 20 bytes -- not a multiple of 8.
>
>And as I'll explain below, we must always make sure to allocate
>in 8 byte lumps -- we can't depend on the callee to round the stack.
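The 20-byte figure for foo (int2, double, int1) follows from adding up
the pushes.  A hedged sketch: sizes are the i386 ones (4-byte int,
8-byte double), the dummy word is modeled as pushed just before int1,
and the function and type names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Result of laying out foo (int2, double, int1) with one dummy word. */
typedef struct {
    uint32_t double_addr;  /* where the double argument lands */
    uint32_t total_bytes;  /* size of the whole argument block */
} arg_layout;

static arg_layout layout_with_dummy(uint32_t start_esp)
{
    assert(start_esp % 8 == 0 && start_esp >= 20);
    uint32_t esp = start_esp;
    esp -= 4;                       /* dummy word */
    esp -= 4;                       /* int1 (last arg, pushed first) */
    esp -= 8;                       /* double */
    uint32_t double_addr = esp;
    esp -= 4;                       /* int2 */
    arg_layout r = { double_addr, start_esp - esp };
    return r;
}
```

The double does end up 8-byte aligned, but the block is 20 bytes, which
violates the rule that the stack is always allocated in 8-byte lumps.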

Again, what is the *real* problem with just doing what is currently
done for that case, ending up with a misaligned double arg for
the incoming procedure -- must it really assume its double is
64-bit aligned?  Or is this really just an internal problem with
gcc's housekeeping?

(Guess I should start reading my 486 handbook again!  :)

>  > Is it reasonable
>  > to just subtract an extra 8 bytes when creating the frame
>  > pointer upon procedure entry and then AND it with ~7 to align
>  > it?  Or would that make for problems with debugger, profiling,
>  > and/or exception support, or is there no quick way to mask the
>  > frame pointer on the x86?
>Nope.  Because you then don't have a constant offset to get to the
>arguments that were passed to the function.    To make this work
>you'd have to dedicate a hard register to serve as an argument
>pointer, which will be horrible.
>
>[ Think about it, how can you generate code to find an argument if
>  at entry to the procedure you may adjust the stack by a varying
>  value (0 or 4). ]
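The objection can be seen numerically.  This sketch assumes the proposed
prologue (subtract 8 from the entry %esp, then clear the low three bits
to form a frame pointer) and models the entry %esp as pointing at the
return address, with the first argument 4 bytes above it.  The function
names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Frame pointer from the proposed prologue: drop 8 bytes, then round
   down to an 8-byte boundary. */
static uint32_t aligned_frame_pointer(uint32_t entry_esp)
{
    assert(entry_esp >= 8);
    return (entry_esp - 8) & ~(uint32_t)7;
}

/* At entry, the return address sits at entry_esp and the first
   argument at entry_esp + 4; return its offset from the frame pointer. */
static uint32_t first_arg_offset(uint32_t entry_esp)
{
    return (entry_esp + 4) - aligned_frame_pointer(entry_esp);
}
```

With an entry %esp of 0x1000 the first argument is 12 bytes above the
frame pointer; with 0x1004 it is 16 bytes above.  No single constant
displacement works, hence the need for a separate argument pointer.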

Duh, okay, that's right.  That's my limited SPARC/VLIW thinking
tripping me up again.  Even though they have the same problem, I
don't think about it, because so often all the incoming arguments
arrive in registers.  (I have spent most of the last 13 years or
so working on machines that have registers; I'd forgotten my
earlier experiences working on machines with hardly any, sorry.  :)

>Instead we must make sure that we always allocate stacks in 8 byte
>hunks in the prologue *and* that we push an extra dummy word on the stack
>when performing function calls where the arg list + return pointer
>are not a multiple of 8 bytes in size.
>
>[ Remember, the call itself will push a 4 byte word on the stack
>  too, so we have to account for it too. ]
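A minimal sketch of the two rules together, assuming (as in the message)
that the caller's %esp is a multiple of 8 just before it starts pushing.
It ignores saved registers such as %ebp, which a real prologue would also
have to count, and all names are hypothetical.  The 4 in call_site_pad
is the return address pushed by the call instruction.

```c
#include <assert.h>
#include <stdint.h>

/* Prologue rule: local frames are always allocated in 8-byte hunks. */
static uint32_t prologue_alloc(uint32_t local_bytes)
{
    return (local_bytes + 7u) & ~7u;
}

/* Call-site rule: pad so that arguments + return address + pad is a
   multiple of 8.  The 4 is the return address the call pushes. */
static uint32_t call_site_pad(uint32_t arg_bytes)
{
    return (8u - (arg_bytes + 4u) % 8u) % 8u;
}

/* %esp on entry to the callee, starting from an 8-aligned caller %esp. */
static uint32_t callee_entry_esp(uint32_t caller_esp, uint32_t arg_bytes)
{
    assert(caller_esp % 8 == 0);
    return caller_esp - call_site_pad(arg_bytes) - arg_bytes - 4u;
}
```

For a 12-byte argument list no pad is needed (12 + 4 = 16); for a
16-byte list one 4-byte pad brings the total to 24.  Either way the
callee is entered with an 8-byte-aligned %esp.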

Right.  Okay.

>  > It seems like everyone else thinks the right way to do this is
>  > to try to always ensure %sp is 64-bit aligned across calls by
>  > modifying all the code that is in the procedure-call chain.
>  > That probably means an extra dummy push before odd-number-of-args
>  > calls, etc., right?
>Close.  It's not the number of args, but the total size of the arg
>list.  If the size of the arg list is a multiple of 8 bytes, then
>we have to push a dummy arg so that the stack is 8 byte aligned
>when we enter the callee.
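To see why the argument count is the wrong test, here is a hedged sketch
that sums hypothetical argument sizes (4 bytes per int, 8 per double, as
on i386) and applies the size-based rule.  foo (double) has one argument
and foo (int, int) has two, yet both have an 8-byte argument list and
both need the dummy word; foo (double, int) has two arguments and needs
none.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Total bytes pushed for an argument list, given each argument's size. */
static uint32_t arg_list_bytes(const uint32_t *sizes, size_t n)
{
    uint32_t total = 0;
    for (size_t i = 0; i < n; i++) {
        assert(sizes[i] % 4 == 0);   /* assumed: whole 4-byte words */
        total += sizes[i];
    }
    return total;
}

/* A dummy word is needed exactly when the list is a multiple of 8,
   because the 4-byte return address then leaves the callee 4 bytes off. */
static int needs_dummy_word(const uint32_t *sizes, size_t n)
{
    return arg_list_bytes(sizes, n) % 8 == 0;
}
```

As noted below, the "dummy" need not be a real push; adjusting %esp by
4 has the same effect on alignment.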

Well, if we can arrange for internal gcc housekeeping to do this
by default, *without* having other gcc housekeeping assume that
incoming double arguments, or the stack frame itself, are aligned,
is that basically enough to cover what I've been asking for?

(Note that, ideally, -malign-double would not be needed to do the
above.  I wouldn't mind a new option to disable that new behavior,
but IMO it should be enabled by default.)

Also, presumably we don't actually have to *push* an arg, but just
subtract 4 from %esp, right?

I am quite willing to do this work myself.  But I say that knowing
full well I'm not the best person for the job; just someone sufficiently
enthused, with a spot of time, a Pentium II, a trackball, and
half the g77 user base hounding me for the past couple of years,
etc. etc. etc.  So I'd need some initial hand-holding, probably.  :)

        tq vm, (burley)

