This is the mail archive of the gcc@gcc.gnu.org
mailing list for the GCC project.
Re: x86: -Os -msse2 needs -maccumulate-outgoing-args
- From: Stuart Hastings <stuart at apple dot com>
- To: Jan Hubicka <hubicka at ucw dot cz>
- Cc: gcc at gcc dot gnu dot org, Falk Hueffner <falk dot hueffner at student dot uni-tuebingen dot de>, Richard Henderson <rth at redhat dot com>
- Date: Wed, 14 Jan 2004 14:52:15 -0800
- Subject: Re: x86: -Os -msse2 needs -maccumulate-outgoing-args
- References: <1F5C0B7C-2D05-11D8-8396-000A95A83B3C@apple.com> <20031218222508.GE10588@redhat.com> <FB5CAAB7-40B3-11D8-84FC-000A95A83B3C@apple.com> <20040107101305.GC21007@redhat.com> <E5C6E870-4161-11D8-84FC-000A95A83B3C@apple.com> <20040107224525.GB25232@redhat.com> <D38B6320-41FD-11D8-B2D3-000A95A83B3C@apple.com> <AF4DC065-4609-11D8-B14F-000A95A83B3C@apple.com> <20040113205102.GC19340@atrey.karlin.mff.cuni.cz> <6B70DE88-462C-11D8-B14F-000A95A83B3C@apple.com> <20040114010351.GA13317@atrey.karlin.mff.cuni.cz>
On Jan 13, 2004, at 5:03 PM, Jan Hubicka wrote:
On Jan 13, 2004, at 12:51 PM, Jan Hubicka wrote:
It is not -maccumulate-outgoing-args that breaks; it is
-mpreferred-stack-boundary=2, implied by -Os. We still don't have the
dynamic stack alignment code merged in, unfortunately.
I looked at this, and it's not clear to me how it would help. Here's
Also, given the unique semantics of the Darwin linker, I'm not sure
this could work on Darwin today (our linker "knows" too much about
entry points). I think this would be implementable with a future
Yes, it is tricky, but it would allow us to simply align frame when we
know we need it aligned.
The Darwin linker supports a scheme called "scatter loading," where
functions will be linked in the order given in a file supplied by the
user. Since the Mach object format has no markers to say "function foo
begins here" and "function foo ends there," the linker assumes that
global labels in TEXT sections mark the beginning of functions, and
implicitly, the ends of other functions.
If the multiple entry points in this scheme are globally visible, the
Darwin linker will cut them apart when asked to scatter-load the
enclosing program. :-(
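
The splitting problem described above can be illustrated with a small sketch. It is not the Darwin linker's code or the Mach-O symbol layout; the struct text_symbol record and the assumed_function_end() helper are hypothetical names invented for this example. It only models the stated rule that a function is assumed to run from one global TEXT label to the next.

```c
#include <stddef.h>

/* Hypothetical symbol record; NOT the Mach-O or Darwin linker layout. */
struct text_symbol {
    const char *name;
    unsigned long address; /* offset within the TEXT section */
    int is_global;         /* nonzero for a globally visible label */
};

/*
 * Return the address where the "function" starting at syms[i] is assumed
 * to end: the next global label at a higher address, or section_end if
 * there is none. syms must be sorted by ascending address.
 */
unsigned long assumed_function_end(const struct text_symbol *syms,
                                   size_t count, size_t i,
                                   unsigned long section_end)
{
    for (size_t j = i + 1; j < count; j++) {
        if (syms[j].is_global && syms[j].address > syms[i].address)
            return syms[j].address;
    }
    return section_end;
}
```

Under this model, a function _foo at offset 0 with a second global entry point at offset 8 is treated as ending at offset 8, so a scatter-load could separate the two pieces.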
Fortunately, work is underway to fix this in the Darwin linker...
It would probably be sane to give a warning or a hard error when an SSE
register is put onto the stack, but I don't see an easy way to do this, as
we should not output such a warning for long doubles that have the same
I'm sorry, I didn't fully understand this.
If a long double has the same alignment properties as a vector, then
misaligned loads of that long double should fault just like a
misaligned vector load (?). If the long double prefers a high
No, i386 is trickier. Originally it was designed as an architecture that
allows misaligned memory accesses everywhere, and only recently did this
break with SSE. Everything earlier (even MMX) allows misaligned
loads/stores, so we don't lose anything when, at -Os, we don't align the
stack frame to a 16-byte boundary.
Hm. What happens if -mfpmath=sse, and we try to load a 64-bit double
with 32-bit alignment into an XMM register ?
alignment, but will work with lesser alignment, then its alignment
requirements aren't the same as a vector (??).
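
As a rough illustration of the alignment arithmetic under discussion: -mpreferred-stack-boundary=N asks for a stack aligned to 2^N bytes, so N=2 gives 4 bytes and N=4 gives 16. The helpers below are hypothetical names for this sketch, not anything in GCC; they only show that an address can satisfy the 4-byte boundary while missing the 16-byte one.

```c
#include <stdint.h>

/* -mpreferred-stack-boundary=N requests a stack aligned to 2^N bytes. */
unsigned boundary_bytes(unsigned n)
{
    return 1u << n;
}

/* Nonzero if addr is a multiple of align (align must be a power of two). */
int is_aligned(uintptr_t addr, unsigned align)
{
    return (addr & (uintptr_t)(align - 1)) == 0;
}
```

For example, a stack slot at address 0x1004 passes is_aligned(0x1004, boundary_bytes(2)) but fails is_aligned(0x1004, boundary_bytes(4)), which is the case that matters for aligned 128-bit SSE accesses.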
Here's another approach:
If main() is part of a module compiled with
-mpreferred-stack-boundary=2, and main() wants to pass a vector
argument to function foo_v() in another module, main() knows that A)
main()'s own stackframe is not aligned, and B) since foo_v() is
expecting a vector parameter, foo_v() must have been compiled with a
suitable stack alignment. (O.K., it's a weak argument.)
Instead of generating a warning, how about forcing the stack into
alignment just for the call to foo_v()? Sort of a per-call variant
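
Forcing alignment around a single call comes down to rounding the stack pointer down to the next boundary before the arguments are pushed. The sketch below shows only that arithmetic on a plain integer; it is not GCC's implementation, and align_stack_down() and realign_padding() are hypothetical names.

```c
#include <stdint.h>

/*
 * Round sp down to a multiple of align (a power of two).
 * Down, because the x86 stack grows toward lower addresses.
 */
uintptr_t align_stack_down(uintptr_t sp, uintptr_t align)
{
    return sp & ~(align - 1);
}

/* Bytes of padding the caller would skip to reach that boundary. */
uintptr_t realign_padding(uintptr_t sp, uintptr_t align)
{
    return sp - align_stack_down(sp, align);
}
```

With a 4-byte-aligned stack pointer the padding for a 16-byte boundary is 0, 4, 8, or 12 bytes, so the caller would also need to remember the original value in order to restore it after the call.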
The problem with this is that a function need not have an SSE vector
argument and can still use SSE internally, so this scheme would miss
Well, if noparm_v() has vector local variables, we know when it is
compiled if it has sufficient stack alignment or not, and we can tell
the user (probably a hard error).
If main() has 32-bit stack alignment... shucks. If main() has no
vector parameters, but wants to use vectors internally, it will work so
long as main() declares no vectors on its local stackframe (only
statics and globals). But that won't stop GCC from creating vector
temporaries on main()'s local stackframe (e.g.
If the user compiles module A.c with align=2, and B.c with
align=vector, and there is a call from module A into a vector-infected
function in B, it will break; the only "fix" would be a link-time
check. I guess we have to let this one break, silently.
Outputting such hard errors when compiling a function that accepts
To ensure safety, any function that accepts vector arguments would
insist upon a suitable stack alignment at compile time (hard error).
can be implemented and can tell the user that -Os without
-mpreferred-stack-boundary is a no-go, so perhaps this is a pretty good
idea. What do others think?
Since any function that uses any 128-bit vector expression could spawn
a vector temporary, every such function needs the vector stack
alignment, and should insist upon it with an error.
Since MMX registers (64-bit vectors) will work with weaker alignment, I
suppose a warning ("weak stack alignment") would be in order.
The logical conclusion is that my initial suggestion to align the stack
for one call is moot. I can envision a "safe mode" where a function
could realign its own stackframe, and be able to dereference possibly
misaligned parameters, but its only use would be to find bugs that
would be resolved if the correct alignment was used everywhere. Not
Suggestion: the testcase I offered should provoke an error, saying
"vector codes compiled with -Os need -mpreferred-stack-boundary=7" or
whatever. In any case, the current practice of silently generating bad
code when vector.c is optimized with -Os is clearly unacceptable.
Would a patch with this approach be acceptable ? Any suggestions where
the checking should be done ?
The problem I see here is what to do when a function accepts variable
Ack! I just suggested an additional check to calls.c; it is already
much too complicated. :-(
You can do such tricks in init_cumulative_args code.