This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.



Re: x86 double alignment (was egcs-1.1 release schedule)


>On Thu, Jun 25, 1998 at 12:15:17PM -0400, Craig Burley wrote:
>> >The original patch turned on -mstack-align-double, which I thought is safe,
>> >but it isn't. I got a report from a windows user that it breaks most windows
>> >function semantics, as these functions deallocate the stack themselves. In
>> >this case, -mstack-align-double will break the program.
>> 
>> Wait, how can these functions deallocate the stack themselves when
>> the *caller* also is deallocating the stack, as is normally the
>
>it isn't: __attribute__((stdcall)).

So you are saying gcc knows when it calls a function that the function
will do the deallocating?  (I'm not sure I fully understand your
response, since you didn't explain it particularly carefully.  Is
that attribute on a function definition or declaration, for example?)

In that case, why isn't it simply a bug that -mstack-align-double isn't
handling that attribute correctly, and a (fairly) easily fixable bug
at that?

>> case for x86-ABI code?  Aren't these functions essentially violating
>> the ABI in a way the compiler producing code that calls them *must*
>> know about?
>
>the problem is we can't change the windows kernel. it's a third-party
>product I'd really like to recompile ;)

I meant what apparently you said about attribute(stdcall) or
whatever -- that, if gcc compiles code that calls a non-ABI-
obeying function, it *must* be told about that.

In other words, there's really no reason -mstack-align-double should
have any problems vis-a-vis ABI conformance.  If it does, that's
just a bug that needs fixing.  E.g. for a function with the
attribute saying "this is a Windows function that breaks the ABI",
gcc simply doesn't attempt to align the outgoing stack frame, or,
upon returning, it pseudo-pops the pseudo-pushed 32-bit word.
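
The pseudo-push/pseudo-pop idea can be modeled in a few lines of C.
This is only a sketch under stated assumptions: the stack pointer is
a plain integer, the callee is assumed to pop exactly its own
arguments, and align_pad and call_callee_pops are hypothetical names,
not anything in gcc.

```c
#include <assert.h>
#include <stdint.h>

/* Padding to push before the arguments so that the stack pointer is
   8-byte aligned once arg_bytes have been pushed (0 or 4 when both
   inputs are multiples of 4).  The stack grows downward. */
static uint32_t align_pad(uint32_t sp, uint32_t arg_bytes)
{
    return (sp - arg_bytes) & 7u;
}

/* Model one call to a callee-pops function (assumed to return with
   "ret arg_bytes").  Returns the caller's stack pointer afterwards. */
static uint32_t call_callee_pops(uint32_t sp, uint32_t arg_bytes,
                                 int caller_pops_pad)
{
    uint32_t pad = align_pad(sp, arg_bytes);

    sp -= pad;                 /* pseudo-push the padding word */
    sp -= arg_bytes;           /* push the real arguments */
    assert((sp & 7u) == 0);    /* arguments start on an 8-byte boundary */

    sp += arg_bytes;           /* callee removes only its own arguments */
    if (caller_pops_pad)
        sp += pad;             /* caller pseudo-pops the padding word */
    return sp;
}
```

With sp = 0x1004 and 8 bytes of arguments, the model pushes 4 bytes
of padding.  The stack pointer returns to 0x1004 only if the caller
pops that word itself; otherwise it is left at 0x1000.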

>> >- document that -mno-stack-align-double should be used when linking against third-party-libs
>> >- not making it on by default.
>> 
>> If it's not on by default, it's of only limited benefit, though that'd
>> still take care of lots of Fortran users who are willing to learn about
>> and use an option like -mstack-align-double.
>
>I had this on by default for half a year in pgcc (until I was told about
>windows having problems). It isn't beneficial for integer programs, and
>there are a great many of them around. I'm not happy with the idea
>of punishing a whole class of functions.

Neither am I.  But how much does integer stuff slow down?  2% on
average?  5% max?  Compare that with double-precision speed-ups,
which I've been hearing are on the order of tens of percent on
average, and as much as 200 or 300%.  I still think it's a bit
of a tough call, even given that, but others might well disagree
and say it's a clear win for always doing alignment.  That's the
direction in which I lean as well.

>If all third-party libraries (e.g. libc) were compiled with this switch (we
>only need to have this on for functions using callbacks), we could leave the
>choice up to the individual program.

Ideally, we'd just leave it up to link-time optimizations, which are
(nearly) ideal for properly handling this sort of thing.  (Of course,
that assumes object files don't contain such nailed-down assembly
code.)  But that's a *huge* project, and in the meantime, we're
getting killed by compilers that do the right thing without needing
all sorts of special options, from what I hear.  (Can anyone confirm
this?)

>The problem is mostly speed loss.

How much?  Especially, how much *peak* loss?  IMO, if there are any
at-least-moderately-widely-used codes out there that'd suffer
more than a 5% slowdown, then this should be taken very seriously,
in terms of whether we want to impose that, by default, on users
of egcs 1.1.

>> But, the problem is that if we don't make -mstack-align-double the
>> default, lots of code that uses `double' will not get proper
>> alignment and will continue to have *big* performance degradation.
>
>Educate your users. I thought that way for a great long time, maybe I'm
>used to tweaking switches, but that's _already_ the case with loop unrolling
>or "-O3 vs. -O2". Why isn't loop unrolling on by default? Why do most
>programs use -O2 when -O3 is faster?

Gee, thanks.  We've spent a couple of years or so already trying
to teach them when and how to use -malign-double, and now you're
saying we should spend even more time and energy teaching them how
and when to use -mstack-align-double (though that's greatly eased
by it not breaking the ABI).

In the meantime, the "education" misses a huge part of the user
base no matter what we do, as we've learned (and expected anyway)
over the past several years.

Why not just educate *your* users that if they want that last 2 or 3%
of performance and they're using no double-precision data, and not
calling any procedures (e.g. user-supplied) that might, they should
use -mno-stack-align-double?

After all, who do you think is more likely to scrutinize the
documentation -- people who *really need* that last 2 or 3%,
or people who *might* notice (or not, and just explain away as
"slow") a 2x or 3x drop in performance compared to *other*
x86 compilers out there?  In particular, if they *don't* notice
a 2-3% slowdown, then they don't care; if they *do*, they'll
probably know to read the docs.  But users of gcc have hardly
ever seen even adequate double-precision performance (unless
they happened to use one or two now-obsolete versions of g77,
with the dangerous stack-alignment patch), so they are less
likely to think the compiler has "gotten" slower, more likely
to think it just "is" slow.

AFAIK, nobody has ever claimed g77 is "slow" compared to other
compilers because it produces code that is 2-3% slower than a
competitor.  But they often do when that drop is greater than 10%,
and too often without reading the docs to see how they might get
better performance.

Apparently that's because other vendors often make the "right"
choices about numerical performance (though it'd be worthwhile
to research just how some x86 Fortran vendors handle the
64-bit-alignment problem vis-a-vis "alien" code), so users of
those vendors' compilers don't have to study long lists of
possibly useful options to get remotely in the ballpark of
decent performance.

And apparently many users doing benchmarks consider that a
quality issue -- how good performance is using *vanilla*
options like -O, without having to read tons of docs (especially
in the case of gcc/g77, which, as portable products, have
tons of machine-specific optimization options).

Whereas, that seems to be what you're asking gcc/g77/g++/GNAT
users to do -- to get double-precision performance that's even
within 2x or 3x of other compilers' performance (even with
*moderate* optimization), they have to learn about and compile
with -mstack-align-double throughout their program.  That's not
even a portable option (across machines gcc supports).

As far as why loop unrolling and other things are the way they are,
there's a variety of reasons.  One is that sometimes the code is
considered insufficiently tested to enable for the default
optimization level (-O1).  Another is that it can greatly inflate
some types of code and/or data structures.  Another is that it can
greatly increase the amount of time the compiler takes to compile
some types of code.  Another is that it can actually slow down certain
types of code, sometimes greatly, maybe even *most* types of code.

Without having info on how much slower integer codes really get,
I'm assuming the only pertinent reason that applies to making
-mstack-align-double the default is the first -- that the code
is insufficiently tested.

However, that's mitigated by the fact that this isn't an optimization
"pass" in the usual sense -- in fact, it's got to be used *widely* to
even begin to see any benefits from it, so if it isn't enabled
by default, IMO it probably won't get used by more than 1% of the
entire user base.  Hence, it won't be tested, and gcc will continue
to appear to be a poor performer.  (Most people who'd use it will
just use -malign-double to get "all" the performance, is my guess.)

In fact, without having certain important libraries compiled with
this option enabled by default, the *entire* exercise is pointless
for about 99% of the users who *would* see big performance
improvements if we made it the default.

That's why, IMO, -mstack-align-double should be the default *period*,
including for unoptimized code (-O0).

I'd love to see gcc taught to effectively avoid doing this stuff in
cases where it's clear no double-precision arithmetic is involved,
though.  E.g. leaf functions with no DP vars, and functions that
have no DP vars and call only leaf functions of that sort or of
their own sort.  That might restore much of the apparent lossage
from making -mstack-align-double the default.
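
The rule just described amounts to a small fixed-point computation
over the call graph.  The following is a minimal sketch, assuming a
hypothetical per-function summary (struct fn_info) and a closed
program with no unknown external callees; none of these names exist
in gcc.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_CALLEES 4

/* Hypothetical per-function summary; not a gcc data structure. */
struct fn_info {
    bool has_dp_vars;             /* uses double-precision variables */
    size_t callees[MAX_CALLEES];  /* indices of called functions */
    size_t n_callees;
};

/* Set needs_align[i] when function i has DP vars or can reach, through
   calls, a function that does.  Everything left false is a candidate
   for skipping stack alignment. */
static void mark_needs_align(const struct fn_info *fns, size_t n,
                             bool *needs_align)
{
    bool changed = true;

    for (size_t i = 0; i < n; i++)
        needs_align[i] = fns[i].has_dp_vars;

    /* Propagate from callees to callers until nothing changes. */
    while (changed) {
        changed = false;
        for (size_t i = 0; i < n; i++) {
            if (needs_align[i])
                continue;
            for (size_t c = 0; c < fns[i].n_callees; c++) {
                assert(fns[i].callees[c] < n);
                if (needs_align[fns[i].callees[c]]) {
                    needs_align[i] = true;
                    changed = true;
                    break;
                }
            }
        }
    }
}
```

A function with no DP vars that calls only itself and DP-free leaf
functions stays unmarked, which matches the "of their own sort" case.
Any call to a function the compiler cannot see would have to be
treated as needing alignment, which this sketch does not model.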

Another approach is to make -mno-stack-align-double the default
and teach gcc to special-case any function *not* meeting the
above requirements (e.g. functions with DP vars, or that might
call those with them) by somehow forcibly aligning the incoming
stack frame (if local DP vars exist).  That'd require conditionally
moving the incoming args collectively by 4 bytes (if the frame
is unaligned), and dealing with the fallout of this (it might affect
profiling, exceptions, debugging, etc.).
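
The conditional 4-byte adjustment can be sketched as follows.  This
is a simplified model, assuming the incoming stack pointer is always
4-byte aligned; struct realigned and realign_incoming are
hypothetical names, and the sketch computes only the shift, not how
the incoming args would be relocated or re-addressed.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical result of a realigning prologue. */
struct realigned {
    uint32_t sp;     /* stack pointer after the adjustment */
    uint32_t shift;  /* 0 if sp was already aligned, otherwise 4 */
};

static struct realigned realign_incoming(uint32_t sp)
{
    struct realigned r;

    assert((sp & 3u) == 0);    /* assumption: caller kept 4-byte alignment */
    r.shift = sp & 4u;         /* 4 exactly when sp is not 8-byte aligned */
    r.sp = sp - r.shift;       /* move the frame down to the boundary */
    assert((r.sp & 7u) == 0);
    return r;
}
```

The shift is known only at run time, so anything that assumes a fixed
distance between the frame and the incoming args would need to
account for it.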

But either of the above approaches is going to take quite a bit
more time, well beyond the 1.1 timeframe.

So the question for 1.1 is: assuming we *can* make -mstack-align-double
the default (and my quick tests of it didn't seem too promising,
but then I realized some of that might be due to the g77 bugs I
think I've finally found and fixed as of a couple of hours ago),
*should* we for 1.1?  Should we default to as much as 300% speedups
in double-precision code and as much as 2-5% (?) slowdowns in
non-DP code, or continue "straining" to get the last few percentage
points for non-DP code at the *huge* (and embarrassing) expense of
DP code?

        tq vm, (burley)

