This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.
Re: Draft "Unsafe fp optimizations" project description.
- To: <dewar at gnat dot com>, <aj at suse dot de>
- Subject: Re: Draft "Unsafe fp optimizations" project description.
- From: "Tim Prince" <tprince at computer dot org>
- Date: Sun, 5 Aug 2001 23:05:45 -0700
- Cc: <gcc at gcc dot gnu dot org>, <lucier at math dot purdue dot edu>, <toon at moene dot indiv dot nluug dot nl>
- References: <20010805155952.9F1A1F2B78@nile.gnat.com>
----- Original Message -----
To: <firstname.lastname@example.org>; <email@example.com>; <firstname.lastname@example.org>
Cc: <email@example.com>; <firstname.lastname@example.org>;
Sent: Sunday, August 05, 2001 8:59 AM
Subject: Re: Draft "Unsafe fp optimizations" project description
> <<I don't think the abrupt underflow settings which Itanium and
> P4 architectures depend on for good performance qualify as
> "denormal orthodoxy." Does the literature treat the issue
> of abrupt underflow with extended exponent range?>>
> At least one paper I read way back did discuss this issue,
> but I can't remember where I found it (that was back when I was
> writing the book, by the way). I still think that chapter 5 of
> that book, Microprocessors: A Programmer's View, is a nice simple
> intro to some of the subtleties of IEEE arithmetic, and in
> particular it discusses why denormals are so useful.
> I think it is overstrong to say that the Itanium and P4
> depend on abrupt underflow for good performance; do you have
> data to back up this claim?
> Please do not assume that the presence of these options means they
> are necessarily useful (if you thought that, you might even end up
> using the ENTER instruction, which was a bad idea even way back on
> the 80186). Nothing substitutes for measurements when it comes to
> judging the actual value of optimizations!
The fatigue benchmark from www.polyhedron.com generates enough
underflow events to make the point on Itanium. Certainly, it
would be more on topic (and possible to quote results without
violating an NDA) if there were a suitable gnu compiler. There
may be a number of variables to consider; various compilers
make different choices about the use of extended precision,
which may affect the conclusion.
P4 has the strange characteristic of leaving us stuck with
those "x87 assists" for denormalized loads and stores (someone
mentioned kernel traps earlier in this thread), unless SSE
code is used, for which there are abrupt underflow options.
Cross products and cosines producing denormals have been
giving me fits lately. I may be able to try a translation of
fatigue with g77 -msse2 on linux, which would enable me to get
a useful comparison between abrupt and gradual underflow
settings. One way or another, I'll get some data on something
other than a customer's app, even if I have to do it without
crossing Bill G. I do have a customer benchmark which
sped up 50% on P4 simply as a result of arranging to reduce
the number of x87 assists, but your point is taken that
something is needed which others can make use of.
Are you suggesting that abrupt underflow, on systems which
have such an option, should be a -ffast-math option?
According to my understanding, up to now gcc has ignored this
issue and left it to the support library, if the library
maintainers cared to do anything with it. I don't think the
support library will be using compiler switches to make such
decisions, although I've just been working with one which does
just that: run-time library floating point behavior set by
options used when compiling main(). It's a wonder that mixed-language
builds can ever be made to work.