This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.



Re: Draft "Unsafe fp optimizations" project description.



----- Original Message -----
From: <dewar@gnat.com>
To: <aj@suse.de>; <dewar@gnat.com>; <tprince@computer.org>
Cc: <gcc@gcc.gnu.org>; <lucier@math.purdue.edu>; <toon@moene.indiv.nluug.nl>
Sent: Sunday, August 05, 2001 8:59 AM
Subject: Re: Draft "Unsafe fp optimizations" project description.


> <<I don't think the abrupt underflow settings which Itanium
> and P4 architectures depend on for good performance qualify
> as "denormal orthodoxy."  Does the literature treat the
> combination of abrupt underflow with extended exponent range?
> >>
>
> At least one paper I read way back did discuss this issue,
> but I can't remember where I found it (that was back when I
> was writing my book; by the way, I still think that chapter 5
> of that book, Microprocessors: A Programmer's View, is a nice
> simple intro to some of the intricacies of IEEE, and in
> particular it discusses why denormals are so important).
>
> I think it is overstrong to say that the Itanium and P4
> architectures depend on abrupt underflow for good
> performance. Do you have figures to back up this claim?
>
> Please do not assume that the presence of these options
> means they are necessarily useful (if you thought that, you
> might even end up using the ENTER instruction, which was a
> bad idea even way back on the 386 :-)
>
> Nothing substitutes for measurements when it comes to
> arguing the actual value of optimizations!

The fatigue benchmark from www.polyhedron.com generates enough
underflow events to make the point on Itanium.  Certainly, it
would be more on topic (and possible to quote results without
violating an NDA) if there were a suitable GNU compiler. There
may be a number of variables to consider; various compilers
make different choices about the use of extended precision,
which may affect the conclusion.

P4 has the strange characteristic of leaving us stuck with
those "x87 assists" for denormal loads and stores (someone
mentioned kernel traps earlier in this thread), unless SSE
code is used, for which there are abrupt underflow options.
Cross products and cosines producing denormals have been
giving me fits lately.  I may be able to try a translation of
fatigue with g77 -msse2 on linux, which would enable me to get
a useful comparison between abrupt and gradual underflow
settings.  One way or another, I'll get some data on something
other than a customer's app, even if I have to do it without
crossing Bill G.  I do have a customer benchmark which sped
up by 50% on P4 simply as a result of arranging to reduce the
number of x87 assists, but your point is taken: something is
needed which others can make use of.
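
To make the cross-product remark concrete, here is a minimal
sketch in C (not taken from the fatigue benchmark; the function
names and the input values are made up for illustration) of how
a single-precision cross product of small but perfectly normal
inputs ends up with denormal components under gradual underflow.
It relies only on the standard fpclassify macro from <math.h>.

```c
#include <math.h>

/* Count how many components of a 3-vector are subnormal (denormal). */
static int count_subnormals(const float v[3])
{
    int n = 0;
    for (int i = 0; i < 3; i++) {
        if (fpclassify(v[i]) == FP_SUBNORMAL)
            n++;
    }
    return n;
}

/* Plain single-precision cross product: c = a x b. */
static void cross(const float a[3], const float b[3], float c[3])
{
    c[0] = a[1] * b[2] - a[2] * b[1];
    c[1] = a[2] * b[0] - a[0] * b[2];
    c[2] = a[0] * b[1] - a[1] * b[0];
}
```

With illustrative inputs such as a = (1e-20, 2e-20, 3e-20) and
b = (3e-20, 1e-20, 2e-20), every component of the result has a
magnitude of roughly 1e-40, which is below FLT_MIN, so with
gradual underflow all three components come back denormal.
Under an abrupt underflow setting one would expect them to be
flushed to zero instead.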

Are you suggesting that abrupt underflow, on systems which
have such an option, should be a -ffast-math option?
According to my understanding, up to now, gcc has ignored this
issue and left it to the support library, if the library
maintainers cared to do anything with it.  I don't think the
support library will be using compiler switches to make such
decisions, although I've just been working with one which does
just that: run-time library floating-point behavior is set by
the options used when compiling main().  It's a wonder that
mixed-language builds can ever be made to work.
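
For comparison, this is roughly what an explicit run-time choice
of underflow mode could look like in C, independent of any
compiler switch.  It is only a sketch under stated assumptions:
it uses the _MM_SET_FLUSH_ZERO_MODE macro from <xmmintrin.h>,
which affects SSE arithmetic only (not x87), and it assumes the
compiler defines __SSE__ when that header is usable.  The helper
names are hypothetical.

```c
#include <float.h>
#if defined(__SSE__)
#include <xmmintrin.h>
#endif

/* Ask for abrupt underflow (flush-to-zero) in SSE arithmetic.
 * Returns 1 if the request was issued, 0 if this build has no
 * SSE support (assumption: __SSE__ signals that support). */
static int request_abrupt_underflow(void)
{
#if defined(__SSE__)
    _MM_SET_FLUSH_ZERO_MODE(_MM_FLUSH_ZERO_ON);
    return 1;
#else
    return 0;
#endif
}

/* Go back to gradual underflow where the mode can be changed. */
static void restore_gradual_underflow(void)
{
#if defined(__SSE__)
    _MM_SET_FLUSH_ZERO_MODE(_MM_FLUSH_ZERO_OFF);
#endif
}

/* Halve the smallest normal float at run time; the volatile
 * qualifiers keep the compiler from folding the division. */
static float half_of_flt_min(void)
{
    volatile float x = FLT_MIN;
    volatile float two = 2.0f;
    volatile float r = x / two;
    return r;
}
```

Calling half_of_flt_min() before and after
request_abrupt_underflow() gives a quick check of which mode is
really in effect: a denormal result means gradual underflow, a
zero means the result was flushed.  If the division is done on
the x87 unit rather than in SSE, the request has no effect on
it, which is the mixed situation described above.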

