[RFC] Fix PR28684
Clint Whaley
whaley@cs.utsa.edu
Tue Nov 14 21:51:00 GMT 2006
Guys,
Sorry for the delay in responding to the ongoing discussion; I'm pretty snowed
under at the moment. However, let me take a minute to respond to the questions
that seem to have been directed my way.
Most importantly, my request had to do with IEEE compliance, not FLOP count.
Vectorization can often increase the FLOP count (for example
by performing redundant computation or by the need to add the vectors up
at the end of a loop due to scalar expansion in the vectorization of
accumulators).
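A minimal sketch of the accumulator case, for illustration only. The names sum_scalar and sum_4lane are hypothetical, and the second routine is a hand-written model of what a 4-wide vectorizer might produce, not actual GCC output. The vector form needs three extra additions to combine the lanes, and it adds the elements in a different order:

```c
#include <stddef.h>

/* One accumulator: n additions in strict left-to-right order. */
float sum_scalar(const float *x, size_t n)
{
    float acc = 0.0f;
    for (size_t i = 0; i < n; i++)
        acc += x[i];
    return acc;
}

/* Model of a 4-lane vectorized sum (scalar expansion of the accumulator).
 * Each lane sums every 4th element; the lanes are combined at the end,
 * which costs 3 additions that the scalar loop never performs. */
float sum_4lane(const float *x, size_t n)
{
    float lane[4] = {0.0f, 0.0f, 0.0f, 0.0f};
    size_t i = 0;

    for (; i + 4 <= n; i += 4) {
        lane[0] += x[i];
        lane[1] += x[i + 1];
        lane[2] += x[i + 2];
        lane[3] += x[i + 3];
    }

    /* Horizontal reduction of the four partial sums. */
    float acc = (lane[0] + lane[1]) + (lane[2] + lane[3]);

    /* Scalar cleanup for the leftover n % 4 elements. */
    for (; i < n; i++)
        acc += x[i];
    return acc;
}
```

On a target that evaluates float in single precision with round-to-nearest, the input {1e8f, 1.0f, -1e8f, 1.0f} gives 1 from the first routine and 0 from the second. Both answers come from IEEE-compliant operations; only the order differs.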
My problem is that -funsafe-math says it may violate IEEE compliance, for
instance by using 3DNow!, which instead of handling overflow and underflow
(and gradual underflow), just uses saturating instructions. Therefore, at
the end of the computation, instead of, for instance, finding an infinity
in the answer (telling you there's an error), you find a normal value instead.
So, you think your spacecraft survived reentry, rather than broke apart,
and everything is golden until the entire crew dies because the simulation
went screwy with no way to find that out (or the X-ray being analyzed comes
back with "no tumor", etc) . . .
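To make the failure mode concrete, here is a rough sketch. The function mul_saturating is a hypothetical software model of the saturating behavior described above, not a reproduction of any real 3DNow! instruction; mul_ieee is ordinary IEEE single-precision multiplication:

```c
#include <float.h>

/* IEEE multiply: an overflowing product becomes +/-infinity,
 * which then propagates to the final answer and flags the error. */
float mul_ieee(float a, float b)
{
    return a * b;
}

/* Hypothetical saturating multiply: an overflowing product is clamped
 * to the largest finite value, so the result looks like a normal number. */
float mul_saturating(float a, float b)
{
    float r = a * b;
    if (r > FLT_MAX)
        return FLT_MAX;
    if (r < -FLT_MAX)
        return -FLT_MAX;
    return r;
}
```

With FLT_MAX * 2.0f, the IEEE version returns infinity, which survives later scaling and shows up in the output. The saturating model returns FLT_MAX, and after one more multiplication by 0.5f the result is an unremarkable finite number.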
At the same time, modern x86 architectures are stressing the SSE unit over
the x87 unit. Vectorized code has a factor of 4 (8 in single precision) computational
advantage over x87 code, and I think a factor of 2 (4) advantage over scalar SSE on
the new Core2Duo. AMD's planned arch will have a 2 (4) advantage over
the x87 or scalar SSE as well. Therefore, numerical programmers absolutely
need to be able to utilize vectorization, or they give away as much as a
factor of 8 in performance. I believe IBM is similarly doing more and more
vectorization with their various PowerPC systems that have a variety of
vector units affixed to them. So, vectorization is absolutely required
on modern systems.
However, we also need to be sure that the arithmetic is IEEE compliant.
In grouping vectorization with non-IEEE-compliant arithmetic, you do indeed
make it so that numerical people cannot safely use the operation, as discussed
in my original report and outlined here. IEEE is the only real definition
of what a FLOP is, and therefore the only way to bound the error, and find
things like exceptional conditions. Since vectorization is particularly needed
today, and an absolute must tomorrow, one idea is therefore to have a flag that
allows vectorization only.
There are a lot of optimizations that are presently in unsafe that do not
break IEEE compliance, but do things like re-order. I think someone then
suggested that instead of just allowing vectorizations, why not allow
all the reordering operations? I'm certainly OK with this. The killer
for me (and pretty much every numerical computation that simulates something
real, as opposed to game physics, etc, where disasters don't cost $ and lives),
is using non-IEEE arithmetic.
As for how much info the user needs, the more I get, the more I can use a
particular flag. If a flag tells me it may reorder flops and reciprocate
divisions, I can examine my code and certify it accurate in the face of that,
and throw the flag for code that can take it (99% of my library). Any flag
which has the possibility of doing non-IEEE arithmetic is a non-starter
for serious numerical work.
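As an illustration of what "reciprocate divisions" means for the result, here is a small sketch (the names div_direct and div_recip are hypothetical). Both routines use only IEEE operations, but the second rounds twice, so the answers can differ in the last bit:

```c
/* Division as written: a single correctly rounded IEEE operation. */
double div_direct(double x, double y)
{
    return x / y;
}

/* Reciprocated form: 1/y is rounded, then the product is rounded again. */
double div_recip(double x, double y)
{
    double r = 1.0 / y;
    return x * r;
}
```

With x = y = 49.0 in IEEE double precision, div_direct returns exactly 1.0, while div_recip returns a value just below 1.0. That is the kind of bounded, analyzable difference I can certify code against.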
I don't believe icc violates the IEEE standard w/o a flag (reorder: definitely).
In the past, some machines did because there was no support for particular
parts of the IEEE standard, but people knew what were the missing parts that
they could count on not having due to hardware (my memory may be bad, but I
think early DECs and maybe SPARCs didn't handle gradual underflow right, but
just rounded to zero, for instance).
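A sketch of the gradual underflow difference, with hypothetical names. diff_flush is a software model of hardware that rounds tiny results to zero, not the behavior of any specific machine:

```c
#include <float.h>

/* IEEE subtraction: a tiny nonzero difference is kept as a subnormal. */
double diff_ieee(double a, double b)
{
    return a - b;
}

/* Hypothetical flush-to-zero model: any result smaller in magnitude
 * than the smallest normal number (DBL_MIN) is replaced by zero. */
double diff_flush(double a, double b)
{
    double r = a - b;
    if (r > -DBL_MIN && r < DBL_MIN)
        return 0.0;
    return r;
}
```

With a = 1.5 * DBL_MIN and b = DBL_MIN, the two inputs are different, and diff_ieee returns a nonzero subnormal. diff_flush returns 0.0, so code that guards a division with a != b can still end up dividing by zero.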
So, a special flag for vectorization fills my needs, as does a more general
flag that allows all reorderings (including those necessary to vectorization).
I think most numerical users will be OK with this grouping. Grouping non-IEEE
in is what makes the flag unusable.
Thanks,
Clint
>From roger@eyesopen.com Tue Nov 14 10:42:41 2006
Date: Tue, 14 Nov 2006 08:55:55 -0700 (MST)
From: Roger Sayle <roger@eyesopen.com>
To: Revital1 Eres <ERES@il.ibm.com>
cc: Ayal Zaks <ZAKS@il.ibm.com>, Dorit Nuzman <DORIT@il.ibm.com>,
<gcc-patches@gcc.gnu.org>, Richard Guenther <rguenther@suse.de>,
"R. Clint Whaley" <whaley@cs.utsa.edu>
Subject: Re: [RFC] Fix PR28684
On Tue, 14 Nov 2006, Revital1 Eres wrote:
> > Perhaps pragmatically, reciprocal-math-optimizations can be those
> > that strength reduce divisions into (an equal number of)
> > multiplications.
>
> but x / y into x * (1/y) seems to not fit it which might hurt
> targets which support reciprocal but not division operations (i.e.
> Altivec).
>
> I still prefer the former definition... but that just me :-)
I'm trying to constructively work towards some kind of resolution.
To me the fundamental issue is why not use "-ffast-math"?
Some of the arguments have focused on "because I don't understand which
transformations that enables". The ignorance of users is not normally
a great motivation for change. If someone can't come up with a concrete
example of a problematic transformation, then there's probably no reason
not to trust GCC's optimizers.
Clint Whaley's bug report, PR middle-end/28684, escaped being
immediately closed as "not-a-bug/wont-fix" because it provided an
interesting use case. When benchmarking hardware on numerical kernels,
it's useful to establish an Mflop number, which is the number of
floating point operations. This allows several forms of associativity,
but not the full range of -ffast-math. This is a well-defined problem,
useful to a small group of specialists, and it seems reasonable to
support it.
Your interest from the point of view of vectorization is unrelated,
and not clearly defined. For better or worse, historically GCC's
attitude to floating point optimizations is to perform, by default, only
transformations that produce bit-identical results, a constraint much
stronger than that of other compilers. Anything less than this has gone under
the name of "unsafe" math optimizations. Any numerical expert will
tell you that you only have to ignore sign-dependent rounding, or re-associate
an expression, to produce a last-bit error, which after only a few
additional steps can make the results incomparable. The numeric
errors you get from ignoring the sign of zeros are no different in
magnitude than converting divisions into multiplications, or implementing
pow by multiplications, or using extra precision on x87, or use of
fmadd instructions.
On this scale, there are very few vectorization transformations that
GCC would consider "safe" and produce bit identical results. A sad
fact is that vectorization is what GCC calls "unsafe". The semantics
and wording you prefer in your patch allows for an unbounded error in
the result, so the distinction can't be a "quality" argument.
So we now need to consider what semantics it is that you are trying
to address. You wish to allow "X/Y" as "(1/Y)*X", but ultimately
how is that any different from plain -ffast-math, which should be
used routinely by GCC users? Rather than say we'd like a
mysterious new option to include transformation FOO or BAR, perhaps
we need to look at it the other way and ask which transformations
you want to disallow. If there are none, then the current
definition should work well for you.
I think there's still some merit in Clint's request for an Mflop
preserving subset of -ffast-math, which is PR28684, but I see anything
else as solving a different (and perhaps irrelevant) problem.
I agree the wording (and naming) of -funsafe-math-optimizations
needs to be improved. In my mind, -funsafe-math-optimizations is
restricted to all of the mathematically valid transformations that
are permissible assuming unbounded precision arithmetic. Things
like x+0 -> x. Unfortunately we live in a world where the
limitations of our (IEEE) hardware mean that arithmetic performed
in a computer doesn't match or perfectly model a Newtonian universe.
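A small sketch of why even "x+0 -> x" is not bit-identical under IEEE arithmetic (the function names are hypothetical). In the default round-to-nearest mode, -0.0 + 0.0 is +0.0, so folding away the addition changes the sign of a zero result:

```c
#include <math.h>

/* The expression as written: for x = -0.0 the IEEE sum is +0.0. */
double add_zero(double x)
{
    return x + 0.0;
}

/* The expression after the "x + 0 -> x" simplification:
 * for x = -0.0 the sign of zero is preserved, so the result is -0.0. */
double add_zero_folded(double x)
{
    return x;
}
```

Checking signbit() on both results for x = -0.0 shows the difference, assuming the compiler has not been told to ignore signed zeros. A later 1.0 / result then yields +infinity in one case and -infinity in the other.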
For example, Richard Guenther has recently proposed a patch to
transform pow(x,1.5) into x*sqrt(x), but didn't appreciate why
the transformation was guarded by -funsafe-math-optimizations.
The answer is that although the two expressions are equivalent
mathematically, and the latter is not only faster but may often be
more accurate on most inputs, they are not guaranteed to be identical.
Hence codes that assume "y = 1.5; if (pow(x,1.5) == pow(x,y))" may
start to fail. Even though the numerical accuracy has improved,
we disallow this transformation. Indeed Robert Scott Ladd,
my own OpenEye experience, and other gcc postings have all confirmed that
numerical accuracy is usually improved, but at the expense of
numerical precision.
http://en.wikipedia.org/wiki/Accuracy_and_precision
Perhaps we should rename this option -faccurate-math and describe
the default as -fprecise-math. :-)
I'm a bit disappointed that neither the ATLAS folks nor yourself have
yet articulated a strong functionality request. I appreciate that you're
somehow unhappy with -ffast-math, but apart from the Mflops argument
you've failed to put your finger on precisely (or exactly :-) what
about it you believe needs fixing. Even in the Mflops argument it
seems ambiguous whether operations on constant arguments may be evaluated
at compile-time, "2.0 + 3.0 -> 5.0"?
Anyway, I'm pleased that we're discussing the issues.
Roger