This is the mail archive of the
`gcc-patches@gcc.gnu.org`
mailing list for the GCC project.


*From*: Clint Whaley <whaley at cs dot utsa dot edu>
*To*: roger at eyesopen dot com, ERES at il dot ibm dot com
*Cc*: ZAKS at il dot ibm dot com, whaley at cs dot utsa dot edu, rguenther at suse dot de, gcc-patches at gcc dot gnu dot org, DORIT at il dot ibm dot com
*Date*: Tue, 14 Nov 2006 15:28:05 -0600
*Subject*: Re: [RFC] Fix PR28684
*References*: <Pine.LNX.4.44.0611140748230.29879-100000@www.eyesopen.com>

Guys,

Sorry for the delay in responding to the ongoing discussion; I'm pretty snowed under at the moment. However, let me take a minute to respond to the questions that seem to have been directed my way.

Most importantly, my request had to do with IEEE compliance, not FLOP count. Vectorization can often increase the FLOP count (for example by performing redundant computation, or by the need to add the vectors up at the end of a loop due to scalar expansion in the vectorization of accumulators).

My problem is that -funsafe-math-optimizations says it may violate IEEE compliance, for instance by using 3DNow!, which instead of handling overflow and underflow (and gradual underflow), just uses saturating instructions. Therefore, at the end of the computation, instead of, for instance, finding an infinity in the answer (telling you there's an error), you find a normal value instead. So, you think your spacecraft survived reentry, rather than broke apart, and everything is golden until the entire crew dies because the simulation went screwy with no way to find that out (or the X-ray being analyzed comes back with "no tumor", etc) . . .

At the same time, modern x86 architectures are stressing the SSE unit over the x87 unit. Vectorized code has a factor of 4 (8 in single precision) computational advantage over x87 code, and I think a factor of 2 (4) advantage over scalar SSE on the new Core2Duo. AMD's planned arch will have a factor of 2 (4) advantage over the x87 or scalar SSE as well. Therefore, numerical programmers absolutely need to be able to utilize vectorization, or they give away as much as a factor of 8 in performance. I believe IBM is similarly doing more and more vectorization with their various PowerPC systems that have a variety of vector units affixed to them.

So, vectorization is absolutely required on modern systems. However, we also need to be sure that the arithmetic is IEEE compliant.
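The difference between an IEEE overflow and a saturating result can be illustrated with a small sketch. This is only an emulation in Python (whose floats are IEEE doubles on typical platforms); the `saturate` helper is a hypothetical stand-in for the clamping behaviour described above, not a model of any particular instruction set.

```python
import math
import sys


def ieee_scale(x, factor, steps):
    """Repeatedly scale x; an IEEE overflow turns the value into inf."""
    for _ in range(steps):
        x = x * factor
    return x


def saturate(x):
    """Hypothetical clamp: replace anything beyond the largest finite
    double with that largest finite value (emulated saturation)."""
    limit = sys.float_info.max
    return max(-limit, min(limit, x))


def saturating_scale(x, factor, steps):
    """Same loop as ieee_scale, but every product is clamped."""
    for _ in range(steps):
        x = saturate(x * factor)
    return x


# The intermediate value overflows, then is scaled back down.
ieee_result = ieee_scale(1e300, 1e10, 3) * 1e-20
sat_result = saturate(saturating_scale(1e300, 1e10, 3) * 1e-20)

print(math.isinf(ieee_result))    # the overflow is still visible
print(math.isfinite(sat_result))  # the overflow has been hidden
```

With IEEE semantics the infinity survives to the end of the computation and flags the error; with the emulated clamp the final value looks like an ordinary number.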
In grouping vectorization with non-IEEE-compliant arithmetic, you do indeed make it so that numerical people cannot safely use the operation, as discussed in my original report and outlined here. IEEE is the only real definition of what a FLOP is, and therefore the only way to bound the error and find things like exceptional conditions.

Since vectorization is particularly needed today, and an absolute must tomorrow, one idea is therefore to have a flag that allows vectorization only. There are a lot of optimizations that are presently under the unsafe flag that do not break IEEE compliance, but do things like re-order. I think someone then suggested that instead of just allowing vectorizations, why not allow all the reordering operations? I'm certainly OK with this. The killer for me (and for pretty much every numerical computation that simulates something real, as opposed to game physics, etc, where disasters don't cost $ and lives) is using non-IEEE arithmetic.

As for how much info the user needs: the more I get, the more I can use a particular flag. If a flag tells me it may reorder flops and reciprocate divisions, I can examine my code and certify it accurate in the face of that, and throw the flag for code that can take it (99% of my library). Any flag which has the possibility of doing non-IEEE arithmetic is a non-starter for serious numerical work.

I don't believe icc violates the IEEE standard w/o a flag (reorder: definitely). In the past, some machines did because there was no support for particular parts of the IEEE standard, but people knew which parts were missing due to hardware and could not be counted on (my memory may be bad, but I think early DECs and maybe SPARCs didn't handle gradual underflow right, but just rounded to zero, for instance).

So, a special flag for vectorization fills my needs, as does a more general flag that allows all reorderings (including those necessary to vectorization).
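As a rough illustration of what "reciprocating a division" does to a result, the sketch below compares `x / y` with `x * (1 / y)` in IEEE double precision. The function names are made up for this example; Python is used only because its floats are IEEE doubles on typical platforms.

```python
def divide(x, y):
    """One correctly rounded IEEE division."""
    return x / y


def divide_via_reciprocal(x, y):
    """Reciprocal followed by a multiply: two roundings instead of one."""
    r = 1.0 / y
    return x * r


exact = divide(49.0, 49.0)
rewritten = divide_via_reciprocal(49.0, 49.0)

print(exact == 1.0)            # the single division is exact here
print(rewritten == 1.0)        # the two-step form is not
print(abs(exact - rewritten))  # but the difference is tiny
```

Both forms use only IEEE operations, so the discrepancy stays small and can be reasoned about, which is the kind of transformation a user can audit code against.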
I think most numerical users will be OK with this grouping. Grouping non-IEEE in is what makes the flag unusable.

Thanks,
Clint

>From roger@eyesopen.com Tue Nov 14 10:42:41 2006
Date: Tue, 14 Nov 2006 08:55:55 -0700 (MST)
From: Roger Sayle <roger@eyesopen.com>
To: Revital1 Eres <ERES@il.ibm.com>
cc: Ayal Zaks <ZAKS@il.ibm.com>, Dorit Nuzman <DORIT@il.ibm.com>, <gcc-patches@gcc.gnu.org>, Richard Guenther <rguenther@suse.de>, "R. Clint Whaley" <whaley@cs.utsa.edu>
Subject: Re: [RFC] Fix PR28684

On Tue, 14 Nov 2006, Revital1 Eres wrote:
> > Perhaps pragmatically, reciprocal-math-optimizations can be those
> > that strength reduce divisions into (an equal number of)
> > multiplications.
>
> but x / y into x * (1/y) seems to not fit it which might hurt
> targets which support reciprocal but not division operations (i.e.
> Altivec).
>
> I still prefer the former definition... but that just me :-)

I'm trying to constructively work towards some kind of resolution.

To me the fundamental issue is why not use "-ffast-math"? Some of the arguments have focused on "because I don't understand which transformations that enables". The ignorance of users is not normally a great motivation for change. If someone can't come up with a concrete example of a problematic transformation, then there's probably no reason not to trust GCC's optimizers.

Clint Whaley's bug report, PR middle-end/28684, avoided being immediately closed as "not-a-bug/wont-fix" because it provided an interesting use case. When benchmarking hardware on numerical kernels, it's useful to establish an Mflop number, which is the number of floating point operations.
This allows several forms of associativity, but not the full range of -ffast-math. This is a well-defined problem, useful to a small group of specialists, and it seems reasonable to support it.

Your interest from the point of view of vectorization is unrelated, and not clearly defined. For better or worse, historically GCC's attitude to floating point optimizations is to perform, by default, only transformations that produce bit-identical results, a constraint much stronger than that of other compilers. Anything less than this has gone under the name of "unsafe" math optimizations.

Any numerical expert will tell you that you only have to ignore sign-dependent rounding, or re-associate an expression, to produce a last-bit error, which after only a few additional steps can make the results incomparable. The numeric errors you get from ignoring the sign of zeros are no different in magnitude from those of converting divisions into multiplications, or implementing pow by multiplications, or using extra precision on x87, or use of fmadd instructions.

On this scale, there are very few vectorization transformations that GCC would consider "safe" and that produce bit-identical results. A sad fact is that vectorization is what GCC calls "unsafe". The semantics and wording you prefer in your patch allow for an unbounded error in the result, so the distinction can't be a "quality" argument.

So we now need to consider what semantics it is that you are trying to address. You wish to allow "X/Y" as "(1/Y)*X", but ultimately how is that any different from plain -ffast-math, which should be used routinely by GCC users? Perhaps rather than say we'd like a mysterious new option to include transformation FOO or BAR, we need to look at it the other way and ask which transformations you want to disallow. If there are none, then the current definition should work well for you.
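To make the point about re-association concrete, here is a minimal sketch of how splitting an accumulator into per-lane partial sums (as a vectorizer would) changes the order of additions and therefore the result. The helper names and the two-lane layout are assumptions for illustration only; explicit loops are used so that each addition is a plain IEEE double operation.

```python
def sequential_sum(values):
    """Left-to-right accumulation, as written in scalar source code."""
    total = 0.0
    for v in values:
        total += v
    return total


def lane_sum(values, lanes):
    """Emulated vectorized accumulation: one partial sum per lane,
    combined only after the loop has finished."""
    partial = [0.0] * lanes
    for i, v in enumerate(values):
        partial[i % lanes] += v
    total = 0.0
    for p in partial:
        total += p
    return total


data = [1e16, 1.0, -1e16, 1.0]

print(sequential_sum(data))  # 1.0
print(lane_sum(data, 2))     # 2.0
```

Every individual addition here is correctly rounded; only the grouping differs. The input is deliberately extreme to make the effect obvious, while typical data would differ in the last few bits.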
I think there's still some merit in Clint's request for an Mflop-preserving subset of -ffast-math, which is PR28684, but I see anything else as solving a different (and perhaps irrelevant) problem.

I agree the wording (and naming) of -funsafe-math-optimizations needs to be improved. In my mind, -funsafe-math-optimizations is restricted to all of the mathematically valid transformations that are permissible assuming unbounded precision arithmetic. Things like x+0 -> x. Unfortunately we live in a world where the limitations of our (IEEE) hardware mean that arithmetic performed in a computer doesn't match or perfectly model a Newtonian universe.

For example, Richard Guenther has recently proposed a patch to transform pow(x,1.5) into x*sqrt(x), but didn't appreciate why the transformation was guarded by -funsafe-math-optimizations. The answer is that although the two expressions are equivalent mathematically, and the latter is not only faster but may often be more accurate on most inputs, they are not guaranteed to be identical. Hence codes that assume "y = 1.5; if (pow(x,1.5) == pow(x,y))" may start to fail. Even though the numerical accuracy has improved, we disallow this transformation.

Indeed Robert Scott Ladd, my own OpenEye experience, and other gcc postings have confirmed that numerical accuracy is usually improved, but at the expense of numerical precision.

http://en.wikipedia.org/wiki/Accuracy_and_precision

Perhaps we should rename this option -faccurate-math and describe the default as -fprecise-math. :-)

I'm a bit disappointed that neither the ATLAS folks nor yourself have yet articulated a strong functionality request. I appreciate that you're somehow unhappy with -ffast-math, but apart from the Mflops argument you've failed to put your finger on precisely (or exactly :-) what about it you believe needs fixing. Even in the Mflops argument it seems ambiguous whether operations on constant arguments may be evaluated at compile-time, "2.0 + 3.0 -> 5.0"?
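The two transformations mentioned above can be probed with a short sketch. It assumes Python floats are IEEE doubles in the default rounding mode; the helper names are invented for this example. The `x+0 -> x` case has a definite answer for negative zero, while the `pow` comparison depends on the platform's math library, so the code only counts mismatches and makes no claim about the number found.

```python
import math


def add_zero(x):
    """The expression a compiler might want to simplify to plain x."""
    return x + 0.0


def sign_bit_set(x):
    """True when x carries a negative sign, including negative zero."""
    return math.copysign(1.0, x) < 0.0


def count_pow_mismatches(samples):
    """Count inputs where pow(x, 1.5) and x*sqrt(x) are not bit-identical.
    The result depends on the underlying libm implementation."""
    return sum(1 for x in samples if math.pow(x, 1.5) != x * math.sqrt(x))


# x + 0 -> x changes the sign of zero: (-0.0) + 0.0 is +0.0.
print(sign_bit_set(-0.0))            # True
print(sign_bit_set(add_zero(-0.0)))  # False

# Sample a range of positive inputs and report how many disagree.
samples = [1.0 + i / 1000.0 for i in range(1000)]
print(count_pow_mismatches(samples))
```

A zero of either sign compares equal with `==`, so the first difference only matters to code that inspects the sign, for example through a later division.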
Anyway, I'm pleased that we're discussing the issues.

Roger
--

**Follow-Ups**: **Re: [RFC] Fix PR28684** *From:* Richard Guenther

**References**: **Re: [RFC] Fix PR28684** *From:* Roger Sayle
