This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.



Re: What is acceptable for -ffast-math? (Was: associative law in combine)



On Tue, 31 Jul 2001 dewar@gnat.com wrote:
>
> I still see *no* quantitative data showing that the transformation most
> discussed here (associative redistribution) is effective in improving
> performance.

Actually, the transformation that was discussed here was really

	a/b/c -> a/(b*c)

and I don't know where the associative redistribution thing really came
from. Probably just an example of another thing like it..

> Linus seems to be getting a little emotional in this discussion but swearing
> does not replace data.

Hey, I called people silly, not <censored>. You must have a very low
tolerance ;)

And the replacement-with-reciprocal kind of thing can certainly make a
difference. I'm too lazy to dig out my old quake3 source CD to try to dig
up what I ended up doing there, though.

> This seems such a reasonable position that I really have a heck of a time
> understanding why Linus is so incensed by it.

I'm incensed by people who think that the current -ffast-math already does
something that is positively EVIL - even though the current setup with
inline replacements of "fsin" etc has been there for years.

> Let's take an example, which is multiplication by a reciprocal. This of
> course does give "wrong" results. But division is often quite slow. On the
> other hand, I can't imagine floating-point programmers, including those
> doing game stuff, not knowing this, and thus writing the multiplication
> in the first place.

Actually, in the case of quake3, I distinctly remember how the x86 version
actually used division on purpose, because the x86 code was hand-tuned
assembler and they had counted cycles (for a Pentium machine) where the
division overlapped the other arithmetic.

The x86 hand-tuned stuff also took care to fit inside the FP stack etc -
they had started off with a C version and compiled that, and then once
they got as far as they could they just extracted the asm and did the rest
by hand.

It was also the much more obvious algorithm to do a division.

On alpha (and hey, this was before the 21264), the same did not hold true,
and it was faster to do a reciprocal. But it's been too long.

> <<Oh, round-to-zero is definitely acceptable in the world of "who cares
> about IEEE, we want fast math, and we'll use fixed arithmetic if the FP
> code is too slow".
> >>
>
> That's probably a bad idea, on most modern machines, full IEEE floating-point
> is faster than integer arithmetic (particularly in the case of multiplication)

Ehh.. You seem to think that "hardware == fast".

Denormal handling on x86 is done in microcode as an internal trap in the
pipeline. Round-to-zero is noticeably faster - even when the _code_ is the
same. Which is why Intel has special mode flags - where it internally
rounds to zero instead of making a denormal.

I'm not kidding. Look at the "Denormals are zero flag", bit 6 in the
MXCSR.

And I quote (all typos are probably mine):

 "The denormals-are-zero mode is not compatible with IEEE Standard 754.
  The denormals-are-zeros mode is provided to improve processor
  performance for applications such as streaming media processing, where
  rounding a denormal to zero does not appreciably affect the quality of
  the processed data"

What I'm saying is that a huge company like Intel chose to do this
optimization IN HARDWARE because they thought that (a) it was meaningful
and (b) the problem warranted it.

		Linus

