Paolo Bonzini wrote:
>> That said, there is a whole bunch of applications that would kill for -mrecip,
> even for 11bit ones. Games are one of them, for sure ;)
> What about -mrecip=0/1/2 for the number of NR steps? Or would two steps be
> slower than divss?
>
> I was thinking of adding this as a follow-up patch ;) Just look how the
> operations are grouped together.

As Richard pointed out: Having two NR steps does not make sense. For some cases
doing without Newton-Raphson is enough. (Example: Games -- or SPEC CPU 2006:
http://www.hpcwire.com/hpc/1556972.html)

Other compilers have this option, e.g. Pathscale's -OPT:rsqrt=2 [yes, this is
used for SPEC runs ;-)]
Initial suggestion, see:
http://gcc.gnu.org/ml/gcc-patches/2007-06/msg01068.html

Richard's remark:
http://gcc.gnu.org/ml/gcc-patches/2007-06/msg01224.html
> Two NR steps don't make sense, they wouldn't improve accuracy because of the
> extra roundings we get for the NR. And of course it would be slower.

(However, two NR steps are said to be enough for double precision. I don't know
whether doing rsqrt+(2x NR) is faster than 1/sqrt() for double or not.)

Related - closed - PRs:
PR 31723 - Use reciprocal and reciprocal square root with -ffast-math (FIXED)
PR 32352 - Using rsqrt, Polyhedron's aermod test crashes (WONTFIX)
Confirmed. For 2 NR steps to reach double precision (we'd miss it by some more ulps than the 2.5 for float precision) we would need to do at least the second NR in double precision. Note that this would make sense only for double precision input values that are exactly representable in float precision (otherwise, why the extra precision?). So practically not worth it.
(In reply to Tobias Burnus from comment #0)
> Paolo Bonzini wrote:
> >> That said, there is a whole bunch of applications that would kill for -mrecip,
> > even for 11bit ones. Games are one of them, for sure ;)
> > What about -mrecip=0/1/2 for the number of NR steps? Or would two steps be
> > slower than divss?
> >
> > I was thinking of adding this as a follow-up patch ;) Just look how the
> > operations are grouped together.
>
> As Richard pointed out: Having two NR does not make sense. For some cases
> doing without Newton-Raphson is enough. (Example: Games -- or SPEC CPU
> 2006: http://www.hpcwire.com/hpc/1556972.html)

Link is dead; archive.org link:
http://web.archive.org/web/20120528224320/http://archive.hpcwire.com/hpc/1556972.html

>
> Other compilers have this option, e.g. Pathscale's -OPT:rsqrt=2 [yes, this
> is used for SPEC runs ;-)]