Bug 32392 - Support using -mrecip w/o additional Newton-Raphson run
Summary: Support using -mrecip w/o additional Newton-Raphson run
Status: NEW
Alias: None
Product: gcc
Classification: Unclassified
Component: target
Version: 4.3.0
Importance: P3 enhancement
Target Milestone: ---
Assignee: Not yet assigned to anyone
URL:
Keywords: missed-optimization
Depends on:
Blocks:
 
Reported: 2007-06-18 14:32 UTC by Tobias Burnus
Modified: 2021-09-20 02:59 UTC
CC List: 4 users

See Also:
Host:
Target: x86_64-*-* i686-*-*
Build:
Known to work:
Known to fail:
Last reconfirmed: 2007-06-19 09:15:09


Description Tobias Burnus 2007-06-18 14:32:30 UTC
Paolo Bonzini wrote:
>> That said, there is a whole bunch of applications that would kill for -mrecip,
>> even for 11bit ones. Games are one of them, for sure ;)
> What about -mrecip=0/1/2 for the number of NR steps? Or would two steps be 
> slower than divss?
>
> I was thinking of adding this as a follow-up patch ;) Just look how the 
> operations are grouped together.

As Richard pointed out, two NR steps do not make sense. In some cases, doing without the Newton-Raphson step altogether is enough (for example games, or SPEC CPU 2006: http://www.hpcwire.com/hpc/1556972.html).

Other compilers have this option, e.g. Pathscale's -OPT:rsqrt=2 [yes, this is used for SPEC runs ;-)]
Comment 1 Tobias Burnus 2007-06-18 15:03:09 UTC
Initial suggestion, see:
http://gcc.gnu.org/ml/gcc-patches/2007-06/msg01068.html

Richard's remark:
http://gcc.gnu.org/ml/gcc-patches/2007-06/msg01224.html
> Two NR steps don't make sense, they wouldn't improve accuracy because of the
> extra roundings we get for the NR.  And of course it would be slower.

(However, two NR steps are said to be enough for double precision. I don't know whether rsqrt followed by two NR steps is faster than computing 1/sqrt() directly in double precision.)
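
To make the request concrete, here is a minimal C sketch of a reciprocal with a selectable number of NR steps. It is only an illustration, not GCC's implementation: `rough_recip` and `recip_approx` are hypothetical names, and `rough_recip` is a portable stand-in for the hardware estimate (rcpss on x86) that simply truncates the exact quotient to roughly 11 mantissa bits. The `nr_steps` parameter plays the role of the proposed -mrecip=0/1/2 argument.

```c
#include <stdint.h>
#include <string.h>

/* Stand-in for a hardware reciprocal estimate such as rcpss (assumption:
   real hardware uses a lookup table; here we take the exact float quotient
   and clear the low 12 mantissa bits, leaving roughly 11 good bits). */
static float rough_recip(float a)
{
    float r = 1.0f / a;
    uint32_t bits;
    memcpy(&bits, &r, sizeof bits);   /* reinterpret the float as raw bits */
    bits &= 0xFFFFF000u;              /* drop the low 12 mantissa bits */
    memcpy(&r, &bits, sizeof r);
    return r;
}

/* Approximate 1/a using the rough estimate plus nr_steps Newton-Raphson
   refinements, each of the form x' = x * (2 - a*x).
   nr_steps == 0 returns the raw estimate, which is what this PR asks for. */
static float recip_approx(float a, int nr_steps)
{
    float x = rough_recip(a);
    for (int i = 0; i < nr_steps; i++)
        x = x * (2.0f - a * x);
    return x;
}
```

Calling `recip_approx(a, 0)` gives the unrefined estimate, while `recip_approx(a, 1)` corresponds to the single NR step that -mrecip currently applies. Each step roughly squares the relative error, so in float arithmetic a second step has little left to gain beyond rounding noise.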

Related - closed - PRs:
PR 31723 - Use reciprocal and reciprocal square root with -ffast-math (FIXED)
PR 32352 - Using rsqrt, Polyhedron's aermod test crashes (WONTFIX)
Comment 2 Richard Biener 2007-06-19 09:15:09 UTC
Confirmed.  For two NR steps to reach double precision (we would miss it by a
few more ulps than the 2.5 ulps we miss for float precision), at least the
second NR step would have to be done in double precision.  Note that this would
make sense only for double precision input values that are exactly
representable in float precision (otherwise, why the extra precision?).  So in
practice it is not worth it.
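
The point about where the second step is computed can be sketched in C as follows. This is an illustrative sketch under stated assumptions, not GCC code: the function names are hypothetical, and the initial estimate `x0` is passed in by the caller (on x86 it would come from rsqrtss). The NR step for 1/sqrt(a) used here is the standard x' = x * (1.5 - 0.5*a*x*x).

```c
/* One Newton-Raphson step for 1/sqrt(a), evaluated in float. */
static float rsqrt_step_f(float a, float x)
{
    return x * (1.5f - 0.5f * a * x * x);
}

/* The same step, evaluated in double. */
static double rsqrt_step_d(double a, double x)
{
    return x * (1.5 - 0.5 * a * x * x);
}

/* Two NR steps, both in float: the result is a float, so it can never be
   more accurate than float rounding allows. */
static float rsqrt_2nr_float(float a, float x0)
{
    return rsqrt_step_f(a, rsqrt_step_f(a, x0));
}

/* Two NR steps, the second one in double. The first step still sees the
   input rounded to float, so this only helps when a is exactly
   representable in float. */
static double rsqrt_2nr_mixed(double a, float x0)
{
    float x1 = rsqrt_step_f((float)a, x0);
    return rsqrt_step_d(a, (double)x1);
}
```

For an input such as a = 2.0 with a starting estimate good to about 12 bits, the mixed variant gets much closer to 1/sqrt(2) than the all-float variant, which is capped by float rounding. Even so, the mixed result may still fall a few ulps short of a correctly rounded double.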
Comment 3 Eric Gallager 2019-03-03 22:01:13 UTC
(In reply to Tobias Burnus from comment #0)
> Paolo Bonzini wrote:
> >> That said, there is a whole bunch of applications that would kill for -mrecip,
> >> even for 11bit ones. Games are one of them, for sure ;)
> > What about -mrecip=0/1/2 for the number of NR steps? Or would two steps be 
> > slower than divss?
> >
> > I was thinking of adding this as a follow-up patch ;) Just look how the 
> > operations are grouped together.
> 
> As Richard pointed out, two NR steps do not make sense. In some cases,
> doing without the Newton-Raphson step altogether is enough (for example
> games, or SPEC CPU 2006: http://www.hpcwire.com/hpc/1556972.html).

Link is dead; archive.org link: http://web.archive.org/web/20120528224320/http://archive.hpcwire.com/hpc/1556972.html

> 
> Other compilers have this option, e.g. Pathscale's -OPT:rsqrt=2 [yes, this
> is used for SPEC runs ;-)]