Bug 56843

Summary: PowerPC Newton-Raphson reciprocal estimates can be improved
Product: gcc Reporter: Bill Schmidt <bill.schmidt>
Component: target Assignee: Bill Schmidt <bill.schmidt>
Status: RESOLVED FIXED    
Severity: normal CC: bergner, dje
Priority: P3 Keywords: missed-optimization
Version: 4.9.0   
Target Milestone: 4.9.0   
Host: powerpc*-*-* Target: powerpc*-*-*
Build: powerpc*-*-* Known to work:
Known to fail: Last reconfirmed: 2013-04-04 00:00:00

Description Bill Schmidt 2013-04-04 15:03:11 UTC
It was recently brought to my attention that the number of Newton-Raphson iterations used to refine the floating-point reciprocal estimate and reciprocal square root estimate can be tightened.  In particular, for 32-bit floating-point values on processors with higher-precision estimates, a single iteration should suffice to reach the maximum representable precision.  We currently perform two.  We should verify that one is actually sufficient in practice.

We should also investigate whether three iterations are sufficient for 64-bit floating-point values when targeting processors with lower-precision estimates.  The theoretical math suggests four may be necessary, but this could be too conservative in practice, since it is derived from a general bound on the method.
Comment 1 David Edelsohn 2013-04-04 15:09:17 UTC
Confirmed.
Comment 2 Bill Schmidt 2013-04-04 16:12:31 UTC
Regarding the last point, I found this in the user manual:

"The double-precision square root estimate instructions are not generated by default on low-precision machines, since they do not provide an estimate that converges after three steps."

That seems to indicate someone decided the libcall is better than a four-step iteration.  That doesn't necessarily seem obvious to me.
Comment 3 Bill Schmidt 2013-04-05 15:03:26 UTC
Looks like we can improve performance for three cases on P6 and later machines:
 - 32-bit reciprocal square root: remove two instructions
 - 32-bit reciprocal: remove three instructions
 - 64-bit reciprocal: remove one instruction

The last is due to a subtle bug in the existing implementation.
Comment 4 Bill Schmidt 2013-04-05 19:29:44 UTC
Fixed in r197534.
Comment 5 Bill Schmidt 2013-10-21 21:40:16 UTC
Author: wschmidt
Date: Mon Oct 21 21:40:14 2013
New Revision: 203910

URL: http://gcc.gnu.org/viewcvs?rev=203910&root=gcc&view=rev
Log:
gcc:

2013-10-21  Bill Schmidt  <wschmidt@linux.vnet.ibm.com>

	Backport from mainline
	2013-04-05  Bill Schmidt  <wschmidt@linux.vnet.ibm.com>

	PR target/56843
	* config/rs6000/rs6000.c (rs6000_emit_swdiv_high_precision): Remove.
	(rs6000_emit_swdiv_low_precision): Remove.
	(rs6000_emit_swdiv): Rewrite to handle between one and four
	iterations of Newton-Raphson generally; modify required number of
	iterations for some cases.
	* config/rs6000/rs6000.h (RS6000_RECIP_HIGH_PRECISION_P): Remove.

gcc/testsuite:

2013-10-21  Bill Schmidt  <wschmidt@linux.vnet.ibm.com>

	Backport from mainline
	2013-04-05  Bill Schmidt  <wschmidt@linux.vnet.ibm.com>

	PR target/56843
	* gcc.target/powerpc/recip-1.c: Modify expected output.
	* gcc.target/powerpc/recip-3.c: Likewise.
	* gcc.target/powerpc/recip-4.c: Likewise.
	* gcc.target/powerpc/recip-5.c: Add expected output for iterations.


Modified:
    branches/ibm/gcc-4_8-branch/gcc/ChangeLog.ibm
    branches/ibm/gcc-4_8-branch/gcc/config/rs6000/rs6000.c
    branches/ibm/gcc-4_8-branch/gcc/config/rs6000/rs6000.h
    branches/ibm/gcc-4_8-branch/gcc/testsuite/ChangeLog.ibm
    branches/ibm/gcc-4_8-branch/gcc/testsuite/gcc.target/powerpc/recip-1.c
    branches/ibm/gcc-4_8-branch/gcc/testsuite/gcc.target/powerpc/recip-3.c
    branches/ibm/gcc-4_8-branch/gcc/testsuite/gcc.target/powerpc/recip-4.c
    branches/ibm/gcc-4_8-branch/gcc/testsuite/gcc.target/powerpc/recip-5.c
Comment 6 Bill Schmidt 2014-04-04 14:29:56 UTC
Author: wschmidt
Date: Fri Apr  4 14:29:23 2014
New Revision: 209104

URL: http://gcc.gnu.org/viewcvs?rev=209104&root=gcc&view=rev
Log:
[gcc]

2014-04-04  Bill Schmidt  <wschmidt@linux.vnet.ibm.com>

	Backport from mainline
	2013-04-05  Bill Schmidt  <wschmidt@linux.vnet.ibm.com>

	PR target/56843
	* config/rs6000/rs6000.c (rs6000_emit_swdiv_high_precision): Remove.
	(rs6000_emit_swdiv_low_precision): Remove.
	(rs6000_emit_swdiv): Rewrite to handle between one and four
	iterations of Newton-Raphson generally; modify required number of
	iterations for some cases.
	* config/rs6000/rs6000.h (RS6000_RECIP_HIGH_PRECISION_P): Remove.

[gcc/testsuite]

2014-04-04  Bill Schmidt  <wschmidt@linux.vnet.ibm.com>

	Backport from mainline
	2013-04-05  Bill Schmidt  <wschmidt@linux.vnet.ibm.com>

	PR target/56843
	* gcc.target/powerpc/recip-1.c: Modify expected output.
	* gcc.target/powerpc/recip-3.c: Likewise.
	* gcc.target/powerpc/recip-4.c: Likewise.
	* gcc.target/powerpc/recip-5.c: Add expected output for iterations.


Modified:
    branches/gcc-4_8-branch/gcc/ChangeLog
    branches/gcc-4_8-branch/gcc/config/rs6000/rs6000.c
    branches/gcc-4_8-branch/gcc/config/rs6000/rs6000.h
    branches/gcc-4_8-branch/gcc/testsuite/ChangeLog
    branches/gcc-4_8-branch/gcc/testsuite/gcc.target/powerpc/recip-1.c
    branches/gcc-4_8-branch/gcc/testsuite/gcc.target/powerpc/recip-3.c
    branches/gcc-4_8-branch/gcc/testsuite/gcc.target/powerpc/recip-4.c
    branches/gcc-4_8-branch/gcc/testsuite/gcc.target/powerpc/recip-5.c