Summary: | PowerPC Newton-Raphson reciprocal estimates can be improved | ||
---|---|---|---|
Product: | gcc | Reporter: | Bill Schmidt <bill.schmidt> |
Component: | target | Assignee: | Bill Schmidt <bill.schmidt> |
Status: | RESOLVED FIXED | ||
Severity: | normal | CC: | bergner, dje |
Priority: | P3 | Keywords: | missed-optimization |
Version: | 4.9.0 | ||
Target Milestone: | 4.9.0 | ||
Host: | powerpc*-*-* | Target: | powerpc*-*-* |
Build: | powerpc*-*-* | Known to work: | |
Known to fail: | | Last reconfirmed: | 2013-04-04 00:00:00 |
Description
Bill Schmidt
2013-04-04 15:03:11 UTC
Confirmed. Regarding the last point, I found this in the user manual:

"The double-precision square root estimate instructions are not generated by default on low-precision machines, since they do not provide an estimate that converges after three steps."

That seems to indicate someone decided the libcall is better than a four-step iteration. That doesn't necessarily seem obvious to me.

Looks like we can improve performance for three cases on P6 and later machines:

- 32-bit reciprocal square root: remove two instructions
- 32-bit reciprocal: remove three instructions
- 64-bit reciprocal: remove one instruction

The last is due to a subtle bug in the existing implementation.

Author: wschmidt
Date: Mon Oct 21 21:40:14 2013
New Revision: 203910

URL: http://gcc.gnu.org/viewcvs?rev=203910&root=gcc&view=rev
Log:
gcc:

2013-10-21  Bill Schmidt  <wschmidt@linux.vnet.ibm.com>

    Backport from mainline
    2013-04-05  Bill Schmidt  <wschmidt@linux.vnet.ibm.com>

    PR target/56843
    * config/rs6000/rs6000.c (rs6000_emit_swdiv_high_precision): Remove.
    (rs6000_emit_swdiv_low_precision): Remove.
    (rs6000_emit_swdiv): Rewrite to handle between one and four
    iterations of Newton-Raphson generally; modify required number of
    iterations for some cases.
    * config/rs6000/rs6000.h (RS6000_RECIP_HIGH_PRECISION_P): Remove.

gcc/testsuite:

2013-10-21  Bill Schmidt  <wschmidt@linux.vnet.ibm.com>

    Backport from mainline
    2013-04-05  Bill Schmidt  <wschmidt@linux.vnet.ibm.com>

    PR target/56843
    * gcc.target/powerpc/recip-1.c: Modify expected output.
    * gcc.target/powerpc/recip-3.c: Likewise.
    * gcc.target/powerpc/recip-4.c: Likewise.
    * gcc.target/powerpc/recip-5.c: Add expected output for iterations.

Modified:
    branches/ibm/gcc-4_8-branch/gcc/ChangeLog.ibm
    branches/ibm/gcc-4_8-branch/gcc/config/rs6000/rs6000.c
    branches/ibm/gcc-4_8-branch/gcc/config/rs6000/rs6000.h
    branches/ibm/gcc-4_8-branch/gcc/testsuite/ChangeLog.ibm
    branches/ibm/gcc-4_8-branch/gcc/testsuite/gcc.target/powerpc/recip-1.c
    branches/ibm/gcc-4_8-branch/gcc/testsuite/gcc.target/powerpc/recip-3.c
    branches/ibm/gcc-4_8-branch/gcc/testsuite/gcc.target/powerpc/recip-4.c
    branches/ibm/gcc-4_8-branch/gcc/testsuite/gcc.target/powerpc/recip-5.c

Author: wschmidt
Date: Fri Apr 4 14:29:23 2014
New Revision: 209104

URL: http://gcc.gnu.org/viewcvs?rev=209104&root=gcc&view=rev
Log:
[gcc]

2014-04-04  Bill Schmidt  <wschmidt@linux.vnet.ibm.com>

    Backport from mainline
    2013-04-05  Bill Schmidt  <wschmidt@linux.vnet.ibm.com>

    PR target/56843
    * config/rs6000/rs6000.c (rs6000_emit_swdiv_high_precision): Remove.
    (rs6000_emit_swdiv_low_precision): Remove.
    (rs6000_emit_swdiv): Rewrite to handle between one and four
    iterations of Newton-Raphson generally; modify required number of
    iterations for some cases.
    * config/rs6000/rs6000.h (RS6000_RECIP_HIGH_PRECISION_P): Remove.

[gcc/testsuite]

2014-04-04  Bill Schmidt  <wschmidt@linux.vnet.ibm.com>

    Backport from mainline
    2013-04-05  Bill Schmidt  <wschmidt@linux.vnet.ibm.com>

    PR target/56843
    * gcc.target/powerpc/recip-1.c: Modify expected output.
    * gcc.target/powerpc/recip-3.c: Likewise.
    * gcc.target/powerpc/recip-4.c: Likewise.
    * gcc.target/powerpc/recip-5.c: Add expected output for iterations.

Modified:
    branches/gcc-4_8-branch/gcc/ChangeLog
    branches/gcc-4_8-branch/gcc/config/rs6000/rs6000.c
    branches/gcc-4_8-branch/gcc/config/rs6000/rs6000.h
    branches/gcc-4_8-branch/gcc/testsuite/ChangeLog
    branches/gcc-4_8-branch/gcc/testsuite/gcc.target/powerpc/recip-1.c
    branches/gcc-4_8-branch/gcc/testsuite/gcc.target/powerpc/recip-3.c
    branches/gcc-4_8-branch/gcc/testsuite/gcc.target/powerpc/recip-4.c
    branches/gcc-4_8-branch/gcc/testsuite/gcc.target/powerpc/recip-5.c