This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.
Re: [AArch64] Emit square root using the Newton series
- From: Wilco Dijkstra <Wilco dot Dijkstra at arm dot com>
- To: Evandro Menezes <e dot menezes at samsung dot com>
- Cc: "gcc-patches at gcc dot gnu dot org" <gcc-patches at gcc dot gnu dot org>, nd <nd at arm dot com>
- Date: Thu, 10 Mar 2016 19:10:24 +0000
- Subject: Re: [AArch64] Emit square root using the Newton series
- References: <AM3PR08MB00886499882773F3C8B9F71D83B30 at AM3PR08MB0088 dot eurprd08 dot prod dot outlook dot com> <011d01d17a26$31b3ade0$951b09a0$ at samsung dot com> <AM3PR08MB0088D558E387C1B736785AA883B40 at AM3PR08MB0088 dot eurprd08 dot prod dot outlook dot com>,<56E1A7AD dot 90408 at samsung dot com>
On 03/10/16 10:52, Wilco Dijkstra wrote:
> Hi Evandro,
>
>> I have however encountered precision issues with DF, namely some benchmarks in the SPECfp CPU2000 suite would fail to validate.
> Accuracy is not an issue; the computation is extremely accurate. The issue is that your patch doesn't support sqrt(0.0): it returns NaN rather than zero, and that causes the miscompares you're seeing. So support for the zero case should be added.
>
> This would be a better expansion, supporting zero, and with lower latency than the current sequence:
Now that I think of it, frsqrts returns 1.5 for the zero case, so we only need to fix up the estimated
sqrt value before the final multiply. Since an FCSEL/VAND can be hidden completely behind the
latency of frsqrts, both the scalar and vector cases could do this:
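frsqrte s1, s0          // s1 = e0, initial estimate of 1/sqrt(x)
fmul    s2, s1, s1      // s2 = e0 * e0
frsqrts s2, s0, s2      // s2 = (3 - x * e0^2) / 2, first step factor
fcmp    s0, 0.0         // set flags: is x zero?
fmul    s1, s1, s2      // s1 = e1, refined 1/sqrt(x)
fmul    s2, s1, s1      // s2 = e1 * e1
fmul    s1, s0, s1      // s1 = x * e1, estimated sqrt(x)
frsqrts s2, s0, s2      // s2 = (3 - x * e1^2) / 2, second step factor
fcsel   s1, s0, s1, eq  // if x == 0, replace the estimate with x (zero)
fmul    s0, s1, s2      // s0 = sqrt(x)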
Wilco