[AArch64] Emit square root using the Newton series
Wilco Dijkstra
Wilco.Dijkstra@arm.com
Thu Mar 10 19:10:00 GMT 2016
On 03/10/16 10:52, Wilco Dijkstra wrote:
> Hi Evandro,
>
>> I have however encountered precision issues with DF, namely some benchmarks in the SPECfp CPU2000 suite would fail to validate.
> Accuracy is not an issue, the computation is extremely accurate. The issue is that your patch doesn't support sqrt(0.0) - it returns NaN rather than zero, and that causes the miscompares you're seeing. So support for the zero case should be added.
>
> This would be a better expansion, supporting zero, and with lower latency than the current sequence:
Now I think of it, frsqrts returns 1.5 for the zero case, so we only need to fix up the estimated
sqrt value before the final multiply. Since a FCSEL/VAND can be hidden completely behind the
latency of frsqrts, both scalar and vector case could do this:
frsqrte s1, s0
fmul s2, s1, s1
frsqrts s2, s0, s2
fcmp s0, 0.0
fmul s1, s1, s2
fmul s2, s1, s1
fmul s1, s0, s1
frsqrts s2, s0, s2
fcsel s1, s0, s1, eq
fmul s0, s1, s2
Wilco
More information about the Gcc-patches
mailing list