[AArch64] Emit square root using the Newton series

Thu Mar 10 19:10:00 GMT 2016

On 03/10/16 10:52, Wilco Dijkstra wrote:
> Hi Evandro,
>
>> I have however encountered precision issues with DF, namely some benchmarks in the SPECfp CPU2000 suite would fail to validate.
> Accuracy is not an issue; the computation is extremely accurate. The issue is that your patch doesn't support sqrt(0.0): it returns NaN rather than zero, and that causes the miscompares you're seeing. So support for the zero case should be added.
>
> This would be a better expansion, supporting zero, and with lower latency than the current sequence:

Now that I think of it, frsqrts returns 1.5 for the zero case (one operand zero, the other
infinity), so we only need to fix up the estimated sqrt value before the final multiply. Since an
FCSEL/VAND can be hidden completely behind the latency of frsqrts, both the scalar and vector
cases could do this:

    frsqrte  s1, s0
    fmul     s2, s1, s1
    frsqrts  s2, s0, s2
    fcmp     s0, 0.0
    fmul     s1, s1, s2
    fmul     s2, s1, s1
    fmul     s1, s0, s1
    frsqrts  s2, s0, s2
    fcsel    s1, s0, s1, eq
    fmul     s0, s1, s2
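
For anyone who wants to trace the zero case by hand, here is a rough scalar C model of the sequence above. It is only a sketch under assumptions: rsqrt_estimate() and rsqrt_step() are hypothetical stand-ins for frsqrte and frsqrts, the estimate uses the well-known bit-trick approximation rather than the real lookup table, and the step is computed with plain (unfused) arithmetic. The only hardware behaviour it tries to mirror is that the estimate of 0.0 is +infinity and that the step returns 1.5 for 0 * infinity.

```c
#include <math.h>    /* INFINITY */
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for FRSQRTE: a coarse estimate of 1/sqrt(d).
   Assumption: the bit-trick approximation (roughly 3% error) replaces the
   hardware table. Like the instruction, it yields +inf for d == 0. */
static float rsqrt_estimate(float d) {
    if (d == 0.0f)
        return INFINITY;
    uint32_t i;
    memcpy(&i, &d, sizeof i);
    i = 0x5f3759dfu - (i >> 1);
    float x;
    memcpy(&x, &i, sizeof x);
    return x;
}

/* Hypothetical stand-in for FRSQRTS: (3 - a*b) / 2, with the special case
   that zero times infinity gives 1.5 instead of NaN. */
static float rsqrt_step(float a, float b) {
    if ((a == 0.0f && b == INFINITY) || (a == INFINITY && b == 0.0f))
        return 1.5f;
    return (3.0f - a * b) / 2.0f;
}

/* Model of the sequence; fixup != 0 enables the fcmp/fcsel pair. */
static float sqrt_newton(float d, int fixup) {
    float x = rsqrt_estimate(d);   /* frsqrte s1, s0     */
    float t = x * x;               /* fmul    s2, s1, s1 */
    t = rsqrt_step(d, t);          /* frsqrts s2, s0, s2 */
    x = x * t;                     /* fmul    s1, s1, s2 */
    t = x * x;                     /* fmul    s2, s1, s1 */
    float s = d * x;               /* fmul    s1, s0, s1 (0 * inf = NaN) */
    t = rsqrt_step(d, t);          /* frsqrts s2, s0, s2 (1.5 for zero)  */
    if (fixup && d == 0.0f)        /* fcmp s0, 0.0 ; fcsel ..., eq       */
        s = d;
    return s * t;                  /* fmul    s0, s1, s2 */
}
```

In this model sqrt_newton(0.0f, 0) comes out as NaN because the estimated sqrt is 0 * infinity, while sqrt_newton(0.0f, 1) comes out as zero, since the selected 0.0 is multiplied by the 1.5 from the last step.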

Wilco