[AArch64] Emit division using the Newton series
Evandro Menezes
e.menezes@samsung.com
Fri Apr 1 21:56:00 GMT 2016
On 04/01/16 16:22, Wilco Dijkstra wrote:
> Evandro Menezes wrote:
>>> The division variant should use the same latency reduction trick I mentioned for sqrt.
>> I don't think that it applies here, since it doesn't have to deal with
>> special cases.
> No it applies as it's exactly the same calculation: x * rsqrt(y) and x * recip(y). In both
> cases you don't need the final result of rsqrt(y) or recip(y), avoiding a multiply.
> Given these sequences are high latency this saving is actually quite important.
Wilco,
In the case of sqrt(), the argument 0.0 is a special case where the
multiplication has to be handled specially in order to guarantee
correctness. Handling this special case hurts performance, which is
where your suggestion helps.
However, I don't think that there's a need to handle any special case
for division. The only case where the approximation differs from
division is when the numerator is infinity and the denominator is zero:
the approximation returns infinity and the division, NaN. So I don't
think that it's a special case that deserves being handled. IOW, the
result of the approximate reciprocal is always needed.
Or am I missing something?
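
For concreteness, here is a minimal sketch of the two orderings under
discussion, written in Python rather than as AArch64 code. It is one
reading of the suggestion, not what the patch emits: the function names,
the initial estimate r0 and the step count are all assumptions made for
illustration. The first form finishes the reciprocal and then multiplies
by x; the second multiplies x by the previous estimate, so that multiply
does not have to wait for the last correction factor.

```python
def recip_step(y, r):
    """One Newton-Raphson refinement of r ~ 1/y (hypothetical helper)."""
    return r * (2.0 - y * r)


def div_plain(x, y, r0, steps=4):
    """x / y as x * recip(y): refine the reciprocal fully, then multiply."""
    r = r0
    for _ in range(steps):
        r = recip_step(y, r)
    return x * r


def div_folded(x, y, r0, steps=4):
    """Same arithmetic, with the multiply by x folded into the last step."""
    r = r0
    for _ in range(steps - 1):
        r = recip_step(y, r)
    q = x * r          # does not depend on the last correction factor
    e = 2.0 - y * r    # last correction factor, independent of q
    return q * e       # only one multiply after e on the dependent chain
```

Both forms perform the same number of operations; the difference is only
in which of them sit on the dependent chain. With r0 = 0.3 as a rough
estimate of 1/3, four steps bring either form to within rounding error
of 1.0 / 3.0.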
Thank you,
--
Evandro Menezes