This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
Re: [AArch64] Emit division using the Newton series
- From: Wilco Dijkstra <Wilco dot Dijkstra at arm dot com>
- To: Evandro Menezes <e dot menezes at samsung dot com>, GCC Patches <gcc-patches at gcc dot gnu dot org>
- Cc: James Greenhalgh <James dot Greenhalgh at arm dot com>, Andrew Pinski <pinskia at gmail dot com>, nd <nd at arm dot com>
- Date: Fri, 1 Apr 2016 22:45:48 +0000
- Subject: Re: [AArch64] Emit division using the Newton series
- References: <56EB0EDF dot 3060401 at samsung dot com> <56F2C329 dot 10405 at samsung dot com> <56FDA311 dot 7090309 at samsung dot com> <AM3PR08MB0088DDE6EA428B37CE090953839A0 at AM3PR08MB0088 dot eurprd08 dot prod dot outlook dot com> <56FED036 dot 2070405 at samsung dot com> <AM3PR08MB00884DBC29E8F0651E1ECEC6839A0 at AM3PR08MB0088 dot eurprd08 dot prod dot outlook dot com>,<56FEEE90 dot 3070707 at samsung dot com>
Evandro Menezes wrote:
> However, I don't think that there's the need to handle any special case
> for division. The only case when the approximation differs from
> division is when the numerator is infinity and the denominator, zero,
> when the approximation returns infinity and the division, NAN. So I
> don't think that it's a special case that deserves being handled. IOW,
> the result of the approximate reciprocal is always needed.
No, the result of the approximate reciprocal is not needed.
Basically, a Newton-Raphson (NR) step produces a correction factor that is very
close to 1.0 and then multiplies the previous estimate by it to get a more
accurate estimate. The final calculation for x * recip(y) is:
result = (reciprocal_correction * reciprocal_estimate) * x
while what I am suggesting is a trivial reassociation:
result = reciprocal_correction * (reciprocal_estimate * x)
The computation of the final reciprocal_correction is on the critical latency
path, while reciprocal_estimate is computed earlier, so we can compute
(reciprocal_estimate * x) in parallel with it without increasing the overall
latency. I.e. we save one multiply on the critical path.
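
As a rough illustration of the two orderings, here is a minimal C sketch. It is
not the code from the patch: recip_estimate() is a hypothetical software
stand-in for a hardware reciprocal estimate instruction, and the number of NR
steps (two refinements plus the final correction) is an assumption chosen so
that the result reaches roughly float precision.

```c
#include <math.h>

/* Hypothetical stand-in for a hardware reciprocal estimate: a crude
   linear approximation of 1/y, accurate to about 6% (assumption).  */
static float recip_estimate(float y)
{
    int e;
    float m = frexpf(fabsf(y), &e);               /* |y| = m * 2^e, m in [0.5, 1) */
    float r = 48.0f / 17.0f - 32.0f / 17.0f * m;  /* r ~= 1/m */
    return copysignf(ldexpf(r, -e), y);           /* scale back, restore sign */
}

/* Original association: the final multiply by x waits for the fully
   refined reciprocal, so two multiplies follow the last correction.  */
float div_original(float x, float y)
{
    float est = recip_estimate(y);
    est = est * (2.0f - y * est);     /* earlier NR steps */
    est = est * (2.0f - y * est);
    float corr = 2.0f - y * est;      /* final correction, close to 1.0 */
    return (corr * est) * x;
}

/* Reassociated: est * x does not depend on corr, so it can be computed
   while corr is still in flight; one multiply follows the correction.  */
float div_reassociated(float x, float y)
{
    float est = recip_estimate(y);
    est = est * (2.0f - y * est);
    est = est * (2.0f - y * est);
    float corr = 2.0f - y * est;      /* critical path */
    float xe = est * x;               /* off the critical path */
    return corr * xe;
}
```

Both functions perform the same number of multiplies; only the dependency
chain after the final correction differs. The results can differ in the last
bits because the rounding order changes.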
In principle, this could be done as a separate optimization pass that tries to
reassociate to reduce latency. However, I'm not convinced this would be easy
to implement in GCC's scheduler, so it's best to do it explicitly.
Wilco