This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.

Re: [AArch64] Emit division using the Newton series

From: Wilco Dijkstra <Wilco dot Dijkstra at arm dot com>
To: Evandro Menezes <e dot menezes at samsung dot com>, GCC Patches <gcc-patches at gcc dot gnu dot org>
Cc: James Greenhalgh <James dot Greenhalgh at arm dot com>, Andrew Pinski <pinskia at gmail dot com>, nd <nd at arm dot com>
Date: Fri, 1 Apr 2016 22:45:48 +0000
Subject: Re: [AArch64] Emit division using the Newton series

Evandro Menezes wrote:

> However, I don't think that there's the need to handle any special case
> for division.  The only case when the approximation differs from
> division is when the numerator is infinity and the denominator, zero,
> when the approximation returns infinity and the division, NAN.  So I
> don't think that it's a special case that deserves being handled.  IOW,
> the result of the approximate reciprocal is always needed.

No, the result of the approximate reciprocal is not needed. 

Basically, an NR (Newton-Raphson) iteration produces a correction factor that
is very close to 1.0 and then multiplies it with the previous estimate to get
a more accurate estimate. The final calculation for x * recip(y) is:

result = (reciprocal_correction * reciprocal_estimate) * x

while what I am suggesting is a trivial reassociation:

result = reciprocal_correction * (reciprocal_estimate * x)

The computation of the final reciprocal_correction is on the critical latency
path, while reciprocal_estimate is computed earlier, so we can compute
(reciprocal_estimate * x) in parallel with it, without increasing the overall
latency. I.e., we save a multiply on the critical path.
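
To make the two orderings concrete, here is a minimal C sketch. It is not the
patch itself: rough_recip() is a hypothetical stand-in for a hardware
reciprocal estimate, nr_correction() models a step of the form 2 - d * est,
and the number of refinement steps is an assumption chosen for illustration.

```c
#include <math.h>

/* Hypothetical stand-in for a hardware reciprocal estimate: returns 1/d
   with a deliberate relative error of about 2^-8. */
static double rough_recip(double d)
{
    return (1.0 / d) * (1.0 + 1.0 / 256.0);
}

/* Newton-Raphson correction factor for a reciprocal estimate of 1/d.
   It approaches 1.0 as est approaches 1/d. */
static double nr_correction(double d, double est)
{
    return 2.0 - d * est;
}

/* Original association: the last two multiplies both wait for corr. */
double div_original(double x, double d)
{
    double est = rough_recip(d);
    est = est * nr_correction(d, est);    /* first refinement */
    est = est * nr_correction(d, est);    /* second refinement */
    double corr = nr_correction(d, est);  /* final correction */
    return (corr * est) * x;
}

/* Reassociated: est * x does not depend on corr, so it can be computed
   while corr is still in flight; only one multiply follows corr. */
double div_reassoc(double x, double d)
{
    double est = rough_recip(d);
    est = est * nr_correction(d, est);    /* first refinement */
    est = est * nr_correction(d, est);    /* second refinement */
    double corr = nr_correction(d, est);  /* final correction */
    double est_x = est * x;               /* independent of corr */
    return corr * est_x;
}
```

Both functions perform the same number of multiplies; the difference is that
div_reassoc leaves a single multiply after corr instead of two. The two
results may differ in the last bits, since floating-point multiplication is
not associative.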

In principle, this could be done as a separate optimization pass that tries to
reassociate to reduce latency. However, I'm not convinced this would be easy
to implement in GCC's scheduler, so it's best to do it explicitly.

Wilco
