This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
Re: [AArch64] Emit division using the Newton series
- From: Wilco Dijkstra <Wilco dot Dijkstra at arm dot com>
- To: Evandro Menezes <e dot menezes at samsung dot com>, GCC Patches <gcc-patches at gcc dot gnu dot org>
- Cc: James Greenhalgh <James dot Greenhalgh at arm dot com>, Andrew Pinski <pinskia at gmail dot com>, nd <nd at arm dot com>
- Date: Fri, 1 Apr 2016 13:58:01 +0000
- Subject: Re: [AArch64] Emit division using the Newton series
- References: <56EB0EDF dot 3060401 at samsung dot com> <56F2C329 dot 10405 at samsung dot com>,<56FDA311 dot 7090309 at samsung dot com>
Evandro Menezes wrote:
On 03/23/16 11:24, Evandro Menezes wrote:
> On 03/17/16 15:09, Evandro Menezes wrote:
>> This patch implements FP division by an approximation using the Newton
>> series.
>>
>> With this patch, DF division is sped up by over 100% and SF division,
>> zilch, both on A57 and on M1.
Mentioning throughput is not useful, given that the vectorized single-precision
case will give most of the speedup in actual code.
> gcc/
> * config/aarch64/aarch64-tuning-flags.def
> (AARCH64_EXTRA_TUNE_APPROX_DIV_{SF,DF}): New tuning macros.
> * config/aarch64/aarch64-protos.h
> (AARCH64_EXTRA_TUNE_APPROX_DIV): New macro.
> (aarch64_emit_approx_div): Declare new function.
> * config/aarch64/aarch64.c
> (aarch64_emit_approx_div): Define new function.
> * config/aarch64/aarch64.md ("div<mode>3"): New expansion.
> * config/aarch64/aarch64-simd.md ("div<mode>3"): Likewise.
>
>
> This version of the patch cleans up the changes to the MD files and
> optimizes the division when the numerator is 1.0.
Adding support for plain reciprocal is good. Moving the enabling logic out of
the MD file is an improvement, but I don't believe adding tuning flags for the inner
mode is correct - we need a more generic solution like I mentioned in my other mail.
The division variant should use the same latency-reduction trick I mentioned for sqrt.
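
For readers unfamiliar with the technique, here is a minimal sketch in portable C
of division via a Newton-refined reciprocal. It is not the code from the patch and
does not include the latency-reduction trick. All names are hypothetical, and
recip_estimate() merely fakes a low-precision hardware estimate with about 8
correct bits. Special inputs (zero, infinity, NaN, denormals) are not handled.

```c
#include <assert.h>
#include <math.h>

/* Hypothetical stand-in for a hardware reciprocal estimate: returns 1/b
   with a relative error of about 2^-8.  Real hardware would use an
   estimate instruction rather than a true division.  */
static double recip_estimate(double b)
{
    return (1.0 / b) * (1.0 + 1.0 / 256.0);
}

/* Refine x ~= 1/b with Newton steps x' = x * (2 - b * x).
   Each step roughly squares the relative error of x.  */
static double approx_recip(double b, int iterations)
{
    assert(iterations >= 0);
    double x = recip_estimate(b);
    for (int i = 0; i < iterations; i++)
        x = x * (2.0 - b * x);
    return x;
}

/* Compute a / b as a * (1/b).  When the numerator is 1.0 the final
   multiplication is skipped and the reciprocal is returned directly.  */
static double approx_div(double a, double b, int iterations)
{
    double x = approx_recip(b, iterations);
    return (a == 1.0) ? x : a * x;
}

/* Relative error of an approximation against the exact value.  */
static double rel_error(double approx, double exact)
{
    return fabs(approx - exact) / fabs(exact);
}
```

With an 8-bit starting estimate the error goes roughly 2^-8, 2^-16, 2^-32, 2^-64
over successive steps, so in this sketch two steps are enough for SF and three
for DF, up to rounding error in the last few bits.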
Wilco