[AArch64] Emit square root using the Newton series

Fri Mar 4 00:22:00 GMT 2016

On 02/16/16 14:56, Evandro Menezes wrote:
> On 12/08/15 15:35, Evandro Menezes wrote:
>> Emit square root using the Newton series
>>
>>    2015-12-03  Evandro Menezes  <e.menezes@samsung.com>
>>
>>    gcc/
>>             * config/aarch64/aarch64-protos.h (aarch64_emit_swsqrt):
>>    Declare new
>>             function.
>>             * config/aarch64/aarch64-simd.md (sqrt<mode>2): New
>>    expansion and
>>             insn definitions.
>>             * config/aarch64/aarch64-tuning-flags.def
>>             (AARCH64_EXTRA_TUNE_FAST_SQRT): New tuning macro.
>>             * config/aarch64/aarch64.c (aarch64_emit_swsqrt): Define
>>    new function.
>>             * config/aarch64/aarch64.md (sqrt<mode>2): New expansion
>>    and insn
>>             definitions.
>>             * config/aarch64/aarch64.opt (mlow-precision-recip-sqrt):
>>    Expand option
>>             description.
>>             * doc/invoke.texi (mlow-precision-recip-sqrt): Likewise.
>>
>> This patch extends the patch that added support for implementing 
>> x^-1/2 using the Newton series by adding support for x^1/2 as well.
>>
>> Is it OK at this point of stage 3?
>>
>> Thank you,
>>
>
> James,
>
> As I was saying, this patch results in some validation errors in 
> CPU2000 benchmarks using DF.  Although proving the algorithm to be 
> pretty solid with a vast set of random values, I'm confused why some 
> benchmarks fail to validate with this implementation of the Newton 
> series for square root too, when they pass with the Newton series for 
> reciprocal square root.
>
> Since I had no problems with the same algorithm on x86-64, I wonder if 
> the initial estimate on AArch64, which offers just 8 bits, whereas 
> x86-64 offers 11 bits, has to do with it.  Then again, the algorithm 
> iterated 1 less time on x86-64 than on AArch64.
>
> Since it seems that the initial estimate is sufficient for CPU2000 to 
> validate when using SF, I'm leaning towards restricting the Newton 
> series for square root only for SF.
>
> Your thoughts on the matter are appreciated,

         Add choices for the reciprocal square root approximation

         Allow a target to prefer such operation depending on the FP
    precision.

         gcc/
             * config/aarch64/aarch64-protos.h
             (AARCH64_EXTRA_TUNE_APPROX_RSQRT): New macro.
             * config/aarch64/aarch64-tuning-flags.def
             (AARCH64_EXTRA_TUNE_APPROX_RSQRT_DF): New mask.
             (AARCH64_EXTRA_TUNE_APPROX_RSQRT_SF): Likewise.
             * config/aarch64/aarch64.c
             (use_rsqrt_p): New argument for the mode.
             (aarch64_builtin_reciprocal): Devise mode from builtin.
             (aarch64_optab_supported_p): New argument for the mode.

Feedback appreciated.

Thank you,

-- 
Evandro Menezes