[AArch64] Emit square root using the Newton series
Evandro Menezes
e.menezes@samsung.com
Fri Mar 4 00:22:00 GMT 2016
On 02/16/16 14:56, Evandro Menezes wrote:
> On 12/08/15 15:35, Evandro Menezes wrote:
>> Emit square root using the Newton series
>>
>> 2015-12-03 Evandro Menezes <e.menezes@samsung.com>
>>
>> gcc/
>> * config/aarch64/aarch64-protos.h (aarch64_emit_swsqrt): Declare new
>> function.
>> * config/aarch64/aarch64-simd.md (sqrt<mode>2): New expansion and
>> insn definitions.
>> * config/aarch64/aarch64-tuning-flags.def
>> (AARCH64_EXTRA_TUNE_FAST_SQRT): New tuning macro.
>> * config/aarch64/aarch64.c (aarch64_emit_swsqrt): Define new function.
>> * config/aarch64/aarch64.md (sqrt<mode>2): New expansion and insn
>> definitions.
>> * config/aarch64/aarch64.opt (mlow-precision-recip-sqrt): Expand option
>> description.
>> * doc/invoke.texi (mlow-precision-recip-sqrt): Likewise.
>>
>> This patch extends the patch that added support for implementing
>> x^(-1/2) using the Newton series by adding support for x^(1/2) as well.
>>
>> Is it OK at this point of stage 3?
>>
>> Thank you,
>>
>
> James,
>
> As I was saying, this patch results in some validation errors in
> CPU2000 benchmarks using DF. The algorithm proved to be pretty solid
> with a vast set of random values, so I'm confused why some benchmarks
> fail to validate with this implementation of the Newton series for
> square root when they pass with the Newton series for reciprocal
> square root.
>
> Since I had no problems with the same algorithm on x86-64, I wonder if
> the initial estimate has to do with it: AArch64 offers just 8 bits,
> whereas x86-64 offers 11 bits. Then again, the algorithm iterated one
> fewer time on x86-64 than on AArch64.
>
> Since it seems that the initial estimate is sufficient for CPU2000 to
> validate when using SF, I'm leaning towards restricting the Newton
> series for square root to SF only.
>
> Your thoughts on the matter are appreciated,
Add choices for the reciprocal square root approximation
Allow a target to prefer this approximation depending on the FP
precision.
gcc/
* config/aarch64/aarch64-protos.h
(AARCH64_EXTRA_TUNE_APPROX_RSQRT): New macro.
* config/aarch64/aarch64-tuning-flags.def
(AARCH64_EXTRA_TUNE_APPROX_RSQRT_DF): New mask.
(AARCH64_EXTRA_TUNE_APPROX_RSQRT_SF): Likewise.
* config/aarch64/aarch64.c
(use_rsqrt_p): New argument for the mode.
(aarch64_builtin_reciprocal): Devise mode from builtin.
(aarch64_optab_supported_p): New argument for the mode.
Feedback appreciated.
Thank you,
--
Evandro Menezes