[AArch64] Emit square root using the Newton series

Evandro Menezes e.menezes@samsung.com
Fri Feb 26 23:42:00 GMT 2016


On 02/26/16 08:59, James Greenhalgh wrote:
> On Mon, Feb 22, 2016 at 06:50:44PM -0600, Evandro Menezes wrote:
>> In preparation for the patch adding the Newton series also for
>> square root, I'd like to propose this patch changing the name of the
>> existing tuning flag for the reciprocal square root.
> This is fine, other names like sw_rsqrt, expand_rsqrt, nr_rsqrt would also
> be OK. Pick your favourite!
>
> One comment on the replacement invoke.texi text below, otherwise this is
> OK to apply now.
>
>> diff --git a/gcc/config/aarch64/aarch64.opt b/gcc/config/aarch64/aarch64.opt
>> index 5cbd4cd..155d2bd 100644
>> --- a/gcc/config/aarch64/aarch64.opt
>> +++ b/gcc/config/aarch64/aarch64.opt
>> @@ -151,5 +151,5 @@ PC relative literal loads.
>>   
>>   mlow-precision-recip-sqrt
>>   Common Var(flag_mrecip_low_precision_sqrt) Optimization
>> -When calculating a sqrt approximation, run fewer steps.
>> +Calculate the reciprocal square-root approximation in fewer steps.
>>   This reduces precision, but can result in faster computation.
>> diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
>> index 490df93..eeff24d 100644
>> --- a/gcc/doc/invoke.texi
>> +++ b/gcc/doc/invoke.texi
>> @@ -12879,12 +12879,10 @@ corresponding flag to the linker.
>>   @item -mno-low-precision-recip-sqrt
>>   @opindex -mlow-precision-recip-sqrt
>>   @opindex -mno-low-precision-recip-sqrt
>> -The square root estimate uses two steps instead of three for double-precision,
>> -and one step instead of two for single-precision.
>> -Thus reducing latency and precision.
>> -This is only relevant if @option{-ffast-math} activates
>> -reciprocal square root estimate instructions.
>> -Which in turn depends on the target processor.
>> +The reciprocal square root approximation uses one step less than otherwise,
>> +thus reducing latency and precision.
> When calculating the reciprocal square root approximation, use one less
> step than otherwise, thus reducing latency and precision.
>
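
For readers following the step counts discussed in the quoted invoke.texi text, here is a minimal sketch in C of Newton-Raphson refinement for the reciprocal square root. Each step refines the previous estimate, so dropping one step trades precision for latency. The function name rsqrt_newton and the bit-trick initial estimate are assumptions for illustration only; on AArch64 the initial value would come from a hardware estimate instruction, and this is not the sequence GCC emits.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: approximate 1/sqrt(a) for a > 0 using `steps`
   Newton-Raphson iterations.  Not GCC's implementation. */
static float rsqrt_newton(float a, int steps)
{
    /* Assumption: a well-known integer bit trick stands in for the
       hardware reciprocal square root estimate. */
    uint32_t bits;
    memcpy(&bits, &a, sizeof bits);      /* well-defined type punning */
    bits = 0x5f3759dfu - (bits >> 1);
    float y;
    memcpy(&y, &bits, sizeof y);

    /* Newton step for f(y) = 1/y^2 - a:  y' = y * (3 - a*y*y) / 2. */
    for (int i = 0; i < steps; i++)
        y = y * (1.5f - 0.5f * a * y * y);

    return y;
}
```

Comparing rsqrt_newton(4.0f, 1) with rsqrt_newton(4.0f, 2) shows the trade-off the option describes: the one-step result is further from 0.5 than the two-step result.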

Checked in as r233772.

Thank you,

-- 
Evandro Menezes


