[PATCH] [aarch64] Implemented reciprocal square root (rsqrt) estimation in -ffast-math

Sun Jun 28 15:13:00 GMT 2015

> On Jun 25, 2015, at 9:44 AM, Kumar, Venkataramanan <Venkataramanan.Kumar@amd.com> wrote:
> 
> I got around ~12% gain with -Ofast -mcpu=cortex-a57.

I get around 11-12% on ThunderX with the patch plus the reduced iteration counts (1 step for float, 2 for double), compared to without the patch.
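
For anyone following the step counts, here is a minimal portable C sketch of the estimate-plus-refinement scheme under discussion. It is not the code the patch emits. On AArch64 the estimate comes from FRSQRTE and each refinement from FRSQRTS; here the estimate is a stand-in (the well-known integer bit trick, which is coarser than the hardware estimate), and the names rsqrt_estimate, rsqrt_step and rsqrt_approx are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Stand-in for the hardware estimate.  ASSUMPTION: this uses the classic
   magic-constant bit trick for doubles, not the FRSQRTE table lookup, so
   its initial error (a few percent) is larger than the real instruction's.  */
static double
rsqrt_estimate (double x)
{
  uint64_t bits;
  double y;

  memcpy (&bits, &x, sizeof bits);
  bits = UINT64_C (0x5FE6EB50C7B537A9) - (bits >> 1);
  memcpy (&y, &bits, sizeof y);
  return y;
}

/* One Newton-Raphson step for 1/sqrt(x): y' = y * (3 - x*y*y) / 2.
   FRSQRTS computes the (3 - a*b) / 2 factor of this expression.  */
static double
rsqrt_step (double x, double y)
{
  return y * ((3.0 - x * y * y) * 0.5);
}

/* Hypothetical helper: refine the estimate with STEPS iterations.
   Only meaningful for positive, finite, normal X.  */
static double
rsqrt_approx (double x, int steps)
{
  assert (x > 0.0);

  double y = rsqrt_estimate (x);
  for (int i = 0; i < steps; i++)
    y = rsqrt_step (x, y);
  return y;
}
```

Each step roughly squares the relative error, so dropping one step trades precision for speed. Because the stand-in estimate is coarse, this sketch needs more steps than the hardware sequence would to reach the same accuracy.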

Thanks,
Andrew

> 
> Regards,
> Venkat.
> 
>> -----Original Message-----
>> From: gcc-patches-owner@gcc.gnu.org [mailto:gcc-patches-owner@gcc.gnu.org]
>> On Behalf Of Dr. Philipp Tomsich
>> Sent: Thursday, June 25, 2015 9:13 PM
>> To: Kumar, Venkataramanan
>> Cc: Benedikt Huber; pinskia@gmail.com; gcc-patches@gcc.gnu.org
>> Subject: Re: [PATCH] [aarch64] Implemented reciprocal square root (rsqrt)
>> estimation in -ffast-math
>> 
>> Kumar,
>> 
>> what is the relative gain that you see on Cortex-A57?
>> 
>> Thanks,
>> Philipp.
>> 
>>>> On 25 Jun 2015, at 17:35, Kumar, Venkataramanan
>>> <Venkataramanan.Kumar@amd.com> wrote:
>>> 
>>> Changing to "1 step for float" and "2 steps for double" gives better gains
>>> now for gromacs on cortex-a57.
>>> 
>>> Regards,
>>> Venkat.
>>>> -----Original Message-----
>>>> From: gcc-patches-owner@gcc.gnu.org [mailto:gcc-patches-owner@gcc.gnu.org]
>>>> On Behalf Of Benedikt Huber
>>>> Sent: Thursday, June 25, 2015 4:09 PM
>>>> To: pinskia@gmail.com
>>>> Cc: gcc-patches@gcc.gnu.org; philipp.tomsich@theobroma-systems.com
>>>> Subject: Re: [PATCH] [aarch64] Implemented reciprocal square root
>>>> (rsqrt) estimation in -ffast-math
>>>> 
>>>> Andrew,
>>>> 
>>>>> This is NOT a win on thunderX at least for single precision because
>>>>> you have to do the divide and sqrt in the same time as it takes 5
>>>>> multiplies (estimate and step are multiplies in the thunderX pipeline).
>>>>> Doubles is 10 multiplies which is just the same as what the patch does
>>>>> (but it is really slightly less than 10, I rounded up). So in the end
>>>>> this is NOT a win at all for thunderX unless we do one less step for
>>>>> both single and double.
>>>> 
>>>> Yes, the expected benefit from rsqrt estimation is implementation
>>>> specific. If one has a better initial rsqrte or an application that
>>>> can trade precision for execution time, we could offer a command line
>>>> option to do only 2 steps for double and 1 step for float; similar to
>>>> -mrecip-precision for PowerPC.
>>>> What are your thoughts on that?
>>>> 
>>>> Best regards,
>>>> Benedikt