[AArch64] Remove AARCH64_EXTRA_TUNE_RECIP_SQRT from Cortex-A57 tuning

Evandro Menezes e.menezes@samsung.com
Mon Feb 15 17:25:00 GMT 2016


On 02/15/16 04:50, James Greenhalgh wrote:
> On Mon, Feb 08, 2016 at 10:57:10AM +0000, James Greenhalgh wrote:
>> On Mon, Feb 01, 2016 at 02:00:01PM +0000, James Greenhalgh wrote:
>>> On Mon, Jan 25, 2016 at 11:20:46AM +0000, James Greenhalgh wrote:
>>>> On Mon, Jan 11, 2016 at 12:04:43PM +0000, James Greenhalgh wrote:
>>>>> Hi,
>>>>>
>>>>> I've seen a couple of large performance issues caused by expanding
>>>>> the high-precision reciprocal square root for Cortex-A57, so I'd like
>>>>> to turn it off by default.
>>>>>
>>>>> This is good for art (~2%) from Spec2000, bad (~3.5%) for fma3d from
>>>>> Spec2000, good (~5.5%) for gromcas from Spec2006, and very good (>10%) for
>>>>> some private microbenchmark kernels which stress the divide/sqrt/multiply
>>>>> units. It therefore seems to me to be the correct choice to make across
>>>>> a number of workloads.
>>>>>
>>>>> Bootstrapped and tested on aarch64-none-linux-gnu with no issues.
>>>>>
>>>>> OK?
>>>> *Ping*
>>> *pingx2*
>> *ping^3*
> *ping^4*
>
> Thanks,
> James
>
>>>>> ---
>>>>> 2015-12-11  James Greenhalgh  <james.greenhalgh@arm.com>
>>>>>
>>>>> 	* config/aarch64/aarch64.c (cortexa57_tunings): Remove
>>>>> 	AARCH64_EXTRA_TUNE_RECIP_SQRT.
>>>>>
>>>>> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
>>>>> index 1d5d898..999c9fc 100644
>>>>> --- a/gcc/config/aarch64/aarch64.c
>>>>> +++ b/gcc/config/aarch64/aarch64.c
>>>>> @@ -484,8 +484,7 @@ static const struct tune_params cortexa57_tunings =
>>>>>     0,	/* max_case_values.  */
>>>>>     0,	/* cache_line_size.  */
>>>>>     tune_params::AUTOPREFETCHER_WEAK,	/* autoprefetcher_model.  */
>>>>> -  (AARCH64_EXTRA_TUNE_RENAME_FMA_REGS
>>>>> -   | AARCH64_EXTRA_TUNE_RECIP_SQRT)	/* tune_flags.  */
>>>>> +  (AARCH64_EXTRA_TUNE_RENAME_FMA_REGS)	/* tune_flags.  */
>>>>>   };
>>>>>   
>>>>>   static const struct tune_params cortexa72_tunings =
>

James,

There seem to be SPEC CPU2000fp validation issues on A57 when this flag 
is present too.  Though I evaluated the algorithm with a huge random set 
of values, always delivering accuracy around 1ulp, which should be 
enough for CPU2000fp (wit x86-64), I expected the benchmarks to pass.

My suspicion is that the Newton series on AArch64 is probably good only 
for SP.  Then, DP might require an extra round, probably exacerbating 
the performance penalty.

I'd like to try to split this tuning option into one for SP and another 
for DP.  Thoughts?

Thank you,

-- 
Evandro Menezes



More information about the Gcc-patches mailing list