This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


RE: [Patch AArch64] Use software sqrt expansion always for -mlow-precision-recip-sqrt

From: "Kumar, Venkataramanan" <Venkataramanan dot Kumar at amd dot com>
To: James Greenhalgh <james dot greenhalgh at arm dot com>, "gcc-patches at gcc dot gnu dot org" <gcc-patches at gcc dot gnu dot org>
Cc: "nd at arm dot com" <nd at arm dot com>, "marcus dot shawcroft at arm dot com" <marcus dot shawcroft at arm dot com>, "richard dot earnshaw at arm dot com" <richard dot earnshaw at arm dot com>, "philipp dot tomsich at theobroma-systems dot com" <philipp dot tomsich at theobroma-systems dot com>, "pinskia at gmail dot com" <pinskia at gmail dot com>, "Kyrylo dot Tkachov at arm dot com" <Kyrylo dot Tkachov at arm dot com>, "e dot menezes at samsung dot com" <e dot menezes at samsung dot com>
Date: Tue, 12 Jan 2016 05:53:21 +0000
Subject: RE: [Patch AArch64] Use software sqrt expansion always for -mlow-precision-recip-sqrt
References: <1452513219-25168-1-git-send-email-james dot greenhalgh at arm dot com>

Hi James,

> -----Original Message-----
> From: James Greenhalgh [mailto:james.greenhalgh@arm.com]
> Sent: Monday, January 11, 2016 5:24 PM
> To: gcc-patches@gcc.gnu.org
> Cc: nd@arm.com; marcus.shawcroft@arm.com;
> richard.earnshaw@arm.com; Kumar, Venkataramanan;
> philipp.tomsich@theobroma-systems.com; pinskia@gmail.com;
> Kyrylo.Tkachov@arm.com; e.menezes@samsung.com
> Subject: [Patch AArch64] Use software sqrt expansion always for
> -mlow-precision-recip-sqrt
> 
> 
> Hi,
> 
> I'd like to switch the logic around in aarch64.c such that
> -mlow-precision-recip-sqrt causes us to always emit the low-precision
> software expansion for reciprocal square root. I have two reasons to do
> this; first is consistency across -mcpu targets, second is enabling more
> -mcpu targets to use the flag for peak tuning.
> 
> I don't much like that the precision we use for -mlow-precision-recip-sqrt
> differs between cores (and possibly compiler revisions). Yes, we're under
> -ffast-math but I take this flag to mean the user explicitly wants the
> low-precision expansion, and we should not diverge from that based on an
> internal decision as to what is optimal for performance in the
> high-precision case. I'd prefer to keep things as predictable as possible,
> and here that means always emitting the low-precision expansion when asked.
> 
> Judging by the comments in the thread proposing the reciprocal square root
> optimisation, this will benefit all cores currently supported by GCC.
> To be clear, we would still not expand in the high-precision case for any cores
> which do not explicitly ask for it. Currently that is Cortex-A57 and xgene,
> though I will be proposing a patch to remove Cortex-A57 from that list
> shortly.
> 
> Which gives my second motivation for this patch. -mlow-precision-recip-sqrt
> is intended as a tuning flag for situations where performance is more
> important than precision, but the current logic requires setting an internal
> flag which also changes the performance characteristics where high-precision
> is needed. This conflates two decisions the target might want to make, and
> reduces the applicability of an option targets might want to enable for
> performance. In particular, I'd still like to see -mlow-precision-recip-sqrt
> continue to emit the cheaper, low-precision sequence for floats under
> Cortex-A57.
> 
> Based on that reasoning, this patch makes the appropriate change to the
> logic. I've checked with the current -mcpu values to ensure that behaviour
> without -mlow-precision-recip-sqrt does not change, and that behaviour
> with -mlow-precision-recip-sqrt is to emit the low precision sequences.
> 
> I've also put this through bootstrap and test on aarch64-none-linux-gnu with
> no issues.
> 
> OK?
> 
> Thanks,
> James
> 

Yes, I like enabling this optimization for all CPU targets via -mlow-precision-recip-sqrt.

If my understanding is correct, for Cortex-A57 we now need only -mlow-precision-recip-sqrt to emit the software sqrt expansion?

In the code below:
---snip---
void
aarch64_emit_swrsqrt (rtx dst, rtx src)
{
............
............
  int iterations = double_mode ? 3 : 2;

  if (flag_mrecip_low_precision_sqrt)
    iterations--;
---snip---

With this change, in the Cortex-A57 case we will always do 2 and 1 steps for double and float respectively, and 3 and 2 will never be used.
Should we make 2 and 1 the default? Or does any target still need to use 3 and 2?
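
To make the step counts concrete, here is a minimal sketch in plain C of what each extra Newton-Raphson step buys. This is not the GCC implementation: rsqrt_step, rsqrt_refine and rsqrt_iterations are hypothetical names used only for illustration, and the starting estimate is passed in by the caller, standing in for whatever a hardware estimate instruction would provide. Only the iteration-count choice mirrors the snippet quoted above.

```c
/* One Newton-Raphson step towards 1/sqrt(x):
   y' = y * (3 - x * y * y) / 2.  */
double
rsqrt_step (double x, double y)
{
  return y * (3.0 - x * y * y) * 0.5;
}

/* Hypothetical helper: refine a caller-supplied estimate of 1/sqrt(x)
   by running the given number of Newton-Raphson steps.  */
double
rsqrt_refine (double x, double estimate, int steps)
{
  double y = estimate;

  for (int i = 0; i < steps; i++)
    y = rsqrt_step (x, y);

  return y;
}

/* Same step-count choice as in the quoted snippet: 3 steps for double,
   2 for float, and one step fewer in each case when the low-precision
   flag is set.  The parameter names are assumptions.  */
int
rsqrt_iterations (int double_mode, int low_precision)
{
  int iterations = double_mode ? 3 : 2;

  if (low_precision)
    iterations--;

  return iterations;
}
```

Each step roughly squares the relative error (times 1.5). For example, with x = 3.0 and a starting estimate of 0.57421875 (relative error about 5e-3), one step leaves an error of about 4e-5, two steps about 3e-9, and three steps reach double rounding precision. So dropping from 2 steps to 1 for float, or from 3 to 2 for double, gives up roughly half of the correct bits in exchange for one fewer multiply chain.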

P.S. I remember that reducing the iterations benefited gromacs but caused some VE in other FP benchmarks.

Regards,
Venkat.



> ---
> 2015-12-10  James Greenhalgh  <james.greenhalgh@arm.com>
> 
> 	* config/aarch64/aarch64.c (use_rsqrt_p): Always use software
> 	reciprocal sqrt for -mlow-precision-recip-sqrt.

Follow-Ups:
- Re: [Patch AArch64] Use software sqrt expansion always for -mlow-precision-recip-sqrt
  - From: James Greenhalgh

References:
- [Patch AArch64] Use software sqrt expansion always for -mlow-precision-recip-sqrt
  - From: James Greenhalgh
