This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
RE: [Patch AArch64] Use software sqrt expansion always for -mlow-precision-recip-sqrt
- From: "Kumar, Venkataramanan" <Venkataramanan dot Kumar at amd dot com>
- To: James Greenhalgh <james dot greenhalgh at arm dot com>, "gcc-patches at gcc dot gnu dot org" <gcc-patches at gcc dot gnu dot org>
- Cc: "nd at arm dot com" <nd at arm dot com>, "marcus dot shawcroft at arm dot com" <marcus dot shawcroft at arm dot com>, "richard dot earnshaw at arm dot com" <richard dot earnshaw at arm dot com>, "philipp dot tomsich at theobroma-systems dot com" <philipp dot tomsich at theobroma-systems dot com>, "pinskia at gmail dot com" <pinskia at gmail dot com>, "Kyrylo dot Tkachov at arm dot com" <Kyrylo dot Tkachov at arm dot com>, "e dot menezes at samsung dot com" <e dot menezes at samsung dot com>
- Date: Tue, 12 Jan 2016 05:53:21 +0000
- Subject: RE: [Patch AArch64] Use software sqrt expansion always for -mlow-precision-recip-sqrt
- References: <1452513219-25168-1-git-send-email-james dot greenhalgh at arm dot com>
Hi James,
> -----Original Message-----
> From: James Greenhalgh [mailto:james.greenhalgh@arm.com]
> Sent: Monday, January 11, 2016 5:24 PM
> To: gcc-patches@gcc.gnu.org
> Cc: nd@arm.com; marcus.shawcroft@arm.com;
> richard.earnshaw@arm.com; Kumar, Venkataramanan;
> philipp.tomsich@theobroma-systems.com; pinskia@gmail.com;
> Kyrylo.Tkachov@arm.com; e.menezes@samsung.com
> Subject: [Patch AArch64] Use software sqrt expansion always for
> -mlow-precision-recip-sqrt
>
>
> Hi,
>
> I'd like to switch the logic around in aarch64.c such that
> -mlow-precision-recip-sqrt causes us to always emit the low-precision
> software expansion for reciprocal square root. I have two reasons to do
> this; first is consistency across -mcpu targets, second is enabling more
> -mcpu targets to use the flag for peak tuning.
>
> I don't much like that the precision we use for -mlow-precision-recip-sqrt
> differs between cores (and possibly compiler revisions). Yes, we're under
> -ffast-math but I take this flag to mean the user explicitly wants the
> low-precision expansion, and we should not diverge from that based on an
> internal decision as to what is optimal for performance in the high-precision
> case. I'd prefer to keep things as predictable as possible, and here that
> means always emitting the low-precision expansion when asked.
>
> Judging by the comments in the thread proposing the reciprocal square root
> optimisation, this will benefit all cores currently supported by GCC.
> To be clear, we would still not expand in the high-precision case for any cores
> which do not explicitly ask for it. Currently that is Cortex-A57 and xgene,
> though I will be proposing a patch to remove Cortex-A57 from that list
> shortly.
>
> Which gives my second motivation for this patch. -mlow-precision-recip-sqrt
> is intended as a tuning flag for situations where performance is more
> important than precision, but the current logic requires setting an internal
> flag which also changes the performance characteristics where high-precision
> is needed. This conflates two decisions the target might want to make, and
> reduces the applicability of an option targets might want to enable for
> performance. In particular, I'd still like to see -mlow-precision-recip-sqrt
> continue to emit the cheaper, low-precision sequence for floats under
> Cortex-A57.
>
> Based on that reasoning, this patch makes the appropriate change to the
> logic. I've checked with the current -mcpu values to ensure that behaviour
> without -mlow-precision-recip-sqrt does not change, and that behaviour
> with -mlow-precision-recip-sqrt is to emit the low precision sequences.
>
> I've also put this through bootstrap and test on aarch64-none-linux-gnu with
> no issues.
>
> OK?
>
> Thanks,
> James
>
Yes, I like enabling this optimization for all CPU targets via -mlow-precision-recip-sqrt.
If my understanding is correct, for cortex-a57 we now need to use only -mlow-precision-recip-sqrt to emit the software sqrt expansion?
In the code below:
---snip---
void
aarch64_emit_swrsqrt (rtx dst, rtx src)
{
  ............
  ............
  int iterations = double_mode ? 3 : 2;
  if (flag_mrecip_low_precision_sqrt)
    iterations--;
---snip---
In the cortex-a57 case we will now always do 2 and 1 steps for double and float respectively; 3 and 2 will never be used.
Should we make 2 and 1 the default? Or does any target still need to use 3 and 2?
PS: I remember that reducing the iterations benefited gromacs but caused some VE in other FP benchmarks.
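
To illustrate why the step count matters, here is a minimal standalone sketch in plain C. It is not the code in aarch64.c: the function names are hypothetical, and the starting estimate is passed in by the caller as a stand-in for whatever the hardware estimate instruction would produce. It only shows the iteration-count selection from the snippet above and the textbook Newton-Raphson refinement for 1/sqrt(d), where each step roughly squares the relative error of the estimate.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper mirroring the quoted snippet: 3 steps for double,
   2 for float, minus one when the low-precision flag is set.  */
int
rsqrt_iterations (bool double_mode, bool low_precision)
{
  int iterations = double_mode ? 3 : 2;
  if (low_precision)
    iterations--;
  return iterations;
}

/* Refine ESTIMATE, an approximation of 1/sqrt(D), with the given number
   of Newton-Raphson steps: x' = x * (3 - d * x * x) / 2.
   ESTIMATE is an assumption of this sketch; it stands in for a
   hardware-provided initial approximation.  */
double
rsqrt_refine (double d, double estimate, int iterations)
{
  assert (iterations >= 0);
  double x = estimate;
  for (int i = 0; i < iterations; i++)
    x = x * (3.0 - d * x * x) / 2.0;
  return x;
}

/* Absolute relative error of APPROX against EXACT.  */
double
rel_error (double approx, double exact)
{
  double e = (approx - exact) / exact;
  return e < 0 ? -e : e;
}
```

For example, with d = 4.0 (exact result 0.5) and a starting estimate that is off by one part in 256, comparing rel_error after 1, 2 and 3 steps shows how much accuracy each dropped step gives up.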
Regards,
Venkat.
> ---
> 2015-12-10 James Greenhalgh <james.greenhalgh@arm.com>
>
> * config/aarch64/aarch64.c (use_rsqrt_p): Always use software
> reciprocal sqrt for -mlow-precision-recip-sqrt.