This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.
Re: [Patch AArch64] Use software sqrt expansion always for -mlow-precision-recip-sqrt
- From: James Greenhalgh <james dot greenhalgh at arm dot com>
- To: "Kumar, Venkataramanan" <Venkataramanan dot Kumar at amd dot com>
- Cc: "gcc-patches at gcc dot gnu dot org" <gcc-patches at gcc dot gnu dot org>, "nd at arm dot com" <nd at arm dot com>, "marcus dot shawcroft at arm dot com" <marcus dot shawcroft at arm dot com>, "richard dot earnshaw at arm dot com" <richard dot earnshaw at arm dot com>, "philipp dot tomsich at theobroma-systems dot com" <philipp dot tomsich at theobroma-systems dot com>, "pinskia at gmail dot com" <pinskia at gmail dot com>, "Kyrylo dot Tkachov at arm dot com" <Kyrylo dot Tkachov at arm dot com>, "e dot menezes at samsung dot com" <e dot menezes at samsung dot com>
- Date: Tue, 12 Jan 2016 11:48:31 +0000
- Subject: Re: [Patch AArch64] Use software sqrt expansion always for -mlow-precision-recip-sqrt
- Authentication-results: sourceware.org; auth=none
- References: <1452513219-25168-1-git-send-email-james dot greenhalgh at arm dot com> <CY1PR1201MB10985A9CD95E8D6BC9C74E408FCA0 at CY1PR1201MB1098 dot namprd12 dot prod dot outlook dot com>
On Tue, Jan 12, 2016 at 05:53:21AM +0000, Kumar, Venkataramanan wrote:
> Hi James,
>
> > -----Original Message-----
> > From: James Greenhalgh [mailto:james.greenhalgh@arm.com]
> > Sent: Monday, January 11, 2016 5:24 PM
> > To: gcc-patches@gcc.gnu.org
> > Cc: nd@arm.com; marcus.shawcroft@arm.com;
> > richard.earnshaw@arm.com; Kumar, Venkataramanan;
> > philipp.tomsich@theobroma-systems.com; pinskia@gmail.com;
> > Kyrylo.Tkachov@arm.com; e.menezes@samsung.com
> > Subject: [Patch AArch64] Use software sqrt expansion always for
> > -mlow-precision-recip-sqrt
> >
> >
> > Hi,
> >
> > I'd like to switch the logic around in aarch64.c such that
> > -mlow-precision-recip-sqrt causes us to always emit the low-precision
> > software expansion for reciprocal square root. I have two reasons to do
> > this; first is consistency across -mcpu targets, second is enabling more
> > -mcpu targets to use the flag for peak tuning.
> >
> > I don't much like that the precision we use for -mlow-precision-recip-sqrt
> > differs between cores (and possibly compiler revisions). Yes, we're under
> > -ffast-math but I take this flag to mean the user explicitly wants the
> > low-precision expansion, and we should not diverge from that based on an
> > internal decision as to what is optimal for performance in the
> > high-precision case. I'd prefer to keep things as predictable as possible,
> > and here that means always emitting the low-precision expansion when asked.
> >
> > Judging by the comments in the thread proposing the reciprocal square root
> > optimisation, this will benefit all cores currently supported by GCC.
> > To be clear, we would still not expand in the high-precision case for any cores
> > which do not explicitly ask for it. Currently that is Cortex-A57 and xgene,
> > though I will be proposing a patch to remove Cortex-A57 from that list
> > shortly.
> >
> > Which gives my second motivation for this patch. -mlow-precision-recip-sqrt
> > is intended as a tuning flag for situations where performance is more
> > important than precision, but the current logic requires setting an internal
> > flag which also changes the performance characteristics where high-precision
> > is needed. This conflates two decisions the target might want to make, and
> > reduces the applicability of an option targets might want to enable for
> > performance. In particular, I'd still like to see -mlow-precision-recip-sqrt
> > continue to emit the cheaper, low-precision sequence for floats under
> > Cortex-A57.
> >
> > Based on that reasoning, this patch makes the appropriate change to the
> > logic. I've checked with the current -mcpu values to ensure that behaviour
> > without -mlow-precision-recip-sqrt does not change, and that behaviour
> > with -mlow-precision-recip-sqrt is to emit the low precision sequences.
> >
> > I've also put this through bootstrap and test on aarch64-none-linux-gnu with
> > no issues.
> >
> > OK?
> >
> > Thanks,
> > James
> >
>
> Yes, I like enabling this optimization for all CPU targets via
> -mlow-precision-recip-sqrt.
>
> If my understanding is correct, for cortex-a57 we now need to use only
> -mlow-precision-recip-sqrt to emit the software sqrt expansion?
>
> In the below code
> ---snip---
> void
> aarch64_emit_swrsqrt (rtx dst, rtx src)
> {
>   ............
>   ............
>   int iterations = double_mode ? 3 : 2;
>
>   if (flag_mrecip_low_precision_sqrt)
>     iterations--;
> ---snip---
>
> Now, in the cortex-a57 case, we will always do 2 and 1 steps for double and
> float, and 3 and 2 will never be used. Should we make 2 and 1 the default?
> Or does any target still need to use 3 and 2?

The code here should handle two cases:

  1) Normal -Ofast case -> Some targets use the estimate expansion with
     3 iterations for double, 2 for float. Other targets use the hardware
     fsqrt/fdiv instructions.

  2) -mlow-precision-recip-sqrt -> All targets use the estimate expansion
     with 2 iterations for double, 1 for float.

-mlow-precision-recip-sqrt is a specialisation to be used only when the
programmer knows the lower precision is acceptable. It should not be on
by default...

> PS: I remember that reducing iterations benefited gromacs but caused some VE
> in other FP benchmarks.

... For exactly this reason :-)

Thanks,
James
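
The two cases described in the reply above can be sketched in standalone C.
This is an illustration only, not the GCC implementation: the helper names
(rsqrt_iterations, rsqrt_step, rsqrt_refine, rsqrt_residual) are hypothetical,
the starting estimate is supplied by the caller as a stand-in for a hardware
estimate, and all arithmetic is done in double regardless of mode. It also
assumes that each iteration is a Newton-Raphson step for 1/sqrt(x).

```c
#include <stdbool.h>

/* Hypothetical helper mirroring the quoted logic: 3 steps for double,
   2 for float, minus one when the low-precision flag is set.  */
static int
rsqrt_iterations (bool double_mode, bool low_precision)
{
  int iterations = double_mode ? 3 : 2;

  if (low_precision)
    iterations--;

  return iterations;
}

/* One Newton-Raphson step for y ~ 1/sqrt(x):  y' = y * (3 - x*y*y) / 2.
   Assumed form of the refinement; not taken from the patch.  */
static double
rsqrt_step (double x, double y)
{
  return y * (3.0 - x * y * y) / 2.0;
}

/* Refine a caller-supplied rough estimate of 1/sqrt(x) for the given
   number of steps.  */
static double
rsqrt_refine (double x, double estimate, int iterations)
{
  double y = estimate;

  for (int i = 0; i < iterations; i++)
    y = rsqrt_step (x, y);

  return y;
}

/* Residual |x*y*y - 1|, which is zero when y is exactly 1/sqrt(x).  */
static double
rsqrt_residual (double x, double y)
{
  double r = x * y * y - 1.0;

  return r < 0.0 ? -r : r;
}
```

For example, calling rsqrt_refine (3.0, 0.57421875, rsqrt_iterations (true, true))
runs two steps from a rough starting estimate, while passing false for the
second argument runs three. Because the error shrinks roughly quadratically per
step, dropping one iteration costs a noticeable amount of precision, which is
why the flag is opt-in.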