This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.
Re: [Patch AArch64] Use software sqrt expansion always for -mlow-precision-recip-sqrt
- From: James Greenhalgh <james dot greenhalgh at arm dot com>
- To: gcc-patches at gcc dot gnu dot org
- Cc: nd at arm dot com, marcus dot shawcroft at arm dot com, richard dot earnshaw at arm dot com
- Date: Mon, 1 Feb 2016 13:59:34 +0000
- References: <1452513219-25168-1-git-send-email-james dot greenhalgh at arm dot com> <20160125112124 dot GB8599 at arm dot com>
On Mon, Jan 25, 2016 at 11:21:25AM +0000, James Greenhalgh wrote:
> On Mon, Jan 11, 2016 at 11:53:39AM +0000, James Greenhalgh wrote:
> >
> > Hi,
> >
> > I'd like to switch the logic around in aarch64.c such that
> > -mlow-precision-recip-sqrt causes us to always emit the low-precision
> > software expansion for reciprocal square root. I have two reasons for
> > doing this: the first is consistency across -mcpu targets, the second is
> > enabling more -mcpu targets to use the flag for peak tuning.
> >
> > I don't much like that the precision we use for -mlow-precision-recip-sqrt
> > differs between cores (and possibly compiler revisions). Yes, we're
> > under -ffast-math but I take this flag to mean the user explicitly wants the
> > low-precision expansion, and we should not diverge from that based on an
> > internal decision as to what is optimal for performance in the
> > high-precision case. I'd prefer to keep things as predictable as possible,
> > and here that means always emitting the low-precision expansion when asked.
> >
> > Judging by the comments in the thread proposing the reciprocal square
> > root optimisation, this will benefit all cores currently supported by GCC.
> > To be clear, we would still not expand in the high-precision case for any
> > cores which do not explicitly ask for it. Currently the cores which do
> > ask for it are Cortex-A57 and xgene, though I will be proposing a patch
> > to remove Cortex-A57 from that list shortly.
> >
> > Which gives my second motivation for this patch. -mlow-precision-recip-sqrt
> > is intended as a tuning flag for situations where performance is more
> > important than precision, but the current logic requires setting an
> > internal flag which also changes the performance characteristics where
> > high precision is needed. This conflates two decisions the target might
> > want to make, and reduces the applicability of an option targets might
> > want to enable for performance. In particular, I'd still like to see
> > -mlow-precision-recip-sqrt continue to emit the cheaper, low-precision
> > sequence for floats under Cortex-A57.
> >
> > Based on that reasoning, this patch makes the appropriate change to the
> > logic. I've checked with the current -mcpu values to ensure that behaviour
> > without -mlow-precision-recip-sqrt does not change, and that behaviour
> > with -mlow-precision-recip-sqrt is to emit the low-precision sequences.
> >
> > I've also put this through bootstrap and test on aarch64-none-linux-gnu
> > with no issues.
> >
> > OK?
>
> *Ping*
*Pingx2*
Thanks,
James
>
> Thanks,
> James
>
> > 2015-12-10 James Greenhalgh <james.greenhalgh@arm.com>
> >
> > * config/aarch64/aarch64.c (use_rsqrt_p): Always use software
> > reciprocal sqrt for -mlow-precision-recip-sqrt.
> >
>
> > diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
> > index 9142ac0..1d5d898 100644
> > --- a/gcc/config/aarch64/aarch64.c
> > +++ b/gcc/config/aarch64/aarch64.c
> > @@ -7485,8 +7485,9 @@ use_rsqrt_p (void)
> >  {
> >    return (!flag_trapping_math
> >            && flag_unsafe_math_optimizations
> > -          && (aarch64_tune_params.extra_tuning_flags
> > -              & AARCH64_EXTRA_TUNE_RECIP_SQRT));
> > +          && ((aarch64_tune_params.extra_tuning_flags
> > +               & AARCH64_EXTRA_TUNE_RECIP_SQRT)
> > +              || flag_mrecip_low_precision_sqrt));
> >  }
> >
> >  /* Function to decide when to use
>
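For anyone wanting to sanity-check the logic change in the hunk above, here is a minimal standalone sketch in C. It is not GCC code: the struct `rsqrt_opts` and the helpers `use_rsqrt_before`, `use_rsqrt_after` and `count_differences` are hypothetical names introduced for illustration, and each boolean field only models the expression named in its comment. The sketch assumes the predicate depends on nothing beyond the four inputs visible in the diff.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the compiler state the predicate reads.
   This struct is not part of GCC; each field models the expression
   named in its comment.  */
struct rsqrt_opts
{
  bool trapping_math;      /* flag_trapping_math */
  bool unsafe_math;        /* flag_unsafe_math_optimizations */
  bool tune_recip_sqrt;    /* extra_tuning_flags & AARCH64_EXTRA_TUNE_RECIP_SQRT */
  bool low_precision_sqrt; /* flag_mrecip_low_precision_sqrt */
};

/* Model of use_rsqrt_p before the patch.  */
static bool
use_rsqrt_before (const struct rsqrt_opts *o)
{
  return !o->trapping_math && o->unsafe_math && o->tune_recip_sqrt;
}

/* Model of use_rsqrt_p after the patch.  */
static bool
use_rsqrt_after (const struct rsqrt_opts *o)
{
  return (!o->trapping_math
          && o->unsafe_math
          && (o->tune_recip_sqrt || o->low_precision_sqrt));
}

/* Count the combinations of the other three inputs for which the two
   models disagree, with the low-precision flag held at LOW_PRECISION.  */
static int
count_differences (bool low_precision)
{
  int differences = 0;
  for (int bits = 0; bits < 8; bits++)
    {
      struct rsqrt_opts o = {
        .trapping_math = (bits & 1) != 0,
        .unsafe_math = (bits & 2) != 0,
        .tune_recip_sqrt = (bits & 4) != 0,
        .low_precision_sqrt = low_precision,
      };
      if (use_rsqrt_before (&o) != use_rsqrt_after (&o))
        differences++;
    }
  return differences;
}
```

In this model, `count_differences (false)` finds no disagreement, which is consistent with the claim that behaviour without -mlow-precision-recip-sqrt does not change. With the flag set, the two models differ in exactly one combination: non-trapping, unsafe math enabled, and a core whose tuning does not request the expansion.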