This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
Re: [AArch64] Emit square root using the Newton series
- From: James Greenhalgh <james dot greenhalgh at arm dot com>
- To: Evandro Menezes <e dot menezes at samsung dot com>
- Cc: GCC Patches <gcc-patches at gcc dot gnu dot org>, Marcus Shawcroft <Marcus dot Shawcroft at arm dot com>, Andrew Pinski <pinskia at gmail dot com>, Benedikt Huber <benedikt dot huber at theobroma-systems dot com>, philipp dot tomsich at theobroma-systems dot com, Kyrill Tkachov <kyrylo dot tkachov at arm dot com>
- Date: Thu, 17 Mar 2016 14:55:08 +0000
- Subject: Re: [AArch64] Emit square root using the Newton series
- Authentication-results: sourceware.org; auth=none
- References: <56674D34 dot 80806 at samsung dot com> <56C38D00 dot 9000403 at samsung dot com> <56D8D553 dot 6060902 at samsung dot com> <56DF4D50 dot 4060804 at samsung dot com> <56E9B7E1 dot 4010601 at samsung dot com>
On Wed, Mar 16, 2016 at 02:45:37PM -0500, Evandro Menezes wrote:
> On 03/08/16 16:08, Evandro Menezes wrote:
> >On 02/16/16 14:56, Evandro Menezes wrote:
> >>On 12/08/15 15:35, Evandro Menezes wrote:
> >>>Emit square root using the Newton series
> >>>
> >>> 2015-12-03 Evandro Menezes <e.menezes@samsung.com>
> >>>
> >>> gcc/
> >>> * config/aarch64/aarch64-protos.h (aarch64_emit_swsqrt):
> >>> Declare new
> >>> function.
> >>> * config/aarch64/aarch64-simd.md (sqrt<mode>2): New
> >>> expansion and
> >>> insn definitions.
> >>> * config/aarch64/aarch64-tuning-flags.def
> >>> (AARCH64_EXTRA_TUNE_FAST_SQRT): New tuning macro.
> >>> * config/aarch64/aarch64.c (aarch64_emit_swsqrt): Define
> >>> new function.
> >>> * config/aarch64/aarch64.md (sqrt<mode>2): New expansion
> >>> and insn
> >>> definitions.
> >>> * config/aarch64/aarch64.opt (mlow-precision-recip-sqrt):
> >>> Expand option
> >>> description.
> >>> * doc/invoke.texi (mlow-precision-recip-sqrt): Likewise.
> >>>
> >>>This patch extends the patch that added support for
> >>>implementing x^-1/2 using the Newton series by adding support
> >>>for x^1/2 as well.
> >>>
> >>>Is it OK at this point of stage 3?
> >>>
> >>>Thank you,
> >>>
> >>
> >>James,
> >>
> >>As I was saying, this patch results in some validation errors in
> >>CPU2000 benchmarks using DF. Although proving the algorithm to
> >>be pretty solid with a vast set of random values, I'm confused
> >>why some benchmarks fail to validate with this implementation of
> >>the Newton series for square root too, when they pass with the
> >>Newton series for reciprocal square root.
> >>
> >>Since I had no problems with the same algorithm on x86-64, I
> >>wonder if the initial estimate on AArch64, which offers just 8
> >>bits, whereas x86-64 offers 11 bits, has to do with it. Then
> >>again, the algorithm iterated 1 less time on x86-64 than on
> >>AArch64.
> >>
> >>Since it seems that the initial estimate is sufficient for
> >>CPU2000 to validate when using SF, I'm leaning towards
> >>restricting the Newton series for square root only for SF.
> >>
> >>Your thoughts on the matter are appreciated,
> >
> > Add choices for the reciprocal square root approximation
> >
> > Allow a target to prefer such operation depending on the FP
> > precision.
> >
> > gcc/
> > * config/aarch64/aarch64-protos.h
> > (AARCH64_EXTRA_TUNE_APPROX_RSQRT): New macro.
> > * config/aarch64/aarch64-tuning-flags.def
> > (AARCH64_EXTRA_TUNE_APPROX_RSQRT_DF): New mask.
> > (AARCH64_EXTRA_TUNE_APPROX_RSQRT_SF): Likewise.
> > * config/aarch64/aarch64.c
> > (use_rsqrt_p): New argument for the mode.
> > (aarch64_builtin_reciprocal): Devise mode from builtin.
> > (aarch64_optab_supported_p): New argument for the mode.
> >
> >
> >Now that the patch is attached, feedback is appreciated.
>
> Ping.
Hi Evandro,
I thought this was on hold while you looked in to the underlying issue for
the failures in the other thread? With that said, I'm struggling to keep
up with where we are on this, so maybe it is time for a clean break - a new
thread for patch set v2, proposed as an explicit patch series (just to keep
the dependencies clear to me).
I'm not convinced of the value of this split, nor why we would stop here
if it was useful (vector modes vs. scalar modes would also seem an
important distinction).
If you no longer need the workaround this enables then I'm not sure I see a
good reason for it to go in, maybe I'm missing a target for which this
would be important?
Thanks,
James