This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
RE: [PATCH] [aarch64] Implemented reciprocal square root (rsqrt) estimation in -ffast-math
- From: "Kumar, Venkataramanan" <Venkataramanan dot Kumar at amd dot com>
- To: "Dr. Philipp Tomsich" <philipp dot tomsich at theobroma-systems dot com>, "Evandro Menezes" <e dot menezes at samsung dot com>
- Cc: Benedikt Huber <benedikt dot huber at theobroma-systems dot com>, "gcc-patches at gcc dot gnu dot org" <gcc-patches at gcc dot gnu dot org>
- Date: Thu, 25 Jun 2015 05:14:24 +0000
- Subject: RE: [PATCH] [aarch64] Implemented reciprocal square root (rsqrt) estimation in -ffast-math
- References: <1434629045-24650-1-git-send-email-benedikt dot huber at theobroma-systems dot com> <027701d0ae9c$b8f3eff0$2adbcfd0$ at samsung dot com> <56A9A836-05BF-409C-A8D4-91B7ABEC5EE9 at theobroma-systems dot com>
Hi,
If I understand correctly, the current implementation replaces
  fdiv
  fsqrt
by
  frsqrte
  for i=0 to 3
    fmul
    frsqrts
    fmul
So I think the gains depend on the latency of the frsqrts insn.
I see the patch has patterns for the vector versions of frsqrts, but does not enable them?
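
To make the sequence above concrete, here is a minimal C sketch of the refinement loop. It is an illustration under assumptions, not the code from the patch: `frsqrts_model` models the step instruction as (3 - a*b) / 2, `rsqrt_newton` is a hypothetical helper name, and the initial estimate `x0` (which frsqrte would supply in hardware) is passed in by the caller.

```c
/* Models the AArch64 FRSQRTS step instruction: (3 - a*b) / 2. */
static double frsqrts_model(double a, double b)
{
    return (3.0 - a * b) / 2.0;
}

/*
 * Refine an initial estimate x0 of 1/sqrt(d) with `steps` Newton-Raphson
 * iterations. x0 stands in for the FRSQRTE result; supplying it from the
 * caller is an assumption made for this sketch.
 */
static double rsqrt_newton(double d, double x0, int steps)
{
    double x = x0;
    for (int i = 0; i < steps; i++) {
        double x2 = x * x;               /* fmul    */
        double s = frsqrts_model(d, x2); /* frsqrts */
        x = x * s;                       /* fmul    */
    }
    return x;
}
```

Each step roughly squares the relative error (the new error is about 1.5 times the old error squared). For example, calling `rsqrt_newton(4.0, 0.5 * (1.0 + 1.0 / 256.0), n)` starts about 0.4% away from 0.5; with n = 2 the relative error is on the order of 1e-9, and with n = 3 it is below double rounding error.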
Regards,
Venkat.
> -----Original Message-----
> From: gcc-patches-owner@gcc.gnu.org [mailto:gcc-patches-owner@gcc.gnu.org]
> On Behalf Of Dr. Philipp Tomsich
> Sent: Wednesday, June 24, 2015 10:22 PM
> To: Evandro Menezes
> Cc: Benedikt Huber; gcc-patches@gcc.gnu.org
> Subject: Re: [PATCH] [aarch64] Implemented reciprocal square root (rsqrt)
> estimation in -ffast-math
>
> Evandro,
>
> We've seen a 28% speed-up on gromacs in SPECfp for the (scalar) reciprocal
> sqrt.
>
> Also, the "reciprocal divide" patches are floating around in various of our
> git trees, but aren't ready for public consumption yet... I'll leave Benedikt
> to comment on potential timelines for getting that pushed out.
>
> Best,
> Philipp.
>
> > On 24 Jun 2015, at 18:42, Evandro Menezes <e.menezes@samsung.com> wrote:
> >
> > Benedikt,
> >
> > You beat me to it! :-) Do you have the implementation for dividing
> > using the Newton series as well?
> >
> > I'm not sure that the series is always faster for all data types and on
> > all processors. It would be useful to allow each AArch64 processor to
> > enable this or not depending on the data type. BTW, do you have some
> > tests showing the speed up?
> >
> > Thank you,
> >
> > --
> > Evandro Menezes Austin, TX
> >
> >> -----Original Message-----
> >> From: gcc-patches-owner@gcc.gnu.org
> >> [mailto:gcc-patches-owner@gcc.gnu.org] On Behalf Of Benedikt Huber
> >> Sent: Thursday, June 18, 2015 7:04
> >> To: gcc-patches@gcc.gnu.org
> >> Cc: benedikt.huber@theobroma-systems.com;
> >> philipp.tomsich@theobroma-systems.com
> >> Subject: [PATCH] [aarch64] Implemented reciprocal square root (rsqrt)
> >> estimation in -ffast-math
> >>
> >> AArch64 offers the instructions frsqrte and frsqrts, for rsqrt
> >> estimation and a Newton-Raphson step, respectively.
> >> There are ARMv8 implementations where this is faster than using fdiv
> >> and fsqrt.
> >> It runs three steps for double and two steps for float to achieve the
> >> needed precision.
> >>
> >> There is one caveat and open question.
> >> Since -ffast-math enables flush to zero, intermediate values between
> >> approximation steps will be flushed to zero if they are denormal.
> >> E.g., this happens in the case of rsqrt (DBL_MAX) and rsqrtf (FLT_MAX).
> >> The test cases pass, but it is unclear to me whether this is expected
> >> behavior with -ffast-math.
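
The denormal case quoted above can be checked with a small C sketch. Everything here is an assumption for illustration: flush-to-zero is emulated in software by the hypothetical helper `flush_subnormal`, `nr_step` is a hypothetical name for one step that squares the estimate first, and 0x1p-512 is used as a stand-in estimate of 1/sqrt(DBL_MAX). Whether the actual patch is affected depends on its operand ordering and on how the hardware is configured, which this sketch does not model.

```c
#include <float.h>

/* Emulates flush-to-zero in software: subnormal values become 0. */
static double flush_subnormal(double v)
{
    if (v != 0.0 && v > -DBL_MIN && v < DBL_MIN)
        return 0.0;
    return v;
}

/*
 * One Newton-Raphson step for 1/sqrt(d), squaring the estimate first.
 * If ftz is nonzero, the intermediate x*x is flushed when subnormal.
 */
static double nr_step(double d, double x, int ftz)
{
    double x2 = x * x;
    if (ftz)
        x2 = flush_subnormal(x2);
    return x * ((3.0 - d * x2) / 2.0);
}
```

With d = DBL_MAX and x = 0x1p-512, the square x*x is 2^-1024, which is below DBL_MIN (2^-1022) and therefore subnormal. Without flushing, the step leaves x essentially unchanged, since it is already a good estimate; with flushing, the correction factor becomes 1.5 and the result is 1.5 * x.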
> >>
> >> The patch applies to commit:
> >> svn+ssh://gcc.gnu.org/svn/gcc/trunk@224470
> >>
> >> Please consider including this patch.
> >> Thank you and best regards,
> >> Benedikt Huber
> >>
> >> Benedikt Huber (1):
> >> 2015-06-15 Benedikt Huber <benedikt.huber@theobroma-systems.com>
> >>
> >> gcc/ChangeLog | 9 +++
> >> gcc/config/aarch64/aarch64-builtins.c | 60 ++++++++++++++++
> >> gcc/config/aarch64/aarch64-protos.h | 2 +
> >> gcc/config/aarch64/aarch64-simd.md | 27 ++++++++
> >> gcc/config/aarch64/aarch64.c | 63 +++++++++++++++++
> >> gcc/config/aarch64/aarch64.md | 3 +
> >> gcc/testsuite/gcc.target/aarch64/rsqrt.c | 113 +++++++++++++++++++++++++++++++
> >> 7 files changed, 277 insertions(+)
> >> create mode 100644 gcc/testsuite/gcc.target/aarch64/rsqrt.c
> >>
> >> --
> >> 1.9.1