This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.



Re: [PATCH] [aarch64] Implemented reciprocal square root (rsqrt) estimation in -ffast-math


Benedikt,

On 25/06/15 08:01, pinskia@gmail.com wrote:




On Jun 18, 2015, at 5:04 AM, Benedikt Huber <benedikt.huber@theobroma-systems.com> wrote:

AArch64 offers the instructions frsqrte and frsqrts, for rsqrt estimation and
a Newton-Raphson step, respectively.
There are ARMv8 implementations where this is faster than using fdiv and fsqrt.
The patch runs three steps for double and two steps for float to achieve the needed precision.

This is NOT a win on ThunderX, at least for single precision, because the divide and sqrt together take the same time as 5 multiplies (estimate and step are multiplies in the ThunderX pipeline).  For double, divide and sqrt take 10 multiplies, which is just the same as what the patch does (really slightly less than 10; I rounded up). So in the end this is NOT a win at all for ThunderX unless we do one less step for both single and double.



Have you seen https://gcc.gnu.org/ml/gcc-patches/2015-03/msg00164.html? Really, this is something that should be gated by the costs infrastructure.


regards
Ramana





Thanks,
Andrew



There is one caveat and open question.
Since -ffast-math enables flush to zero, intermediate values between approximation steps
will be flushed to zero if they are denormal.
For example, this happens in the case of rsqrt (DBL_MAX) and rsqrtf (FLT_MAX).
The test cases pass, but it is unclear to me whether this is expected behavior with -ffast-math.

The patch applies to commit:
svn+ssh://gcc.gnu.org/svn/gcc/trunk@224470

Please consider including this patch.
Thank you and best regards,
Benedikt Huber

Benedikt Huber (1):
  2015-06-15  Benedikt Huber  <benedikt.huber@theobroma-systems.com>

gcc/ChangeLog                            |   9 +++
gcc/config/aarch64/aarch64-builtins.c    |  60 ++++++++++++++++
gcc/config/aarch64/aarch64-protos.h      |   2 +
gcc/config/aarch64/aarch64-simd.md       |  27 ++++++++
gcc/config/aarch64/aarch64.c             |  63 +++++++++++++++++
gcc/config/aarch64/aarch64.md            |   3 +
gcc/testsuite/gcc.target/aarch64/rsqrt.c | 113 +++++++++++++++++++++++++++++++
7 files changed, 277 insertions(+)
create mode 100644 gcc/testsuite/gcc.target/aarch64/rsqrt.c

--
1.9.1


