Re: [AArch64] Emit square root using the Newton series


On 03/08/16 16:08, Evandro Menezes wrote:
On 02/16/16 14:56, Evandro Menezes wrote:
On 12/08/15 15:35, Evandro Menezes wrote:
Emit square root using the Newton series

   2015-12-03  Evandro Menezes  <e.menezes@samsung.com>

   gcc/
            * config/aarch64/aarch64-protos.h (aarch64_emit_swsqrt):
            Declare new function.
            * config/aarch64/aarch64-simd.md (sqrt<mode>2): New expansion
            and insn definitions.
            * config/aarch64/aarch64-tuning-flags.def
            (AARCH64_EXTRA_TUNE_FAST_SQRT): New tuning macro.
            * config/aarch64/aarch64.c (aarch64_emit_swsqrt): Define new
            function.
            * config/aarch64/aarch64.md (sqrt<mode>2): New expansion and
            insn definitions.
            * config/aarch64/aarch64.opt (mlow-precision-recip-sqrt):
            Expand option description.
            * doc/invoke.texi (mlow-precision-recip-sqrt): Likewise.

This patch extends the patch that added support for implementing x^-1/2 using the Newton series by adding support for x^1/2 as well.

Is it OK at this point of stage 3?

Thank you,


James,

As I was saying, this patch results in some validation errors in the CPU2000 benchmarks that use DF. Although I have found the algorithm to be quite solid against a large set of random values, I am puzzled that some benchmarks fail to validate with this implementation of the Newton series for the square root when they pass with the Newton series for the reciprocal square root.

Since I had no problems with the same algorithm on x86-64, I wonder whether the initial estimate on AArch64, which offers just 8 bits of precision versus 11 bits on x86-64, has something to do with it. Then again, the algorithm iterated one fewer time on x86-64 than on AArch64.
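
As a rough sanity check on that hunch, each Newton-Raphson step approximately doubles the number of correct bits, so starting from the ~8-bit FRSQRTE estimate and using the iteration counts already in aarch64_emit_approx_rsqrt:

    8 -> 16 -> 32        after 2 steps: covers SF's 24-bit significand comfortably
    8 -> 16 -> 32 -> 64  after 3 steps: nominally covers DF's 53 bits, though
                         rounding in the intermediate steps eats into that margin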

Since the initial estimate seems to be sufficient for CPU2000 to validate when using SF, I'm leaning towards restricting the Newton series for the square root to SF only.

Your thoughts on the matter are appreciated,

        Add choices for the reciprocal square root approximation

        Allow a target to prefer such an operation depending on the FP
        precision.

        gcc/
            * config/aarch64/aarch64-protos.h
            (AARCH64_EXTRA_TUNE_APPROX_RSQRT): New macro.
            * config/aarch64/aarch64-tuning-flags.def
            (AARCH64_EXTRA_TUNE_APPROX_RSQRT_DF): New mask.
            (AARCH64_EXTRA_TUNE_APPROX_RSQRT_SF): Likewise.
            * config/aarch64/aarch64.c
            (use_rsqrt_p): New argument for the mode.
            (aarch64_builtin_reciprocal): Devise mode from builtin.
            (aarch64_optab_supported_p): New argument for the mode.

        Emit square root using the Newton series

        gcc/
            * config/aarch64/aarch64-tuning-flags.def
            (AARCH64_EXTRA_TUNE_APPROX_SQRT_{DF,SF}): New tuning macros.
            * config/aarch64/aarch64-protos.h
            (aarch64_emit_approx_sqrt): Declare new function.
            * config/aarch64/aarch64.c
            (aarch64_emit_approx_sqrt): Define new function.
            * config/aarch64/aarch64.md
            (sqrt*2): New expansion and insn definitions.
            * config/aarch64/aarch64-simd.md (sqrt*2): Likewise.
            * config/aarch64/aarch64.opt
            (mlow-precision-recip-sqrt): Expand option description.
            * doc/invoke.texi (mlow-precision-recip-sqrt): Likewise.


This patch, which depends on https://gcc.gnu.org/ml/gcc-patches/2016-03/msg00534.html, leverages the reciprocal square root approximation to emit a faster square root approximation.
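
For anyone skimming the patch, a minimal scalar C sketch of the computation being emitted may help. This is only an illustration: the helper name, the stand-in initial estimate, and the iteration count are mine, mirroring the existing rsqrt code rather than quoting this patch.

   #include <stdio.h>

   /* Model of the approximation: start from an estimate e0 of 1/sqrt(x)
      (FRSQRTE provides about 8 bits on AArch64), refine it with
      Newton-Raphson steps (the role of FRSQRTS), then multiply by x to
      recover sqrt(x).  */
   static double
   approx_sqrt (double x, double e0, int iterations)
   {
     double y = e0;
     for (int i = 0; i < iterations; i++)
       y = y * (3.0 - x * y * y) / 2.0;  /* One Newton-Raphson step.  */
     return x * y;                       /* sqrt(x) ~= x * 1/sqrt(x).  */
   }

   int
   main (void)
   {
     /* A deliberately poor 0.7 stands in for the hardware estimate;
        three steps are what the DF path uses.  */
     printf ("%.17g\n", approx_sqrt (2.0, 0.7, 3));  /* ~1.4142135623...  */
     return 0;
   }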

I have, however, encountered precision issues with DF: some benchmarks in the SPECfp CPU2000 suite fail to validate. Perhaps the initial estimate, with just 8 bits, is not good enough for the series to converge on the workloads of those benchmarks; perhaps denormals, which are known to occur in some of them, introduce errors. This was the motivation, in the previous related patch, for splitting the tuning flags into one specific to DF and another specific to SF.

Again, now with the patch attached, your feedback is appreciated.

Thank you,

--
Evandro Menezes

From 4f61f722f744339650a48aa034906dd685110ae2 Mon Sep 17 00:00:00 2001
From: Evandro Menezes <e.menezes@samsung.com>
Date: Tue, 8 Mar 2016 15:06:03 -0600
Subject: [PATCH] Emit square root using the Newton series

gcc/
	* config/aarch64/aarch64-tuning-flags.def
	(AARCH64_EXTRA_TUNE_APPROX_SQRT_{DF,SF}): New tuning macros.
	* config/aarch64/aarch64-protos.h
	(aarch64_emit_approx_sqrt): Declare new function.
	* config/aarch64/aarch64.c
	(aarch64_emit_approx_sqrt): Define new function.
	* config/aarch64/aarch64.md
	(sqrt*2): New expansion and insn definitions.
	* config/aarch64/aarch64-simd.md (sqrt*2): Likewise.
	* config/aarch64/aarch64.opt
	(mlow-precision-recip-sqrt): Expand option description.
	* doc/invoke.texi (mlow-precision-recip-sqrt): Likewise.
---
 gcc/config/aarch64/aarch64-protos.h         |  3 +++
 gcc/config/aarch64/aarch64-simd.md          | 25 ++++++++++++++++++++-
 gcc/config/aarch64/aarch64-tuning-flags.def |  3 ++-
 gcc/config/aarch64/aarch64.c                | 35 ++++++++++++++++++++++++-----
 gcc/config/aarch64/aarch64.md               | 25 ++++++++++++++++++++-
 gcc/config/aarch64/aarch64.opt              |  4 ++--
 gcc/doc/invoke.texi                         |  9 ++++----
 7 files changed, 89 insertions(+), 15 deletions(-)

diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h
index ee3505c..3f7e76b 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -265,6 +265,8 @@ enum aarch64_extra_tuning_flags
 
 #define AARCH64_EXTRA_TUNE_APPROX_RSQRT \
   (AARCH64_EXTRA_TUNE_APPROX_RSQRT_DF | AARCH64_EXTRA_TUNE_APPROX_RSQRT_SF)
+#define AARCH64_EXTRA_TUNE_APPROX_SQRT \
+  (AARCH64_EXTRA_TUNE_APPROX_SQRT_DF | AARCH64_EXTRA_TUNE_APPROX_SQRT_SF)
 
 extern struct tune_params aarch64_tune_params;
 
@@ -364,6 +366,7 @@ void aarch64_register_pragmas (void);
 void aarch64_relayout_simd_types (void);
 void aarch64_reset_previous_fndecl (void);
 void aarch64_emit_approx_rsqrt (rtx, rtx);
+void aarch64_emit_approx_sqrt (rtx, rtx);
 
 /* Initialize builtins for SIMD intrinsics.  */
 void init_aarch64_simd_builtins (void);
diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
index bd73bce..afeca5a 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -4307,7 +4307,30 @@
 
 ;; sqrt
 
-(define_insn "sqrt<mode>2"
+(define_expand "sqrt<mode>2"
+  [(set (match_operand:VDQF 0 "register_operand")
+	(sqrt:VDQF (match_operand:VDQF 1 "register_operand")))]
+  "TARGET_SIMD"
+{
+  machine_mode mode = GET_MODE_INNER (GET_MODE (operands[1]));
+
+  if (flag_finite_math_only
+      && !flag_trapping_math
+      && flag_unsafe_math_optimizations
+      && !optimize_function_for_size_p (cfun)
+      && ((mode == SFmode
+           && (aarch64_tune_params.extra_tuning_flags
+               & AARCH64_EXTRA_TUNE_APPROX_SQRT_SF))
+          || (mode == DFmode
+              && (aarch64_tune_params.extra_tuning_flags
+                  & AARCH64_EXTRA_TUNE_APPROX_SQRT_DF))))
+    {
+      aarch64_emit_approx_sqrt (operands[0], operands[1]);
+      DONE;
+    }
+})
+
+(define_insn "*sqrt<mode>2"
   [(set (match_operand:VDQF 0 "register_operand" "=w")
         (sqrt:VDQF (match_operand:VDQF 1 "register_operand" "w")))]
   "TARGET_SIMD"
diff --git a/gcc/config/aarch64/aarch64-tuning-flags.def b/gcc/config/aarch64/aarch64-tuning-flags.def
index 57d9588..b4421b1 100644
--- a/gcc/config/aarch64/aarch64-tuning-flags.def
+++ b/gcc/config/aarch64/aarch64-tuning-flags.def
@@ -31,4 +31,5 @@
 AARCH64_EXTRA_TUNING_OPTION ("rename_fma_regs", RENAME_FMA_REGS)
 AARCH64_EXTRA_TUNING_OPTION ("approx_rsqrt", APPROX_RSQRT_DF)
 AARCH64_EXTRA_TUNING_OPTION ("approx_rsqrtf", APPROX_RSQRT_SF)
-
+AARCH64_EXTRA_TUNING_OPTION ("approx_sqrt", APPROX_SQRT_DF)
+AARCH64_EXTRA_TUNING_OPTION ("approx_sqrtf", APPROX_SQRT_SF)
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 39a1a47..5e5dc5f 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -538,7 +538,8 @@ static const struct tune_params exynosm1_tunings =
   48,	/* max_case_values.  */
   64,	/* cache_line_size.  */
   tune_params::AUTOPREFETCHER_OFF, /* autoprefetcher_model.  */
-  (AARCH64_EXTRA_TUNE_APPROX_RSQRT) /* tune_flags.  */
+  (AARCH64_EXTRA_TUNE_APPROX_SQRT_SF
+   | AARCH64_EXTRA_TUNE_APPROX_RSQRT) /* tune_flags.  */
 };
 
 static const struct tune_params thunderx_tunings =
@@ -7537,9 +7538,8 @@ void
 aarch64_emit_approx_rsqrt (rtx dst, rtx src)
 {
   machine_mode mode = GET_MODE (src);
-  gcc_assert (
-    mode == SFmode || mode == V2SFmode || mode == V4SFmode
-	|| mode == DFmode || mode == V2DFmode);
+  gcc_assert (GET_MODE_INNER (mode) == SFmode
+              || GET_MODE_INNER (mode) == DFmode);
 
   rtx xsrc = gen_reg_rtx (mode);
   emit_move_insn (xsrc, src);
@@ -7547,8 +7547,7 @@ aarch64_emit_approx_rsqrt (rtx dst, rtx src)
 
   emit_insn ((*get_rsqrte_type (mode)) (x0, xsrc));
 
-  bool double_mode = (mode == DFmode || mode == V2DFmode);
-
+  bool double_mode = (GET_MODE_INNER (mode) == DFmode);
   int iterations = double_mode ? 3 : 2;
 
   /* Optionally iterate over the series one less time than otherwise.  */
@@ -7571,6 +7570,30 @@ aarch64_emit_approx_rsqrt (rtx dst, rtx src)
   emit_move_insn (dst, x0);
 }
 
+/* Emit instruction sequence to compute the approximate square root.  */
+
+void
+aarch64_emit_approx_sqrt (rtx dst, rtx src)
+{
+  machine_mode mode = GET_MODE (src);
+  gcc_assert (GET_MODE_INNER (mode) == SFmode
+              || GET_MODE_INNER (mode) == DFmode);
+
+  rtx xsrc = gen_reg_rtx (mode);
+  emit_move_insn (xsrc, src);
+
+  /* Calculate the approximate square root by multiplying the approximate
+     reciprocal square root...  */
+  rtx xrsqrt = gen_reg_rtx (mode);
+  aarch64_emit_approx_rsqrt (xrsqrt, xsrc);
+
+  /* ... by the original value.  */
+  rtx xsqrt = gen_reg_rtx (mode);
+  emit_set_insn (xsqrt, gen_rtx_MULT (mode, xrsqrt, xsrc));
+
+  emit_move_insn (dst, xsqrt);
+}
+
 /* Return the number of instructions that can be issued per cycle.  */
 static int
 aarch64_sched_issue_rate (void)
diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 68676c9..bd9947a 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -4665,7 +4665,30 @@
   [(set_attr "type" "ffarith<s>")]
 )
 
-(define_insn "sqrt<mode>2"
+(define_expand "sqrt<mode>2"
+  [(set (match_operand:GPF 0 "register_operand")
+        (sqrt:GPF (match_operand:GPF 1 "register_operand")))]
+  "TARGET_SIMD"
+{
+  machine_mode mode = GET_MODE_INNER (GET_MODE (operands[1]));
+
+  if (flag_finite_math_only
+      && !flag_trapping_math
+      && flag_unsafe_math_optimizations
+      && !optimize_function_for_size_p (cfun)
+      && ((mode == SFmode
+           && (aarch64_tune_params.extra_tuning_flags
+               & AARCH64_EXTRA_TUNE_APPROX_SQRT_SF))
+          || (mode == DFmode
+              && (aarch64_tune_params.extra_tuning_flags
+                  & AARCH64_EXTRA_TUNE_APPROX_SQRT_DF))))
+    {
+      aarch64_emit_approx_sqrt (operands[0], operands[1]);
+      DONE;
+    }
+})
+
+(define_insn "*sqrt<mode>2"
   [(set (match_operand:GPF 0 "register_operand" "=w")
         (sqrt:GPF (match_operand:GPF 1 "register_operand" "w")))]
   "TARGET_FLOAT"
diff --git a/gcc/config/aarch64/aarch64.opt b/gcc/config/aarch64/aarch64.opt
index 49ef0c6..8bb12d6 100644
--- a/gcc/config/aarch64/aarch64.opt
+++ b/gcc/config/aarch64/aarch64.opt
@@ -151,5 +151,5 @@ PC relative literal loads.
 
 mlow-precision-recip-sqrt
 Common Var(flag_mrecip_low_precision_sqrt) Optimization
-When calculating the reciprocal square root approximation,
-uses one less step than otherwise, thus reducing latency and precision.
+When calculating the approximate square root or its approximate reciprocal,
+use one less step than otherwise, thus reducing latency and precision.
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 62c70d5..24ad1f3 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -12887,10 +12887,11 @@ corresponding flag to the linker.
 @item -mno-low-precision-recip-sqrt
 @opindex -mlow-precision-recip-sqrt
 @opindex -mno-low-precision-recip-sqrt
-When calculating the reciprocal square root approximation,
-uses one less step than otherwise, thus reducing latency and precision.
-This is only relevant if @option{-ffast-math} enables the reciprocal square root
-approximation, which in turn depends on the target processor.
+When calculating the approximate square root or its approximate reciprocal,
+use one less step than otherwise, thus reducing latency and precision.
+This is only relevant if @option{-ffast-math} enables
+the approximate square root or its approximate reciprocal,
+which in turn depends on the target processor.
 
 @item -march=@var{name}
 @opindex march
-- 
2.6.3

