This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [AArch64] Emit square root using the Newton series


On 12/10/15 04:30, Kyrill Tkachov wrote:

On 09/12/15 18:50, Evandro Menezes wrote:
On 12/09/2015 11:16 AM, Kyrill Tkachov wrote:

On 09/12/15 17:02, Kyrill Tkachov wrote:

On 09/12/15 16:59, Evandro Menezes wrote:
On 12/09/2015 10:52 AM, Kyrill Tkachov wrote:
Hi Evandro,

On 08/12/15 21:35, Evandro Menezes wrote:
Emit square root using the Newton series

   2015-12-03  Evandro Menezes <e.menezes@samsung.com>

   gcc/
* config/aarch64/aarch64-protos.h (aarch64_emit_swsqrt):
   Declare new
            function.
            * config/aarch64/aarch64-simd.md (sqrt<mode>2): New
   expansion and
            insn definitions.
            * config/aarch64/aarch64-tuning-flags.def
            (AARCH64_EXTRA_TUNE_FAST_SQRT): New tuning macro.
* config/aarch64/aarch64.c (aarch64_emit_swsqrt): Define
   new function.
* config/aarch64/aarch64.md (sqrt<mode>2): New expansion
   and insn
            definitions.
* config/aarch64/aarch64.opt (mlow-precision-recip-sqrt):
   Expand option
            description.
* doc/invoke.texi (mlow-precision-recip-sqrt): Likewise.

This patch extends the patch that added support for implementing x^-1/2 using the Newton series by adding support for x^1/2 as well.

Is it OK at this point of stage 3?


A comment on the patch itself from me...

diff --git a/gcc/config/aarch64/aarch64-tuning-flags.def b/gcc/config/aarch64/aarch64-tuning-flags.def
index 6f7dbce..11c6c9a 100644
--- a/gcc/config/aarch64/aarch64-tuning-flags.def
+++ b/gcc/config/aarch64/aarch64-tuning-flags.def
@@ -30,4 +30,4 @@

 AARCH64_EXTRA_TUNING_OPTION ("rename_fma_regs", RENAME_FMA_REGS)
 AARCH64_EXTRA_TUNING_OPTION ("recip_sqrt", RECIP_SQRT)
-
+AARCH64_EXTRA_TUNING_OPTION ("fast_sqrt", FAST_SQRT)

That seems like a misleading name to me.
If we're doing this, that means that the sqrt instruction is not faster
than doing the inverse sqrt estimation followed by a multiply.
I think a name like "synth_sqrt" or "estimate_sqrt" or something along those lines
is more appropriate.

Unfortunately, this is the case on Exynos M1: the series is faster than the instruction. :-( So, other targets when this is also true, using the "fast_sqrt" option might make sense.


Sure, but the way your patch is written, we will emit the series when "fast_sqrt" is set, rather
than the other way around, unless I'm misreading the logic in:


Sorry, what I meant to say is it would be clearer, IMO, to describe the compiler action that is being taken (e.g. the rename_fma_regs tuning flag), in this case it's estimating sqrt using a series.

Kyrill

diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
index 030a101..f6d2da4 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -4280,7 +4280,23 @@

 ;; sqrt

-(define_insn "sqrt<mode>2"
+(define_expand "sqrt<mode>2"
+  [(set (match_operand:VDQF 0 "register_operand")
+    (sqrt:VDQF (match_operand:VDQF 1 "register_operand")))]
+  "TARGET_SIMD"
+{
+ if ((AARCH64_EXTRA_TUNE_FAST_SQRT & aarch64_tune_params.extra_tuning_flags)
+      && !optimize_function_for_size_p (cfun)
+      && flag_finite_math_only
+      && !flag_trapping_math
+      && flag_unsafe_math_optimizations)
+    {
+      aarch64_emit_swsqrt (operands[0], operands[1]);
+      DONE;
+    }
+})


Kyrill,

How about "approx_sqrt" for, you guessed it, approximate square root? The same adjective would perhaps describe "recip_sqrt" better too.


Sounds good to me.
Sorry for the bikeshedding.

        Rename the reciprocal square root tuning flag

        Rename the tuning option to enable the Newton series for the
   reciprocal square
        root to reflect its approximative characteristic.

        2016-02-22  Evandro Menezes  <e.menezes@samsung.com>

        gcc/
            * config/aarch64/aarch64-tuning-flags.def: Rename tuning
   flag to
            AARCH64_EXTRA_TUNE_APPROX_RSQRT.
            * config/aarch64/aarch64.c (xgene1_tunings): Use new name.
            (use_rsqrt_p): Likewise.
            * config/aarch64/aarch64.opt (mlow-precision-recip-sqrt):
   Reword the
            text explaining this option.
            * doc/invoke.texi (-mlow-precision-recip-sqrt): Likewise.


In preparation for the patch adding the Newton series also for square root, I'd like to propose this patch changing the name of the existing tuning flag for the reciprocal square root.

Thank you,

--
Evandro Menezes

>From 7043444f83c12de0ab50627a8b386e3070050591 Mon Sep 17 00:00:00 2001
From: Evandro Menezes <e.menezes@samsung.com>
Date: Mon, 22 Feb 2016 17:49:09 -0600
Subject: [PATCH] Rename the reciprocal square root tuning flag

Rename the tuning option to enable the Newton series for the reciprocal square
root to reflect its approximative characteristic.

2016-02-22  Evandro Menezes  <e.menezes@samsung.com>

gcc/
	* config/aarch64/aarch64-tuning-flags.def: Rename tuning flag to
	AARCH64_EXTRA_TUNE_APPROX_RSQRT.
	* config/aarch64/aarch64.c (xgene1_tunings): Use new name.
	(use_rsqrt_p): Likewise.
	* config/aarch64/aarch64.opt (mlow-precision-recip-sqrt): Reword the
	text explaining this option.
	* doc/invoke.texi (-mlow-precision-recip-sqrt): Likewise.
---
 gcc/config/aarch64/aarch64-tuning-flags.def |  2 +-
 gcc/config/aarch64/aarch64.c                |  4 ++--
 gcc/config/aarch64/aarch64.opt              |  2 +-
 gcc/doc/invoke.texi                         | 10 ++++------
 4 files changed, 8 insertions(+), 10 deletions(-)

diff --git a/gcc/config/aarch64/aarch64-tuning-flags.def b/gcc/config/aarch64/aarch64-tuning-flags.def
index 8036cfe..7e45a0c 100644
--- a/gcc/config/aarch64/aarch64-tuning-flags.def
+++ b/gcc/config/aarch64/aarch64-tuning-flags.def
@@ -29,5 +29,5 @@
      AARCH64_TUNE_ to give an enum name. */
 
 AARCH64_EXTRA_TUNING_OPTION ("rename_fma_regs", RENAME_FMA_REGS)
-AARCH64_EXTRA_TUNING_OPTION ("recip_sqrt", RECIP_SQRT)
+AARCH64_EXTRA_TUNING_OPTION ("approx_rsqrt", APPROX_RSQRT)
 
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 923a4b3..ebf47da 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -586,7 +586,7 @@ static const struct tune_params xgene1_tunings =
   0,	/* max_case_values.  */
   0,	/* cache_line_size.  */
   tune_params::AUTOPREFETCHER_OFF,	/* autoprefetcher_model.  */
-  (AARCH64_EXTRA_TUNE_RECIP_SQRT)	/* tune_flags.  */
+  (AARCH64_EXTRA_TUNE_APPROX_RSQRT)	/* tune_flags.  */
 };
 
 /* Support for fine-grained override of the tuning structures.  */
@@ -7469,7 +7469,7 @@ use_rsqrt_p (void)
   return (!flag_trapping_math
 	  && flag_unsafe_math_optimizations
 	  && ((aarch64_tune_params.extra_tuning_flags
-	       & AARCH64_EXTRA_TUNE_RECIP_SQRT)
+	       & AARCH64_EXTRA_TUNE_APPROX_RSQRT)
 	      || flag_mrecip_low_precision_sqrt));
 }
 
diff --git a/gcc/config/aarch64/aarch64.opt b/gcc/config/aarch64/aarch64.opt
index 5cbd4cd..155d2bd 100644
--- a/gcc/config/aarch64/aarch64.opt
+++ b/gcc/config/aarch64/aarch64.opt
@@ -151,5 +151,5 @@ PC relative literal loads.
 
 mlow-precision-recip-sqrt
 Common Var(flag_mrecip_low_precision_sqrt) Optimization
-When calculating a sqrt approximation, run fewer steps.
+Calculate the reciprocal square-root approximation in fewer steps.
 This reduces precision, but can result in faster computation.
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 490df93..eeff24d 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -12879,12 +12879,10 @@ corresponding flag to the linker.
 @item -mno-low-precision-recip-sqrt
 @opindex -mlow-precision-recip-sqrt
 @opindex -mno-low-precision-recip-sqrt
-The square root estimate uses two steps instead of three for double-precision,
-and one step instead of two for single-precision.
-Thus reducing latency and precision.
-This is only relevant if @option{-ffast-math} activates
-reciprocal square root estimate instructions.
-Which in turn depends on the target processor.
+The reciprocal square root approximation uses one step less than otherwise,
+thus reducing latency and precision.
+This is only relevant if @option{-ffast-math} enables the reciprocal square root
+approximation, which in turn depends on the target processor.
 
 @item -march=@var{name}
 @opindex march
-- 
2.6.3


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]