This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
Re: [PATCH 3/3][AArch64] Emit division using the Newton series
- From: James Greenhalgh <james dot greenhalgh at arm dot com>
- To: Evandro Menezes <e dot menezes at samsung dot com>
- Cc: GCC Patches <gcc-patches at gcc dot gnu dot org>, Wilco Dijkstra <Wilco dot Dijkstra at arm dot com>, Andrew Pinski <pinskia at gmail dot com>, Philipp Tomsich <philipp dot tomsich at theobroma-systems dot com>, Benedikt Huber <benedikt dot huber at theobroma-systems dot com>, <nd at arm dot com>
- Date: Tue, 31 May 2016 10:27:55 +0100
- Subject: Re: [PATCH 3/3][AArch64] Emit division using the Newton series
- Authentication-results: sourceware.org; auth=none
- Nodisclaimer: True
- References: <57212C09 dot 1000501 at samsung dot com> <20160525161633 dot GA18183 at arm dot com> <5748D0DA dot 4070301 at samsung dot com>
- Spamdiagnosticmetadata: NSPM
- Spamdiagnosticoutput: 1:23
On Fri, May 27, 2016 at 05:57:30PM -0500, Evandro Menezes wrote:
> On 05/25/16 11:16, James Greenhalgh wrote:
> >On Wed, Apr 27, 2016 at 04:15:53PM -0500, Evandro Menezes wrote:
> >> gcc/
> >> * config/aarch64/aarch64-protos.h
> >> (tune_params): Add new member "approx_div_modes".
> >> (aarch64_emit_approx_div): Declare new function.
> >> * config/aarch64/aarch64.c
> >> (generic_tunings): New member "approx_div_modes".
> >> (cortexa35_tunings): Likewise.
> >> (cortexa53_tunings): Likewise.
> >> (cortexa57_tunings): Likewise.
> >> (cortexa72_tunings): Likewise.
> >> (exynosm1_tunings): Likewise.
> >> (thunderx_tunings): Likewise.
> >> (xgene1_tunings): Likewise.
> >> (aarch64_emit_approx_div): Define new function.
> >> * config/aarch64/aarch64.md ("div<mode>3"): New expansion.
> >> * config/aarch64/aarch64-simd.md ("div<mode>3"): Likewise.
> >> * config/aarch64/aarch64.opt (-mlow-precision-div): Add new option.
> >> * doc/invoke.texi (-mlow-precision-div): Describe new option.
> >My comments from the other two patches around using a structure to
> >group up the tuning flags and whether we really want the new option
> >apply here too.
> >
> >This code has no consumers by default and is only used for
> >-mlow-precision-div. Is this option likely to be useful to our users in
> >practice? It might all be more palatable under something like the rs6000's
> >-mrecip=opt .
>
> I agree. OK as a follow up?
Works for me.
> >>diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
> >>index 47ccb18..7e99e16 100644
> >>--- a/gcc/config/aarch64/aarch64-simd.md
> >>+++ b/gcc/config/aarch64/aarch64-simd.md
> >>@@ -1509,7 +1509,19 @@
> >> [(set_attr "type" "neon_fp_mul_<Vetype><q>")]
> >> )
> >>-(define_insn "div<mode>3"
> >>+(define_expand "div<mode>3"
> >>+ [(set (match_operand:VDQF 0 "register_operand")
> >>+ (div:VDQF (match_operand:VDQF 1 "general_operand")
> >What does this relaxation to general_operand give you?
>
> Hold that thought...
>
> >>+ (match_operand:VDQF 2 "register_operand")))]
> >>+ "TARGET_SIMD"
> >>+{
> >>+ if (aarch64_emit_approx_div (operands[0], operands[1], operands[2]))
> >>+ DONE;
> >>+
> >>+ operands[1] = force_reg (<MODE>mode, operands[1]);
> >...other than the need to do this (sorry if I've missed something obvious).
>
> Hold on...
>
> >>+ if (num != CONST1_RTX (mode))
> >>+ {
> >>+ /* Calculate the approximate division. */
> >>+ rtx xnum = force_reg (mode, num);
> >>+ emit_set_insn (xrcp, gen_rtx_MULT (mode, xrcp, xnum));
> >>+ }
>
> About that relaxation, as you can see here, since the series
> approximates the reciprocal of the denominator, if the numerator is
> 1.0, a register can be spared, as the result is ready and the
> numerator is not needed.
But, in the case that the multiplication is by 1, can we not rely on the
other optimization machinery eliminating it? I mean, I see the optimization
that this enables for you, but can't you rely on future passes to do the
cleanup, and save yourself the few lines of special casing?
> +/* Emit the instruction sequence to compute the approximation for the division
> + of NUM by DEN and return whether the sequence was emitted or not. */
Needs a brief mention of what we use QUO for :).
> +
> +bool
> +aarch64_emit_approx_div (rtx quo, rtx num, rtx den)
> +{
> + machine_mode mode = GET_MODE (quo);
> + bool use_approx_division_p = (flag_mlow_precision_div
> + || (aarch64_tune_params.approx_modes->division
> + & AARCH64_APPROX_MODE (mode)));
> +
> + if (!flag_finite_math_only
> + || flag_trapping_math
> + || !flag_unsafe_math_optimizations
> + || optimize_function_for_size_p (cfun)
> + || !use_approx_division_p)
> + return false;
> +
> + /* Estimate the approximate reciprocal. */
> + rtx xrcp = gen_reg_rtx (mode);
> + emit_insn ((*get_recpe_type (mode)) (xrcp, den));
> +
> + /* Iterate over the series twice for SF and thrice for DF. */
> + int iterations = (GET_MODE_INNER (mode) == DFmode) ? 3 : 2;
> +
> + /* Optionally iterate over the series once less for faster performance,
> + while sacrificing the accuracy. */
> + if (flag_mlow_precision_div)
> + iterations--;
> +
> + /* Iterate over the series to calculate the approximate reciprocal. */
> + rtx xtmp = gen_reg_rtx (mode);
> + while (iterations--)
> + {
> + emit_insn ((*get_recps_type (mode)) (xtmp, xrcp, den));
> +
> + if (iterations > 0)
> + emit_set_insn (xrcp, gen_rtx_MULT (mode, xrcp, xtmp));
> + }
> +
> + if (num != CONST1_RTX (mode))
> + {
> + /* As the approximate reciprocal of the denominator is already calculated,
> + only calculate the approximate division when the numerator is not 1.0. */
Long lines.
> + rtx xnum = force_reg (mode, num);
> + emit_set_insn (xrcp, gen_rtx_MULT (mode, xrcp, xnum));
> + }
> +
> + /* Finalize the approximation. */
> + emit_set_insn (quo, gen_rtx_MULT (mode, xrcp, xtmp));
> + return true;
> +}
> +
> /* Return the number of instructions that can be issued per cycle. */
> static int
> aarch64_sched_issue_rate (void)
Thanks,
James