This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.

Re: [AArch64] Emit square root using the Newton series

From: Evandro Menezes <e dot menezes at samsung dot com>
To: Wilco Dijkstra <Wilco dot Dijkstra at arm dot com>
Cc: "gcc-patches at gcc dot gnu dot org" <gcc-patches at gcc dot gnu dot org>, nd <nd at arm dot com>
Date: Mon, 14 Mar 2016 11:39:20 -0500
Subject: Re: [AArch64] Emit square root using the Newton series

On 03/10/16 19:06, Wilco Dijkstra wrote:
> Evandro Menezes <e.menezes@samsung.com> wrote:
>>
>> That's what I had in mind too, but around the approximation for x^-1/2
>> and using masks for vector cases thusly:
>>
>>         fcmne   v3.4s, v0.4s, #0.0
>>         frsqrte v1.4s, v0.4s
>>         fmul    v2.4s, v1.4s, v1.4s
>>         frsqrts v2.4s, v0.4s, v2.4s
>>         fmul    v1.4s, v1.4s, v2.4s
>>         fmul    v2.4s, v1.4s, v1.4s
>>         frsqrts v2.4s, v0.4s, v2.4s
>>         fmul    v1.4s, v1.4s, v2.4s
>>         and     v1.4s, v3.4s
>>         fmul    v0.4s, v1.4s, v0.4s
>
> That's possible but the overall latency is higher - according to exynos-1.md the
> above takes 44 cycles while my version would be 37.

I'm currently working to get this prototyped without modifying the reciprocal square root. Once I'm done, I'll merge both functions together to generate better code.

I got the scalar version going, but I'm stuck with the vector version. As you can see above, I need to use the complement of the mask produced by FCMEQ to squelch the offending vector element. However, the way in which FCMEQ is defined in GCC, it produces an integer vector and the SIMD AND only takes integer vectors. I'm stuck at how to pass an FP vector to AND and then its integer vector back to an FP insn.
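
To make the masking problem concrete, here is a minimal portable C sketch of the intended lane-wise operation, not GCC RTL and not a proposed fix. It assumes four 32-bit lanes; the function names are hypothetical. The compare yields an integer mask per lane, and the FP value is reinterpreted as integer bits, ANDed with the complement of the mask, and reinterpreted back. The AArch64 vector BIC instruction computes an AND with a complemented operand, though whether it suits this pattern is an assumption.

```c
#include <stdint.h>
#include <string.h>

#define LANES 4

/* Models FCMEQ against #0.0: all-ones in lanes where src is zero,
   all-zeros elsewhere.  The result is an integer vector.  */
void lane_eq_zero (uint32_t mask[LANES], const float src[LANES])
{
  for (int i = 0; i < LANES; i++)
    mask[i] = (src[i] == 0.0f) ? UINT32_MAX : 0u;
}

/* Clears the lanes of val selected by mask, i.e. val & ~mask, by
   viewing each FP lane as raw integer bits and back again.  */
void clear_masked_lanes (float dst[LANES], const float val[LANES],
                         const uint32_t mask[LANES])
{
  for (int i = 0; i < LANES; i++)
    {
      uint32_t bits;
      memcpy (&bits, &val[i], sizeof bits);  /* FP lane -> integer bits */
      bits &= ~mask[i];                      /* AND with the complement */
      memcpy (&dst[i], &bits, sizeof bits);  /* integer bits -> FP lane */
    }
}
```

The two `memcpy` calls are the portable C analogue of the missing piece described above: the same register contents are used once as an FP vector and once as an integer vector, without any value conversion.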


Here's how the function stands at the moment:

   void
   aarch64_emit_approx_sqrt (rtx dst, rtx src)
   {
      machine_mode mode = GET_MODE (src);
      gcc_assert (GET_MODE_INNER (mode) == SFmode
                  || GET_MODE_INNER (mode) == DFmode);

      bool scalar = !VECTOR_MODE_P (mode);
      bool narrow = (mode == V2SFmode);

      rtx xsrc = gen_reg_rtx (mode);
      emit_move_insn (xsrc, src);

      rtx xcc, xne, xmsk;
      if (scalar)
        {
          /* fcmp */
          xcc = aarch64_gen_compare_reg (NE, xsrc, CONST0_RTX (mode));
          xne = gen_rtx_NE (VOIDmode, xcc, const0_rtx);
        }
      else
        {
          machine_mode mcmp = mode_for_vector (int_mode_for_mode
                                               (GET_MODE_INNER (mode)),
                                               GET_MODE_NUNITS (mode));
          /* fcmne */
          xmsk = gen_reg_rtx (mode);
          /* Just V4SF for now */
          emit_insn (gen_aarch64_cmeqv4sf (xmsk, xsrc, CONST0_RTX (mode)));
          /* TODO: must use the complement of this result.  */
        }

      /* Calculate the approximate reciprocal square root.  */
      rtx xrsqrt = gen_reg_rtx (mode);
      aarch64_emit_approx_rsqrt (xrsqrt, xsrc);

      /* Calculate the approximate square root.  */
      rtx xsqrt = gen_reg_rtx (mode);
      emit_set_insn (xsqrt, gen_rtx_MULT (mode, xrsqrt, xsrc));

      /* Qualify the result for when the input is zero.  */
      rtx xdst = gen_reg_rtx (mode);
      if (scalar)
        /* fcsel */
        emit_set_insn (xdst, gen_rtx_IF_THEN_ELSE (mode, xne, xsqrt,
                                                   xsrc));
      else
        /* and */
        emit_set_insn (xdst, gen_rtx_AND (mode, xsqrt, xmsk));

      emit_move_insn (dst, xdst);
   }

Any help is welcome.

Thank you,

--
Evandro Menezes
