This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]

Re: [AArch64] Emit square root using the Newton series


On 03/10/16 19:06, Wilco Dijkstra wrote:
> Evandro Menezes <e.menezes@samsung.com> wrote:
>> That's what I had in mind too, but built around the approximation of x^-1/2
>> and using masks for the vector cases, like this:
>>
>>         fcmeq   v3.4s, v0.4s, #0.0
>>         frsqrte v1.4s, v0.4s
>>         fmul    v2.4s, v1.4s, v1.4s
>>         frsqrts v2.4s, v0.4s, v2.4s
>>         fmul    v1.4s, v1.4s, v2.4s
>>         fmul    v2.4s, v1.4s, v1.4s
>>         frsqrts v2.4s, v0.4s, v2.4s
>>         fmul    v1.4s, v1.4s, v2.4s
>>         bic     v1.16b, v1.16b, v3.16b
>>         fmul    v0.4s, v1.4s, v0.4s
>
> That's possible, but the overall latency is higher: according to
> exynos-1.md, the above takes 44 cycles, while my version would take 37.
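To see why the zero mask matters, the Newton sequence quoted above can be modeled in portable C. This is a sketch under stated assumptions, not the full AArch64 semantics: `rsqrt_estimate` is a hypothetical stand-in for FRSQRTE (it uses the well-known 0x5f3759df bit-level guess instead of the hardware estimate, but, like FRSQRTE, returns a signed infinity for a zero input), and `rsqrt_step` models FRSQRTS as (3 - a*b)/2, including its rule that zero times infinity yields 1.5. All names are illustrative.

```c
#include <assert.h>
#include <math.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for FRSQRTE: a rough estimate of 1/sqrt(x).
   Like FRSQRTE, it returns an infinity of the same sign for a zero input.
   Only non-negative inputs are modeled.  */
static float rsqrt_estimate (float x)
{
  if (x == 0.0f)
    return signbit (x) ? -INFINITY : INFINITY;
  uint32_t bits;
  memcpy (&bits, &x, sizeof bits);
  bits = 0x5f3759dfu - (bits >> 1);   /* well-known bit-level first guess */
  float y;
  memcpy (&y, &bits, sizeof y);
  return y;
}

/* Model of FRSQRTS: (3 - a * b) / 2, where 0 * infinity is defined to
   give 1.5 instead of NaN (not fused here, unlike the instruction).  */
static float rsqrt_step (float a, float b)
{
  if ((a == 0.0f && isinf (b)) || (isinf (a) && b == 0.0f))
    return 1.5f;
  return (3.0f - a * b) / 2.0f;
}

/* sqrt(x) ~= x * (1/sqrt(x)), with the estimate refined by two Newton
   steps as in the quoted sequence.  If MASK is nonzero, the estimate is
   cleared for a zero input before the final multiply.  */
static float approx_sqrt (float x, int mask)
{
  float y = rsqrt_estimate (x);         /* frsqrte */
  for (int i = 0; i < 2; i++)
    y = y * rsqrt_step (x, y * y);      /* fmul, frsqrts, fmul */
  if (mask && x == 0.0f)
    y = 0.0f;                           /* the masking step */
  return y * x;                         /* final fmul */
}
```

Without the mask, a zero input keeps the infinite estimate through both steps (FRSQRTS returns 1.5 for 0 times infinity), and the final multiply computes 0 times infinity, which is NaN; clearing the estimate first gives 0.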

I'm currently prototyping this without modifying the reciprocal square root. Once I'm done, I'll merge both functions to generate better code.

I got the scalar version working, but I'm stuck on the vector version. As you can see above, I need to use the complement of the mask produced by FCMEQ to squelch the offending vector element. However, as FCMEQ is defined in GCC, it produces an integer vector, and the SIMD AND only takes integer vectors. I can't see how to pass an FP vector to AND and then pass the resulting integer vector back to an FP insn.
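At the C level, the reinterpretation in question is the familiar type-punning pattern: float lanes are viewed as integers of the same width, masked, and viewed as floats again, with no value conversion. The sketch below is only an analogy, with hypothetical helper names (`cmeq_zero`, `bic_float`) and four plain arrays standing in for a V4SF register. In RTL the corresponding step would presumably be a SUBREG of the same register in the integer vector mode (for instance via `gen_lowpart`), but that is an assumption here, not a tested patch.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define LANES 4

/* Model of FCMEQ against #0.0: each lane becomes all ones (equal) or
   all zeros (not equal), held as an integer lane rather than a float.  */
static void cmeq_zero (uint32_t mask[LANES], const float x[LANES])
{
  for (int i = 0; i < LANES; i++)
    mask[i] = (x[i] == 0.0f) ? UINT32_MAX : 0u;
}

/* Model of BIC (AND with the complement of the mask) on float lanes:
   each float is reinterpreted as an integer, masked, and reinterpreted
   back, so only the cleared lanes change.  */
static void bic_float (float v[LANES], const uint32_t mask[LANES])
{
  for (int i = 0; i < LANES; i++)
    {
      uint32_t bits;
      memcpy (&bits, &v[i], sizeof bits);   /* float lane -> integer lane */
      bits &= ~mask[i];
      memcpy (&v[i], &bits, sizeof bits);   /* integer lane -> float lane */
    }
}
```

Note that -0.0 compares equal to 0.0, so both signed zeros produce an all-ones lane.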

Here's how the function stands at the moment:

   void
   aarch64_emit_approx_sqrt (rtx dst, rtx src)
   {
      machine_mode mode = GET_MODE (src);
      gcc_assert (GET_MODE_INNER (mode) == SFmode
                  || GET_MODE_INNER (mode) == DFmode);

      bool scalar = !VECTOR_MODE_P (mode);
      bool narrow = (mode == V2SFmode);

      rtx xsrc = gen_reg_rtx (mode);
      emit_move_insn (xsrc, src);

      rtx xcc, xne, xmsk;
      if (scalar)
        {
          /* fcmp */
          xcc = aarch64_gen_compare_reg (NE, xsrc, CONST0_RTX (mode));
          xne = gen_rtx_NE (VOIDmode, xcc, const0_rtx);
        }
      else
        {
          machine_mode mcmp = mode_for_vector (int_mode_for_mode
                                                 (GET_MODE_INNER (mode)),
                                               GET_MODE_NUNITS (mode));
          /* fcmeq: the mask is an integer vector of mode MCMP.  */
          xmsk = gen_reg_rtx (mcmp);
          /* Just V4SF for now.  */
          emit_insn (gen_aarch64_cmeqv4sf (xmsk, xsrc, CONST0_RTX (mode)));
          /* TODO: must use the complement of this result.  */
        }

      /* Calculate the approximate reciprocal square root.  */
      rtx xrsqrt = gen_reg_rtx (mode);
      aarch64_emit_approx_rsqrt (xrsqrt, xsrc);

      /* Calculate the approximate square root.  */
      rtx xsqrt = gen_reg_rtx (mode);
      emit_set_insn (xsqrt, gen_rtx_MULT (mode, xrsqrt, xsrc));

      /* Qualify the result for when the input is zero.  */
      rtx xdst = gen_reg_rtx (mode);
      if (scalar)
        /* fcsel */
        emit_set_insn (xdst, gen_rtx_IF_THEN_ELSE (mode, xne, xsqrt, xsrc));
      else
        /* and */
        emit_set_insn (xdst, gen_rtx_AND (mode, xsqrt, xmsk));

      emit_move_insn (dst, xdst);
   }
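One detail the two qualification paths handle differently is the sign of a zero result. IEEE 754 defines sqrt(-0.0) as -0.0. The scalar FCSEL path returns the input itself, so it keeps -0.0. The vector path, assuming the mask is complemented as the TODO intends, clears the NaN lane produced by infinity times zero, which yields +0.0. Masking the estimate before the final multiply, as the assembly earlier in the thread does, keeps the sign. The sketch below models only this final step; the function names are hypothetical and the reciprocal estimate is passed in directly rather than computed.

```c
#include <assert.h>
#include <math.h>
#include <stdint.h>
#include <string.h>

/* AND V with the complement of a per-lane zero mask, done on the bits.  */
static float and_not_mask (float v, int is_zero)
{
  uint32_t bits, mask = is_zero ? UINT32_MAX : 0u;
  memcpy (&bits, &v, sizeof bits);
  bits &= ~mask;
  memcpy (&v, &bits, sizeof v);
  return v;
}

/* Vector path of the function (with the TODO complement applied):
   multiply first, then clear the lane, so the NaN from 0 * inf
   becomes +0.0 regardless of the input's sign.  */
static float mask_after_mul (float x, float rsqrt)
{
  return and_not_mask (rsqrt * x, x == 0.0f);
}

/* Order of the quoted assembly: clear the estimate, then multiply,
   so a -0.0 input yields +0.0 * -0.0 = -0.0.  */
static float mask_before_mul (float x, float rsqrt)
{
  return and_not_mask (rsqrt, x == 0.0f) * x;
}

/* Scalar path of the function: FCSEL returns the input when x == 0.  */
static float select_input (float x, float rsqrt)
{
  return (x != 0.0f) ? rsqrt * x : x;
}

/* True only for -0.0f.  */
static int is_negative_zero (float v)
{
  return v == 0.0f && signbit (v);
}
```

For nonzero inputs all three orderings agree; they differ only in whether a -0.0 input comes back as -0.0 or +0.0.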

Any help is welcome.

Thank you,

--
Evandro Menezes

