[GCC][PATCH][AArch64] Optimize x * copysign (1.0, y) [Patch (2/2)]
James Greenhalgh
james.greenhalgh@arm.com
Mon Jul 24 11:22:00 GMT 2017
On Mon, Jul 10, 2017 at 04:49:13PM +0100, Tamar Christina wrote:
> Hi All,
>
> As the mid-end patch has been respun I've had to respin this one as well.
> So this is a new version and a ping as well.
>
> The patch provides AArch64 optabs for XORSIGN, both vectorized and scalar.
>
> This patch is a revival of a previous patch
> https://gcc.gnu.org/ml/gcc-patches/2015-10/msg00069.html
>
> Bootstrapped on both aarch64-none-linux-gnu and x86_64 with no issues.
> Regression tested on aarch64-none-linux-gnu with no regressions.
>
> AArch64 now generates in GCC:
>
> movi v2.2s, 0x80, lsl 24
> and v1.8b, v1.8b, v2.8b
> eor v0.8b, v0.8b, v1.8b
>
> as opposed to before:
>
> fmov s2, 1.0e+0
> mov x0, 2147483648
> fmov d3, x0
> bsl v3.8b, v1.8b, v2.8b
> fmul s0, s0, s3
>
> Ok for trunk?
I have a question in-line below, and your ChangeLog is out of date, but
otherwise this looks good to me when the prerequisite makes it through
review.
>
> gcc/
> 2017-07-10 Tamar Christina <tamar.christina@arm.com>
>
> PR middle-end/19706
> * config/aarch64/aarch64.md (xorsign<mode>3): New optabs.
> * config/aarch64/aarch64-builtins.c
> (aarch64_builtin_vectorized_function): Added CASE_CFN_XORSIGN.
> * config/aarch64/aarch64-simd-builtins.def: Added xorsign BINOP.
These changes are no longer in the patch?
> * config/aarch64/aarch64-simd.md: Added xorsign<mode>3.
>
> gcc/testsuite/
> 2017-07-10 Tamar Christina <tamar.christina@arm.com>
>
> * gcc.target/aarch64/xorsign.c: New.
> * gcc.target/aarch64/xorsign_exec.c: New.
> * gcc.target/aarch64/vect-xorsign_exec.c: New.
> ________________________________________
> From: gcc-patches-owner@gcc.gnu.org <gcc-patches-owner@gcc.gnu.org> on behalf of Tamar Christina <Tamar.Christina@arm.com>
> Sent: Monday, June 12, 2017 8:56:58 AM
> To: GCC Patches
> Cc: nd; James Greenhalgh; Richard Earnshaw; Marcus Shawcroft
> Subject: [GCC][PATCH][AArch64] Optimize x * copysign (1.0, y) [Patch (2/2)]
Please don't top-post your replies like this, it makes it very confusing
to read the thread.
<snip old email>
> diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
> index 1cb6eeb318716aadacb84a44aa2062d486e0186b..db6a882eb42819569a127bc4526d73e94771c970 100644
> --- a/gcc/config/aarch64/aarch64-simd.md
> +++ b/gcc/config/aarch64/aarch64-simd.md
> @@ -351,6 +351,35 @@
> }
> )
>
> +(define_expand "xorsign<mode>3"
> + [(match_operand:VHSDF 0 "register_operand")
> + (match_operand:VHSDF 1 "register_operand")
> + (match_operand:VHSDF 2 "register_operand")]
> + "TARGET_SIMD"
> +{
> +
> + machine_mode imode = <V_cmp_result>mode;
> + rtx v_bitmask = gen_reg_rtx (imode);
> + rtx op1x = gen_reg_rtx (imode);
> + rtx op2x = gen_reg_rtx (imode);
> +
> + rtx arg1 = lowpart_subreg (imode, operands[1], <MODE>mode);
> + rtx arg2 = lowpart_subreg (imode, operands[2], <MODE>mode);
> +
> + int bits = GET_MODE_UNIT_BITSIZE (<MODE>mode) - 1;
> +
> + emit_move_insn (v_bitmask,
> + aarch64_simd_gen_const_vector_dup (<V_cmp_result>mode,
> + HOST_WIDE_INT_M1U << bits));
> +
> + emit_insn (gen_and<v_cmp_result>3 (op2x, v_bitmask, arg2));
> + emit_insn (gen_xor<v_cmp_result>3 (op1x, arg1, op2x));
> + emit_move_insn (operands[0],
> + lowpart_subreg (<MODE>mode, op1x, imode));
> + DONE;
> +}
> +)
> +
> (define_expand "copysign<mode>3"
> [(match_operand:VHSDF 0 "register_operand")
> (match_operand:VHSDF 1 "register_operand")
> diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
> index 6bdbf650d9281f95fc7fa49b38e1a6da538cdd27..583bb2af4026bec68ecd129988b9aee6918b814c 100644
> --- a/gcc/config/aarch64/aarch64.md
> +++ b/gcc/config/aarch64/aarch64.md
> @@ -5000,6 +5000,42 @@
> }
> )
>
> +;; For xorsign (x, y), we want to generate:
> +;;
> +;; LDR d2, #1<<63
> +;; AND v3.8B, v1.8B, v2.8B
> +;; EOR v0.8B, v0.8B, v3.8B
> +;;
> +
> +(define_expand "xorsign<mode>3"
> + [(match_operand:GPF 0 "register_operand")
> + (match_operand:GPF 1 "register_operand")
> + (match_operand:GPF 2 "register_operand")]
> + "TARGET_FLOAT && TARGET_SIMD"
> +{
> +
> + machine_mode imode = <V_cmp_result>mode;
> + rtx mask = gen_reg_rtx (imode);
> + rtx op1x = gen_reg_rtx (imode);
> + rtx op2x = gen_reg_rtx (imode);
> +
> + int bits = GET_MODE_BITSIZE (<MODE>mode) - 1;
> + emit_move_insn (mask, GEN_INT (trunc_int_for_mode (HOST_WIDE_INT_M1U << bits,
> + imode)));
If you need a trunc_int_for_mode here, why don't you also need it in
the vector version above?
> + emit_insn (gen_and<v_cmp_result>3 (op2x, mask,
> + lowpart_subreg (imode, operands[2],
> + <MODE>mode)));
> + emit_insn (gen_xor<v_cmp_result>3 (op1x,
> + lowpart_subreg (imode, operands[1],
> + <MODE>mode),
> + op2x));
> + emit_move_insn (operands[0],
> + lowpart_subreg (<MODE>mode, op1x, imode));
> + DONE;
> +}
> +)
> +
> ;; -------------------------------------------------------------------
> ;; Reload support
> ;; -------------------------------------------------------------------
Thanks,
James