This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

RE: [AArch64][SVE] Utilize ASRD instruction for division and remainder


Thanks for the corrections, updated.

Regards
Yuliang

(no ChangeLog updates)

-----Original Message-----
From: Richard Sandiford <richard.sandiford@arm.com> 
Sent: 27 September 2019 11:20
To: Yuliang Wang <Yuliang.Wang@arm.com>
Cc: gcc-patches@gcc.gnu.org; nd <nd@arm.com>
Subject: Re: [AArch64][SVE] Utilize ASRD instruction for division and remainder

Yuliang Wang <Yuliang.Wang@arm.com> writes:
> +;; Unpredicated arithmetic right shift for division by power-of-2.
> +(define_expand "sdiv_pow2<mode>3"
> +  [(set (match_operand:SVE_I 0 "register_operand" "")
> +	(unspec:SVE_I
> +	  [(match_dup 3)
> +	   (unspec:SVE_I
> +	     [(match_operand:SVE_I 1 "register_operand" "")
> +	      (match_operand 2 "aarch64_simd_rshift_imm")]
> +	    UNSPEC_ASRD)]
> +	 UNSPEC_PRED_X))]
> +  "TARGET_SVE"
> +  {
> +    operands[3] = aarch64_ptrue_reg (<VPRED>mode);
> +  }
> +)

Sorry for not noticing last time, but: define_expands shouldn't have constraints, so it's better to drop the empty "" from the match_operands.

> diff --git a/gcc/doc/sourcebuild.texi b/gcc/doc/sourcebuild.texi index 
> 4ace224a8ff5ed4fafed10a69ef00ffb2d7d8c39..009b8f8db74c7a3bef996ceaba58
> 123f6558221c 100644
> --- a/gcc/doc/sourcebuild.texi
> +++ b/gcc/doc/sourcebuild.texi
> @@ -1446,6 +1446,10 @@ of bytes.
>  Target supports both signed and unsigned 
> multiply-high-with-round-and-scale
>  operations on vectors of half-words.
>  
> +@item vect_sdiv_pow2_si
> +Target supports signed division by constant power-of-2 operations on 
> +vectors of words.

"4 bytes" is more accurate than "words".  The problem with "word" is that it depends on context; e.g. although the AArch64 ISA uses "word"
to mean 32 bits, its words are really 64 bits as far as GCC is concerned.
(It's also worth noting that "4 bytes" isn't necessarily 32 bits, since GCC supports 16-bit and 32-bit bytes.)

All a biit pedantic, sorry.

> +/* { dg-final { scan-assembler-not {\tand\t%} } } */
> +

Stray newline at end of file.

> diff --git a/gcc/testsuite/lib/target-supports.exp 
> b/gcc/testsuite/lib/target-supports.exp
> index 
> 414bf80003b9192806f79afed9393f9ef4750a7d..dedee87ced3d4cbed18fe4144282
> e5b4330a113d 100644
> --- a/gcc/testsuite/lib/target-supports.exp
> +++ b/gcc/testsuite/lib/target-supports.exp
> @@ -6192,6 +6192,14 @@ proc check_effective_target_vect_mulhrs_hi {} {
>  		   && [check_effective_target_aarch64_sve2] }]  }
>  
> +# Return 1 if the target plus current options supports signed 
> +division # by power-of-2 operations on vectors of half-words.

"words" rather than "half-words", although as above 4 bytes is more accurate.

> +
> +proc check_effective_target_vect_sdiv_pow2_si {} {
> +    return [expr { [istarget aarch64*-*-*]
> +		   && [check_effective_target_aarch64_sve] }] }
> +
>  # Return 1 if the target plus current options supports a vector  # 
> demotion (packing) of shorts (to chars) and ints (to shorts)  # using 
> modulo arithmetic, 0 otherwise.
> diff --git a/gcc/tree-vect-patterns.c b/gcc/tree-vect-patterns.c index 
> 2f86f9e4fc7039add1b1d7b82574cb8262eb4ba4..f09e9d54701ebee0382742d20d4f
> 5a0db84110de 100644
> --- a/gcc/tree-vect-patterns.c
> +++ b/gcc/tree-vect-patterns.c
> @@ -2925,6 +2925,38 @@ vect_recog_divmod_pattern (stmt_vec_info stmt_vinfo, tree *type_out)
>        /* Pattern detected.  */
>        vect_pattern_detected ("vect_recog_divmod_pattern", last_stmt);
>  
> +      *type_out = vectype;
> +
> +      /* Check if the target supports this internal function.  */
> +      internal_fn ifn = IFN_DIV_POW2;
> +      if (direct_internal_fn_supported_p (ifn, vectype, OPTIMIZE_FOR_SPEED))
> +	{
> +	  tree shift = build_int_cst (itype, tree_log2 (oprnd1));
> +
> +	  tree var_div = vect_recog_temp_ssa_var (itype, NULL);
> +	  gimple *div_stmt = gimple_build_call_internal (ifn, 2, oprnd0, shift);
> +	  gimple_call_set_lhs (div_stmt, var_div);
> +
> +	  if (rhs_code == TRUNC_MOD_EXPR)
> +	    {
> +	      append_pattern_def_seq (stmt_vinfo, div_stmt);
> +	      def_stmt
> +		= gimple_build_assign (vect_recog_temp_ssa_var (itype, NULL),
> +				       LSHIFT_EXPR, var_div, shift);
> +	      append_pattern_def_seq (stmt_vinfo, def_stmt);
> +	      pattern_stmt
> +		= gimple_build_assign (vect_recog_temp_ssa_var (itype, NULL),
> +				       MINUS_EXPR, oprnd0,
> +				       gimple_assign_lhs (def_stmt));
> +	    }
> +	  else
> +	    {
> +	      pattern_stmt = div_stmt;
> +	      gimple_set_location (pattern_stmt, gimple_location (last_stmt));
> +	    }

Again, sorry for not noticing last time, but we should do this outside the "else".  Although the TRUNC_MOD_EXPR is no longer being calculated as a modulus operation, the final MINUS_EXPR is still giving the same result.

> +          return pattern_stmt;

The indentation of this line uses 8 spaces instead of tabs.

OK otherwise, thanks.

Richard

Attachment: rb11863.patch
Description: rb11863.patch


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]