This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.



Re: [PATCH][AArch64] Vectorize MULH(R)S patterns with SVE2 instructions


Thanks for doing this.  The patch looks good, so this review is mostly a
list of very minor formatting comments, sorry.

Yuliang Wang <Yuliang.Wang@arm.com> writes:
> 2019-08-22  Yuliang Wang  <yuliang.wang@arm.com>
>

Please add a line here pointing at the PR:

	PR tree-optimization/89386

The commit hooks pick this up automatically and link the commit to the
bugzilla ticket.  (The PR was filed for SSSE3, but the target-independent
bits are still needed there.)

>         * config/aarch64/aarch64-sve2.md: support for SVE2 instructions [S/U]MULL[T/B] + [R]SHRN[T/B] and MULHRS pattern variants

Unfortunately the changelog format is pretty strict here.  Lines have
to be 80 chars or shorter, indented by tabs, and each pattern, function,
variable or type needs to be listed individually regardless of how
useful that seems.  So I think this should be something like:

	* config/aarch64/aarch64-sve2.md (<su>mull<bt><Vwide>)
	(<r>shrnb<mode>, <r>shrnt<mode>, <su>mulh<r>s<mode>3): New patterns.

(See below for why the "*" patterns aren't listed.)

>         * config/aarch64/iterators.md: iterators and attributes for above

Here too the iterators need to be listed:

	* config/aarch64/iterators.md (UNSPEC_SMULLB, UNSPEC_SMULLT)
	(UNSPEC_UMULLB, UNSPEC_UMULLT, UNSPEC_SHRNB, UNSPEC_SHRNT)
	(UNSPEC_RSHRNB, UNSPEC_RSHRNT, UNSPEC_SMULHS, UNSPEC_SMULHRS)
	(UNSPEC_UMULHS, UNSPEC_UMULHRS): New unspecs.
	(MULLBT, SHRNB, SHRNT, MULHRS): New int iterators.
	(su, r): Handle the unspecs above.
	(bt): New int attribute.

>         * internal-fn.def: internal functions for MULH[R]S patterns

	* internal-fn.def (IFN_MULHS, IFN_MULHRS): New internal functions.

>         * optabs.def: optabs definitions for above and sign variants

	* optabs.def (smulhs_optab, smulhrs_optab, umulhs_optab)
	(umulhrs_optab): New optabs.

>         * tree-vect-patterns.c (vect_recog_multhi_pattern): pattern recognition function for MULHRS

	* tree-vect-patterns.c (vect_recog_multhi_pattern): New function.
	(vect_vect_recog_func_ptrs): Add it.

>         * gcc.target/aarch64/sve2/mulhrs_1.c: new test for all variants

Just:

	* gcc.target/aarch64/sve2/mulhrs_1.c: New test.

(Sorry that this is so finicky.  I'm just the messenger. :-))

> diff --git a/gcc/config/aarch64/aarch64-sve2.md b/gcc/config/aarch64/aarch64-sve2.md
> index 2334e5a7b7dc524bbd1f4d0a48ba5cd991970118..51783604ad8f83eb1d070c133009ed41a2a0252d 100644
> --- a/gcc/config/aarch64/aarch64-sve2.md
> +++ b/gcc/config/aarch64/aarch64-sve2.md
> @@ -63,3 +63,89 @@
>     movprfx\t%0, %2\;<sur>h<addsub>\t%0.<Vetype>, %1/m, %0.<Vetype>, %3.<Vetype>"
>    [(set_attr "movprfx" "*,yes")]
>  )
> +
> +;; Multiply long top / bottom

Very minor, but: GCC comments traditionally end with "." even if they're
not full sentences.

> +(define_insn "*<su>mull<bt><Vwide>"
> +  [(set (match_operand:<VWIDE> 0 "register_operand" "=w")
> +    (unspec:<VWIDE> [(match_operand:SVE_BHSI 1 "register_operand" "w")
> +			  (match_operand:SVE_BHSI 2 "register_operand" "w")]
> +			 MULLBT))]
> +  "TARGET_SVE2"
> +  "<su>mull<bt>\t%0.<Vewtype>, %1.<Vetype>, %2.<Vetype>"
> +)
> +
> +(define_expand "<su>mull<bt><Vwide>"
> +  [(set (match_operand:<VWIDE> 0 "register_operand")
> +    (unspec:<VWIDE> [(match_operand:SVE_BHSI 1 "register_operand")
> +			  (match_operand:SVE_BHSI 2 "register_operand")]
> +			 MULLBT))]
> +  "TARGET_SVE2"
> +  ""
> +)

It's more usual to put the define_expand first.  But in this case we
don't need separate define_expands, we can just make the define_insn the
named pattern itself:

;; Multiply long top / bottom.
(define_insn "<su>mull<bt><Vwide>"
  [(set (match_operand:<VWIDE> 0 "register_operand" "=w")
    (unspec:<VWIDE> [(match_operand:SVE_BHSI 1 "register_operand" "w")
		     (match_operand:SVE_BHSI 2 "register_operand" "w")]
		    MULLBT))]
  "TARGET_SVE2"
  "<su>mull<bt>\t%0.<Vewtype>, %1.<Vetype>, %2.<Vetype>"
)

> +
> +;; (Rounding) Right shift narrow bottom
> +(define_insn "*<r>shrnb<mode>"
> +  [(set (match_operand:SVE_BHSI 0 "register_operand" "=w")
> +	  (unspec:SVE_BHSI [(match_operand:<VWIDE> 1 "register_operand" "w")
> +			  (match_operand 2 "aarch64_simd_shift_imm_offset_<Vel>" "i")]

This is more a personal thing, but I think it's better to leave out
the "i" (just remove it rather than replace it with "") when the operand
doesn't in fact accept all the things that "i" allows.

> +			 SHRNB))]
> +  "TARGET_SVE2"
> +  "<r>shrnb\t%0.<Vetype>, %1.<Vewtype>, #%2"
> +)
> +
> +(define_expand "<r>shrnb<mode>"
> +  [(set (match_operand:SVE_BHSI 0 "register_operand")
> +	 (unspec:SVE_BHSI [(match_operand:<VWIDE> 1 "register_operand")
> +			  (match_operand 2 "aarch64_simd_shift_imm_offset_<Vel>")]
> +			 SHRNB))]
> +  "TARGET_SVE2"
> +  ""
> +)
> +
> +;; (Rounding) Right shift narrow top
> +(define_insn "*<r>shrnt<mode>"
> +  [(set (match_operand:SVE_BHSI 0 "register_operand" "=w")
> +	  (unspec:SVE_BHSI [(match_operand:SVE_BHSI 1 "register_operand" "%0")

"%" indicates that operand 1 is commutative with operand 2, which isn't
the case here.

> +	      (match_operand:<VWIDE> 2 "register_operand" "w")
> +			  (match_operand 3 "aarch64_simd_shift_imm_offset_<Vel>" "i")]

Same comment about "i" here.

> +			 SHRNT))]
> +  "TARGET_SVE2"
> +  "<r>shrnt\t%0.<Vetype>, %2.<Vewtype>, #%3"
> +)
> +
> +(define_expand "<r>shrnt<mode>"
> +  [(set (match_operand:SVE_BHSI 0 "register_operand")
> +	  (unspec:SVE_BHSI [(match_operand:<VWIDE> 1 "register_operand")
> +	      (match_operand:<VWIDE> 2 "register_operand")
> +			  (match_operand 3 "aarch64_simd_shift_imm_offset_<Vel>")]
> +			 SHRNT))]
> +  "TARGET_SVE2"
> +  ""
> +)

Each of the above two define_expands can also be removed and the "*"
removed from the define_insn.

> +;; Unpredicated integer multiply-high-with-(round-and-)scale
> +(define_expand "<su>mulh<r>s<mode>3"
> +  [(set (match_operand:SVE_BHSI 0 "register_operand")
> +	(unspec:SVE_BHSI
> +	  [(match_dup 3)
> +	   (unspec:SVE_BHSI [(match_operand:SVE_BHSI 1 "register_operand")
> +			  (match_operand:SVE_BHSI 2 "register_operand")]
> +			 MULHRS)]
> +	  UNSPEC_PRED_X))]
> +  "TARGET_SVE2"
> +  {
> +    operands[3] = force_reg (<VPRED>mode, CONSTM1_RTX (<VPRED>mode));

These days this should be:

    operands[3] = aarch64_ptrue_reg (<VPRED>mode);

There's been a bit of churn around this, sorry.

> @@ -1585,6 +1597,13 @@
>  
>  (define_int_iterator RHADD [UNSPEC_SRHADD UNSPEC_URHADD])
>  
> +(define_int_iterator MULLBT [UNSPEC_SMULLB UNSPEC_UMULLB
> +            UNSPEC_SMULLT UNSPEC_UMULLT])

Minor formatting nit: should be indented one column beyond the "[".

> +
> +(define_int_iterator SHRNB [UNSPEC_SHRNB UNSPEC_RSHRNB])
> +
> +(define_int_iterator SHRNT [UNSPEC_SHRNT UNSPEC_RSHRNT])
> +
>  (define_int_iterator DOTPROD [UNSPEC_SDOT UNSPEC_UDOT])
>  
>  (define_int_iterator ADDSUBHN [UNSPEC_ADDHN UNSPEC_RADDHN
> @@ -1604,6 +1623,9 @@
>  
>  (define_int_iterator VQDMULH [UNSPEC_SQDMULH UNSPEC_SQRDMULH])
>  
> +(define_int_iterator MULHRS [UNSPEC_SMULHS UNSPEC_UMULHS
> +            UNSPEC_SMULHRS UNSPEC_UMULHRS])
> +

Same here.

>  (define_int_iterator USSUQADD [UNSPEC_SUQADD UNSPEC_USQADD])
>  
>  (define_int_iterator SUQMOVN [UNSPEC_SQXTN UNSPEC_UQXTN])
> @@ -1866,7 +1888,11 @@
>  		     (UNSPEC_COND_FCVTZS "s")
>  		     (UNSPEC_COND_FCVTZU "u")
>  		     (UNSPEC_COND_SCVTF "s")
> -		     (UNSPEC_COND_UCVTF "u")])
> +		     (UNSPEC_COND_UCVTF "u")
> +	       (UNSPEC_SMULLB "s") (UNSPEC_UMULLB "u")
> +	       (UNSPEC_SMULLT "s") (UNSPEC_UMULLT "u")
> +	       (UNSPEC_SMULHS "s") (UNSPEC_UMULHS "u")
> +	       (UNSPEC_SMULHRS "s") (UNSPEC_UMULHRS "u")])

These should line up with the existing entries.

>  
>  (define_int_attr sur [(UNSPEC_SHADD "s") (UNSPEC_UHADD "u")
>  		      (UNSPEC_SRHADD "sr") (UNSPEC_URHADD "ur")
> @@ -1904,6 +1930,10 @@
>                      (UNSPEC_SQRSHRN "r") (UNSPEC_UQRSHRN "r")
>                      (UNSPEC_SQSHL   "")  (UNSPEC_UQSHL  "")
>                      (UNSPEC_SQRSHL   "r")(UNSPEC_UQRSHL  "r")
> +		    (UNSPEC_SHRNB "") (UNSPEC_SHRNT "")
> +		    (UNSPEC_RSHRNB "r") (UNSPEC_RSHRNT "r")
> +	       (UNSPEC_SMULHS "") (UNSPEC_UMULHS "")
> +	       (UNSPEC_SMULHRS "r") (UNSPEC_UMULHRS "r")

Same here.

>  ])
>  
>  (define_int_attr lr [(UNSPEC_SSLI  "l") (UNSPEC_USLI  "l")
> @@ -1916,6 +1946,9 @@
>  		    (UNSPEC_SHADD "") (UNSPEC_UHADD "u")
>  		    (UNSPEC_SRHADD "") (UNSPEC_URHADD "u")])
>  
> +(define_int_attr bt [(UNSPEC_SMULLB "b") (UNSPEC_UMULLB "b")
> +		    (UNSPEC_SMULLT "t") (UNSPEC_UMULLT "t")])

This line should be indented one column further.

> diff --git a/gcc/optabs.def b/gcc/optabs.def
> index 5283e6753f26eb0da08a393949128d09643e50dd..adcd9477b4c4b2e73cad7c72ee51336e98e81954 100644
> --- a/gcc/optabs.def
> +++ b/gcc/optabs.def
> @@ -342,6 +342,10 @@ OPTAB_D (udot_prod_optab, "udot_prod$I$a")
>  OPTAB_D (usum_widen_optab, "widen_usum$I$a3")
>  OPTAB_D (usad_optab, "usad$I$a")
>  OPTAB_D (ssad_optab, "ssad$I$a")
> +OPTAB_D (smulhs_optab, "smulhs$a3")
> +OPTAB_D (smulhrs_optab, "smulhrs$a3")
> +OPTAB_D (umulhs_optab, "umulhs$a3")
> +OPTAB_D (umulhrs_optab, "umulhrs$a3")
>  OPTAB_D (vec_pack_sfix_trunc_optab, "vec_pack_sfix_trunc_$a")
>  OPTAB_D (vec_pack_ssat_optab, "vec_pack_ssat_$a")
>  OPTAB_D (vec_pack_trunc_optab, "vec_pack_trunc_$a")

The new patterns need to be documented in doc/md.texi.

> diff --git a/gcc/testsuite/gcc.target/aarch64/sve2/mulhrs_1.c b/gcc/testsuite/gcc.target/aarch64/sve2/mulhrs_1.c
> new file mode 100644
> index 0000000000000000000000000000000000000000..98cdc73d5256d712c985c541c1a585d081e5f5ea
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/sve2/mulhrs_1.c
> @@ -0,0 +1,123 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -ftree-vectorize -fdump-tree-vect-details --save-temps" } */
> +
> +#include <stdint.h>
> +
> +#define MULTHI(TYPE, BIGGER, RND)                     \
> +TYPE __attribute__ ((noinline, noclone))              \
> +mulhs_##TYPE##_##RND (TYPE *restrict x,               \
> +        TYPE *restrict y, TYPE *restrict z, int n)    \
> +{                                                     \
> +  for (int i = 0; i < n; i++)                         \
> +  {                                                   \
> +    z[i] = ((((BIGGER)x[i] * (BIGGER)y[i]) >>         \
> +            (sizeof(BIGGER)/2-(RND+1)) + RND) >> 1;   \

Missing ")" here.

> +  }                                                   \
> +}
> +
> +MULTHI (int8_t, int16_t, 0)
> +MULTHI (int16_t, int32_t, 0)
> +MULTHI (int32_t, int64_t, 0)
> +
> +MULTHI (uint8_t, uint16_t, 0)
> +MULTHI (uint16_t, uint32_t, 0)
> +MULTHI (uint32_t, uint64_t, 0)
> +
> +MULTHI (int8_t, int16_t, 1)
> +MULTHI (int16_t, int32_t, 1)
> +MULTHI (int32_t, int64_t, 1)
> +
> +MULTHI (uint8_t, uint16_t, 1)
> +MULTHI (uint16_t, uint32_t, 1)
> +MULTHI (uint32_t, uint64_t, 1)
> +
> +/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 12 "vect" } } */
> +
> +/* { dg-final { scan-assembler-times {
> +                  \tsmullb\tz[0-9]+\.h, z[0-9]+\.b, z[0-9]+\.b\n
> +                  \tsmullt\tz[0-9]+\.h, z[0-9]+\.b, z[0-9]+\.b\n
> +                  \tshrnb\tz[0-9]+\.b, z[0-9]+\.h, #7\n
> +                  \tshrnt\tz[0-9]+\.b, z[0-9]+\.h, #7\n
> +              } 
> +     1 } } */

Unfortunately, this kind of multi-line scan isn't supported in this
form: the dg-final has to be on a single line.  (Also unfortunately,
you don't get an error about this; the line is simply ignored.)

With scheduling enabled, only the order of the shrnb and shrnt is
guaranteed, so I think we should have separate scan-assembler-times
for each line.
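E.g. something like this for the first block (note that with one scan per line, the smull counts then cover both the rounding and non-rounding byte loops, so the numbers below are only a sketch and will need rechecking against the full set of functions):

```c
/* { dg-final { scan-assembler-times {\tsmullb\tz[0-9]+\.h, z[0-9]+\.b, z[0-9]+\.b\n} 2 } } */
/* { dg-final { scan-assembler-times {\tsmullt\tz[0-9]+\.h, z[0-9]+\.b, z[0-9]+\.b\n} 2 } } */
/* { dg-final { scan-assembler-times {\tshrnb\tz[0-9]+\.b, z[0-9]+\.h, #7\n} 1 } } */
/* { dg-final { scan-assembler-times {\tshrnt\tz[0-9]+\.b, z[0-9]+\.h, #7\n} 1 } } */
```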

> +/* { dg-final { scan-assembler-times {
> +                  \tsmullb\tz[0-9]+\.s, z[0-9]+\.h, z[0-9]+\.h\n
> +                  \tsmullt\tz[0-9]+\.s, z[0-9]+\.h, z[0-9]+\.h\n
> +                  \tshrnb\tz[0-9]+\.h, z[0-9]+\.s, #15\n
> +                  \tshrnt\tz[0-9]+\.h, z[0-9]+\.s, #15\n
> +              } 
> +     1 } } */
> +/* { dg-final { scan-assembler-times {
> +                  \tsmullb\tz[0-9]+\.d, z[0-9]+\.s, z[0-9]+\.s\n
> +                  \tsmullt\tz[0-9]+\.d, z[0-9]+\.s, z[0-9]+\.s\n
> +                  \tshrnb\tz[0-9]+\.s, z[0-9]+\.d, #31\n
> +                  \tshrnt\tz[0-9]+\.s, z[0-9]+\.d, #31\n
> +              } 
> +     1 } } */
> +     
> +/* { dg-final { scan-assembler-times {
> +                  \tumullb\tz[0-9]+\.h, z[0-9]+\.b, z[0-9]+\.b\n
> +                  \tumullt\tz[0-9]+\.h, z[0-9]+\.b, z[0-9]+\.b\n
> +                  \tshrnb\tz[0-9]+\.b, z[0-9]+\.h, #7\n
> +                  \tshrnt\tz[0-9]+\.b, z[0-9]+\.h, #7\n
> +              } 
> +     1 } } */
> +/* { dg-final { scan-assembler-times {
> +                  \tumullb\tz[0-9]+\.s, z[0-9]+\.h, z[0-9]+\.h\n
> +                  \tumullt\tz[0-9]+\.s, z[0-9]+\.h, z[0-9]+\.h\n
> +                  \tshrnb\tz[0-9]+\.h, z[0-9]+\.s, #15\n
> +                  \tshrnt\tz[0-9]+\.h, z[0-9]+\.s, #15\n
> +              } 
> +     1 } } */
> +/* { dg-final { scan-assembler-times {
> +                  \tumullb\tz[0-9]+\.d, z[0-9]+\.s, z[0-9]+\.s\n
> +                  \tumullt\tz[0-9]+\.d, z[0-9]+\.s, z[0-9]+\.s\n
> +                  \tshrnb\tz[0-9]+\.s, z[0-9]+\.d, #31\n
> +                  \tshrnt\tz[0-9]+\.s, z[0-9]+\.d, #31\n
> +              } 
> +     1 } } */
> +     
> +/* { dg-final { scan-assembler-times {
> +                  \tsmullb\tz[0-9]+\.h, z[0-9]+\.b, z[0-9]+\.b\n
> +                  \tsmullt\tz[0-9]+\.h, z[0-9]+\.b, z[0-9]+\.b\n
> +                  \trshrnb\tz[0-9]+\.b, z[0-9]+\.h, #7\n
> +                  \trshrnt\tz[0-9]+\.b, z[0-9]+\.h, #7\n
> +              } 
> +     1 } } */
> +/* { dg-final { scan-assembler-times {
> +                  \tsmullb\tz[0-9]+\.s, z[0-9]+\.h, z[0-9]+\.h\n
> +                  \tsmullt\tz[0-9]+\.s, z[0-9]+\.h, z[0-9]+\.h\n
> +                  \trshrnb\tz[0-9]+\.h, z[0-9]+\.s, #15\n
> +                  \trshrnt\tz[0-9]+\.h, z[0-9]+\.s, #15\n
> +              } 
> +     1 } } */
> +/* { dg-final { scan-assembler-times {
> +                  \tsmullb\tz[0-9]+\.d, z[0-9]+\.s, z[0-9]+\.s\n
> +                  \tsmullt\tz[0-9]+\.d, z[0-9]+\.s, z[0-9]+\.s\n
> +                  \trshrnb\tz[0-9]+\.s, z[0-9]+\.d, #31\n
> +                  \trshrnt\tz[0-9]+\.s, z[0-9]+\.d, #31\n
> +              } 
> +     1 } } */
> +     
> +/* { dg-final { scan-assembler-times {
> +                  \tumullb\tz[0-9]+\.h, z[0-9]+\.b, z[0-9]+\.b\n
> +                  \tumullt\tz[0-9]+\.h, z[0-9]+\.b, z[0-9]+\.b\n
> +                  \trshrnb\tz[0-9]+\.b, z[0-9]+\.h, #7\n
> +                  \trshrnt\tz[0-9]+\.b, z[0-9]+\.h, #7\n
> +              } 
> +     1 } } */
> +/* { dg-final { scan-assembler-times {
> +                  \tumullb\tz[0-9]+\.s, z[0-9]+\.h, z[0-9]+\.h\n
> +                  \tumullt\tz[0-9]+\.s, z[0-9]+\.h, z[0-9]+\.h\n
> +                  \trshrnb\tz[0-9]+\.h, z[0-9]+\.s, #15\n
> +                  \trshrnt\tz[0-9]+\.h, z[0-9]+\.s, #15\n
> +              } 
> +     1 } } */
> +/* { dg-final { scan-assembler-times {
> +                  \tumullb\tz[0-9]+\.d, z[0-9]+\.s, z[0-9]+\.s\n
> +                  \tumullt\tz[0-9]+\.d, z[0-9]+\.s, z[0-9]+\.s\n
> +                  \trshrnb\tz[0-9]+\.s, z[0-9]+\.d, #31\n
> +                  \trshrnt\tz[0-9]+\.s, z[0-9]+\.d, #31\n
> +              } 
> +     1 } } */
> +     
> diff --git a/gcc/tree-vect-patterns.c b/gcc/tree-vect-patterns.c
> index ccb2e1edecda09db786c0d98ccd25f5be107c7e4..96e86f716da0b819640cca2ff5d01c3ce8b342e3 100644
> --- a/gcc/tree-vect-patterns.c
> +++ b/gcc/tree-vect-patterns.c
> @@ -1723,6 +1723,169 @@ vect_recog_over_widening_pattern (stmt_vec_info last_stmt_info, tree *type_out)
>    return pattern_stmt;
>  }
>  
> +/* Recognize the following patterns:
> +
> +	    ATYPE a;  // narrower than TYPE
> +	    BTYPE b;  // narrower than TYPE
> +
> +	    // multiply high with scaling
> +	    1) TYPE res = ((TYPE) a * (TYPE) b) >> c;
> +	    // or also with rounding
> +	    2) TYPE res = (((TYPE) a * (TYPE) b) >> d + 1) >> 1;
> +
> +	where only the bottom half of res is used.  */

Should be indented three spaces only.

> +
> +static gimple *
> +vect_recog_multhi_pattern (stmt_vec_info last_stmt_info, tree *type_out)

Maybe "vect_recog_mulhs_pattern", for consistency with the IFN names?

> +{
> +	vec_info *vinfo;

Might as well delay declaring this until...

> +
> +  /* Check for a right shift.  */
> +  gassign *last_stmt = dyn_cast <gassign *> (last_stmt_info->stmt);
> +  if (!last_stmt
> +      || gimple_assign_rhs_code (last_stmt) != RSHIFT_EXPR)
> +    return NULL;
> +  vinfo = last_stmt_info->vinfo;

...here.  (A lot of existing GCC code predates even C99 and so doesn't
do this, but it's more usual for newer code.)

> +
> +  /* Check that the shift result is wider than the users of the
> +     result need (i.e. that narrowing would be a natural choice).  */
> +  tree lhs = gimple_assign_lhs (last_stmt);
> +  tree lhs_type = TREE_TYPE (lhs);
> +  unsigned int target_precision =
> +    vect_element_precision (last_stmt_info->min_output_precision);

GCC formatting puts operators at the start rather than the end of a line:

  unsigned int target_precision
    = vect_element_precision (last_stmt_info->min_output_precision);

Same for the rest of the function.

> +  if (!INTEGRAL_TYPE_P (lhs_type)
> +      || target_precision >= TYPE_PRECISION (lhs_type))
> +    return NULL;
> +
> +  /* Look through any change in sign on the outer shift input.  */
> +  vect_unpromoted_value unprom_input;
> +  tree rhs_rshift_inp = vect_look_through_possible_promotion (vinfo,
> +          gimple_assign_rhs1 (last_stmt), &unprom_input);

Formatting:

  tree rhs_rshift_inp = vect_look_through_possible_promotion
    (vinfo, gimple_assign_rhs1 (last_stmt), &unprom_input);

> +  if (!rhs_rshift_inp
> +      || TYPE_PRECISION (TREE_TYPE (rhs_rshift_inp))
> +         != TYPE_PRECISION (lhs_type))
> +    return NULL;
> +
> +  /* Get the definition of the shift input.  */
> +  stmt_vec_info input_stmt_info =
> +      vect_get_internal_def (vinfo, rhs_rshift_inp);
> +  if (!input_stmt_info)
> +    return NULL;
> +  gassign *input_stmt = dyn_cast <gassign *> (input_stmt_info->stmt);
> +  if (!input_stmt)
> +    return NULL;
> +
> +  stmt_vec_info mulhs_stmt_info;
> +  internal_fn ifn;
> +  unsigned int expect_offset;
> +
> +  /* Check for presence of rounding term.  */
> +  if (gimple_assign_rhs_code (input_stmt) == PLUS_EXPR)
> +  {
> +    /* Check that the outer shift was by 1.  */
> +    if (!integer_onep (gimple_assign_rhs2 (last_stmt)))
> +      return NULL;

Brace should be indented two columns beyond the "if", giving:

  if (gimple_assign_rhs_code (input_stmt) == PLUS_EXPR)
    {
      /* Check that the outer shift was by 1.  */
      if (!integer_onep (gimple_assign_rhs2 (last_stmt)))
	return NULL;

etc.

> +
> +    /* Check that one of the operands is 1.  */
> +    tree oprnds[2] = { gimple_assign_rhs1 (input_stmt),
> +                       gimple_assign_rhs2 (input_stmt) };
> +    bool isone[2] = { integer_onep (oprnds[0]),
> +                      integer_onep (oprnds[1]) };
> +    if (!(isone[0] ^ isone[1]))
> +      return NULL;

In a PLUS_EXPR, any constant operand has to come second, so this only
needs to check integer_onep (oprnds[1]).

> +    tree mulhs = oprnds[(int)(isone[0])];
> +
> +    mulhs_stmt_info = vect_get_internal_def (vinfo, mulhs);
> +    if (!mulhs_stmt_info)
> +      return NULL;
> +
> +    expect_offset = target_precision + 2;
> +    ifn = IFN_MULHRS;
> +  }
> +  else
> +  {
> +    mulhs_stmt_info = last_stmt_info;
> +
> +    expect_offset = target_precision + 1;
> +    ifn = IFN_MULHS;
> +  }
> +
> +  /* Get the definition of the multiply-high-scale part.  */
> +  gassign *mulhs_stmt = dyn_cast <gassign *> (mulhs_stmt_info->stmt);
> +  if (!mulhs_stmt
> +      || gimple_assign_rhs_code (mulhs_stmt) != RSHIFT_EXPR)
> +    return NULL;

I think this should go in the PLUS_EXPR arm, since we've already
checked it for the "else" path.

> +
> +  /* Get the scaling factor.  */
> +  tree rhs_scale = gimple_assign_rhs2 (mulhs_stmt);
> +  tree rhs_scale_type = TREE_TYPE (rhs_scale);
> +  if (TREE_CODE (rhs_scale) != INTEGER_CST
> +      || TREE_CODE (rhs_scale_type) != INTEGER_TYPE
> +      || !type_has_mode_precision_p (rhs_scale_type))
> +    return NULL;

Just:

  /* Get the scaling factor.  */
  tree rhs_scale = gimple_assign_rhs2 (mulhs_stmt);
  if (TREE_CODE (rhs_scale) != INTEGER_CST)
    return NULL;

would be enough.

> +  /* Get the definition of the scaling input term.  */
> +  tree rhs_mulh = gimple_assign_rhs1 (mulhs_stmt);
> +  tree rhs_mulh_type = TREE_TYPE (rhs_mulh);
> +  if (!INTEGRAL_TYPE_P (rhs_mulh_type))
> +    return NULL;

This is guaranteed to be true, no need to check.

> +  stmt_vec_info mulh_stmt_info = vect_get_internal_def (vinfo, rhs_mulh);
> +  if (!mulh_stmt_info)
> +    return NULL;
> +
> +  /* Check that the shift represents the correct scaling.  */
> +  if (wi::ne_p(wi::to_widest(rhs_scale) + expect_offset,
> +          TYPE_PRECISION (rhs_mulh_type)))
> +    return NULL;

At this point it's guaranteed that:

  TYPE_PRECISION (rhs_mulh_type) == TYPE_PRECISION (lhs_type)

I think it would be better to use lhs_type for the condition and check the
value as part of the "TREE_CODE (rhs_scale) != INTEGER_CST" condition
above.  You can use != instead of wi::ne_p.

> +
> +  /* Check whether the scaling input term can be seen as two widened
> +     inputs multiplied together.  */
> +  vect_unpromoted_value unprom_mult[2];
> +  tree new_type;
> +  unsigned int nops = vect_widened_op_tree (mulh_stmt_info, MULT_EXPR,
> +					    WIDEN_MULT_EXPR, false, 2,
> +					    unprom_mult, &new_type);
> +  if (nops != 2)
> +    return NULL;
> +
> +  vect_pattern_detected ("vect_recog_multhi_pattern", last_stmt);
> +
> +  /* Adjust output precision.  */
> +  if (TYPE_PRECISION (new_type) < target_precision)
> +    new_type = build_nonstandard_integer_type (target_precision,
> +					       TYPE_UNSIGNED (new_type));
> +
> +  /* Check for target support.  */
> +  tree new_vectype = get_vectype_for_scalar_type (new_type);
> +  if (!new_vectype
> +      || !direct_internal_fn_supported_p (ifn, new_vectype,
> +					  OPTIMIZE_FOR_SPEED))
> +    return NULL;
> +
> +  /* The IR requires a valid vector type for the cast result, even though
> +     it's likely to be discarded.  */
> +  *type_out = get_vectype_for_scalar_type (lhs_type);
> +  if (!*type_out)
> +    return NULL;
> +
> +  /* Generate the IFN_MULHRS call.  */
> +  tree new_var = vect_recog_temp_ssa_var (new_type, NULL);
> +  tree new_ops[2];
> +  vect_convert_inputs (last_stmt_info, 2, new_ops, new_type,
> +          unprom_mult, new_vectype);

Arguments should be indented one character beyond the "(".

> +  gcall *mulhrs_stmt = gimple_build_call_internal (ifn, 2,
> +          new_ops[0], new_ops[1]);

Same here.

> +  gimple_call_set_lhs (mulhrs_stmt, new_var);
> +  gimple_set_location (mulhrs_stmt, gimple_location (last_stmt));
> +
> +  if (dump_enabled_p ())
> +    dump_printf_loc (MSG_NOTE, vect_location,
> +		     "created pattern stmt: %G", mulhrs_stmt);
> +
> +  return vect_convert_output (last_stmt_info, lhs_type,
> +          mulhrs_stmt, new_vectype);

Here too.

It would be good to have some tests in gcc.dg/vect too, along the
lines of gcc.dg/vect/vect-avg-[1-4].c.

For the record, I think in principle we could look through other sign
conversions too.  E.g. we might have (pseudo-gimple):

  int _2 = _1 >> 14;
  unsigned int _3 = (unsigned int) _2;
  unsigned int _4 = _3 + 1;
  int _5 = (int) _4;
  int _6 = _5 >> 1;

This guarantees that overflow in the addition of _4 is well-defined.
Or we might have a sign change on the input to the shift.  But I think
those cases should be treated as potential follow-on improvements if
we see a case that needs them (probably via a new helper to make looking
through sign conversions a bit easier).  It doesn't need to be part of
the initial patch.

Thanks,
Richard

