[PATCH][AArch64] Vectorize MULH(R)S patterns with SVE2 instructions

Yuliang Wang Yuliang.Wang@arm.com
Thu Sep 12 07:52:00 GMT 2019


Hi Richard,

Thanks for your comments and advice; I have applied the relevant changes.

Regards,
Yuliang


UPDATE:

Added new tests. Built and regression tested on aarch64-none-elf and aarch64-linux-gnu.

gcc/ChangeLog:

2019-09-1  Yuliang Wang  <yuliang.wang@arm.com>

	PR tree-optimization/89386

	* config/aarch64/aarch64-sve2.md (<su>mull<bt><Vwide>)
	(<r>shrnb<mode>, <r>shrnt<mode>): New SVE2 patterns.
	(<su>mulh<r>s<mode>3): New pattern for MULHRS.
	* config/aarch64/iterators.md (UNSPEC_SMULLB, UNSPEC_SMULLT)
	(UNSPEC_UMULLB, UNSPEC_UMULLT, UNSPEC_SHRNB, UNSPEC_SHRNT)
	(UNSPEC_RSHRNB, UNSPEC_RSHRNT, UNSPEC_SMULHS, UNSPEC_SMULHRS)
	(UNSPEC_UMULHS, UNSPEC_UMULHRS): New unspecs.
	(MULLBT, SHRNB, SHRNT, MULHRS): New int iterators.
	(su, r): Handle the unspecs above.
	(bt): New int attribute.
	* internal-fn.def (IFN_MULHS, IFN_MULHRS): New internal functions.
	* internal-fn.c (first_commutative_argument): Commutativity info for above.
	* optabs.def (smulhs_optab, smulhrs_optab, umulhs_optab, umulhrs_optab):
	New optabs.
	* doc/md.texi (smulhs@var{m}3, umulhs@var{m}3)
	(smulhrs@var{m}3, umulhrs@var{m}3): Documentation for the above.
	* tree-vect-patterns.c (vect_recog_mulhs_pattern): New pattern function.
	(vect_vect_recog_func_ptrs): Add it.
	* testsuite/gcc.target/aarch64/sve2/mulhrs_1.c: New test.
	* testsuite/gcc.dg/vect/vect-mulhrs-1.c: As above.
	* testsuite/gcc.dg/vect/vect-mulhrs-2.c: As above.
	* testsuite/gcc.dg/vect/vect-mulhrs-3.c: As above.
	* testsuite/gcc.dg/vect/vect-mulhrs-4.c: As above.
	* doc/sourcebuild.texi (vect_mulhrs_hi): Document new target selector.
	* testsuite/lib/target-supports.exp (check_effective_target_vect_mulhrs_hi):
	Return true for AArch64 with SVE2.


-----Original Message-----
From: Richard Sandiford <richard.sandiford@arm.com> 
Sent: 30 August 2019 12:49
To: Yuliang Wang <Yuliang.Wang@arm.com>
Cc: gcc-patches@gcc.gnu.org; nd <nd@arm.com>
Subject: Re: [PATCH][AArch64] Vectorize MULH(R)S patterns with SVE2 instructions

Thanks for doing this.  The patch looks good, so this review is mostly a list of very minor formatting comments, sorry.

Yuliang Wang <Yuliang.Wang@arm.com> writes:
> 2019-08-22  Yuliang Wang  <yuliang.wang@arm.com>
>

Please add a line here pointing at the PR:

	PR tree-optimization/89386

The commit hooks pick this up automatically and link the commit to the bugzilla ticket.  (The PR was filed for SSSE3, but the target-independent bits are still needed there.)

>         * config/aarch64/aarch64-sve2.md: support for SVE2 
> instructions [S/U]MULL[T/B] + [R]SHRN[T/B] and MULHRS pattern variants

Unfortunately the changelog format is pretty strict here.  Lines have to be 80 chars or shorter, indented by tabs, and each pattern, function, variable or type needs to be listed individually regardless of how useful that seems.  So I think this should be something like:

	* config/aarch64/aarch64-sve2.md (<su>mull<bt><Vwide>)
	(<r>shrnb<mode>, <r>shrnt<mode>, <su>mulh<r>s<mode>3): New patterns.

(See below for why the "*" patterns aren't listed.)

>         * config/aarch64/iterators.md: iterators and attributes for 
> above

Here too the iterators need to be listed:

	* config/aarch64/iterators.md (UNSPEC_SMULLB, UNSPEC_SMULLT)
	(UNSPEC_UMULLB, UNSPEC_UMULLT, UNSPEC_SHRNB, UNSPEC_SHRNT)
	(UNSPEC_RSHRNB, UNSPEC_RSHRNT, UNSPEC_SMULHS, UNSPEC_SMULHRS)
	(UNSPEC_UMULHS, UNSPEC_UMULHRS): New unspecs.
	(MULLBT, SHRNB, SHRNT, MULHRS): New int iterators.
	(su, r): Handle the unspecs above.
	(bt): New int attribute.

>         * internal-fn.def: internal functions for MULH[R]S patterns

	* internal-fn.def (IFN_MULHS, IFN_MULHRS): New internal functions.

>         * optabs.def: optabs definitions for above and sign variants

	* optabs.def (smulhs_optab, smulhrs_optab, umulhs_optab)
	(umulhrs_optab): New optabs.

>         * tree-vect-patterns.c (vect_recog_multhi_pattern): pattern 
> recognition function for MULHRS

	* tree-vect-patterns.c (vect_recog_multhi_pattern): New function.
	(vect_vect_recog_func_ptrs): Add it.

>         * gcc.target/aarch64/sve2/mulhrs_1.c: new test for all 
> variants

Just:

	* gcc.target/aarch64/sve2/mulhrs_1.c: New test.

(Sorry that this is so finicky.  I'm just the messenger. :-))

> diff --git a/gcc/config/aarch64/aarch64-sve2.md b/gcc/config/aarch64/aarch64-sve2.md
> index 2334e5a7b7dc524bbd1f4d0a48ba5cd991970118..51783604ad8f83eb1d070c133009ed41a2a0252d 100644
> --- a/gcc/config/aarch64/aarch64-sve2.md
> +++ b/gcc/config/aarch64/aarch64-sve2.md
> @@ -63,3 +63,89 @@
>     movprfx\t%0, %2\;<sur>h<addsub>\t%0.<Vetype>, %1/m, %0.<Vetype>, %3.<Vetype>"
>    [(set_attr "movprfx" "*,yes")]
>  )
> +
> +;; Multiply long top / bottom

Very minor, but: GCC comments traditionally end with "." even if they're not full sentences.

> +(define_insn "*<su>mull<bt><Vwide>"
> +  [(set (match_operand:<VWIDE> 0 "register_operand" "=w")
> +    (unspec:<VWIDE> [(match_operand:SVE_BHSI 1 "register_operand" "w")
> +			  (match_operand:SVE_BHSI 2 "register_operand" "w")]
> +			 MULLBT))]
> +  "TARGET_SVE2"
> +  "<su>mull<bt>\t%0.<Vewtype>, %1.<Vetype>, %2.<Vetype>"
> +)
> +
> +(define_expand "<su>mull<bt><Vwide>"
> +  [(set (match_operand:<VWIDE> 0 "register_operand")
> +    (unspec:<VWIDE> [(match_operand:SVE_BHSI 1 "register_operand")
> +			  (match_operand:SVE_BHSI 2 "register_operand")]
> +			 MULLBT))]
> +  "TARGET_SVE2"
> +  ""
> +)

It's more usual to put the define_expand first.  But in this case we don't need separate define_expands, we can just make the define_insn the named pattern itself:

;; Multiply long top / bottom.
(define_insn "<su>mull<bt><Vwide>"
  [(set (match_operand:<VWIDE> 0 "register_operand" "=w")
    (unspec:<VWIDE> [(match_operand:SVE_BHSI 1 "register_operand" "w")
		     (match_operand:SVE_BHSI 2 "register_operand" "w")]
		    MULLBT))]
  "TARGET_SVE2"
  "<su>mull<bt>\t%0.<Vewtype>, %1.<Vetype>, %2.<Vetype>"
)

> +
> +;; (Rounding) Right shift narrow bottom
> +(define_insn "*<r>shrnb<mode>"
> +  [(set (match_operand:SVE_BHSI 0 "register_operand" "=w")
> +	  (unspec:SVE_BHSI [(match_operand:<VWIDE> 1 "register_operand" "w")
> +			  (match_operand 2 "aarch64_simd_shift_imm_offset_<Vel>" "i")]

This is more a personal thing, but I think it's better to leave out the "i" (just remove it rather than replace it with "") when the operand doesn't in fact accept all the things that "i" allows.

> +			 SHRNB))]
> +  "TARGET_SVE2"
> +  "<r>shrnb\t%0.<Vetype>, %1.<Vewtype>, #%2"
> +)
> +
> +(define_expand "<r>shrnb<mode>"
> +  [(set (match_operand:SVE_BHSI 0 "register_operand")
> +	 (unspec:SVE_BHSI [(match_operand:<VWIDE> 1 "register_operand")
> +			  (match_operand 2 "aarch64_simd_shift_imm_offset_<Vel>")]
> +			 SHRNB))]
> +  "TARGET_SVE2"
> +  ""
> +)
> +
> +;; (Rounding) Right shift narrow top
> +(define_insn "*<r>shrnt<mode>"
> +  [(set (match_operand:SVE_BHSI 0 "register_operand" "=w")
> +	  (unspec:SVE_BHSI [(match_operand:SVE_BHSI 1 "register_operand" "%0")

"%" indicates that operand 1 is commutative with operand 2, which isn't the case here.

> +	      (match_operand:<VWIDE> 2 "register_operand" "w")
> +			  (match_operand 3 "aarch64_simd_shift_imm_offset_<Vel>" "i")]

Same comment about "i" here.

> +			 SHRNT))]
> +  "TARGET_SVE2"
> +  "<r>shrnt\t%0.<Vetype>, %2.<Vewtype>, #%3"
> +)
> +
> +(define_expand "<r>shrnt<mode>"
> +  [(set (match_operand:SVE_BHSI 0 "register_operand")
> +	  (unspec:SVE_BHSI [(match_operand:<VWIDE> 1 "register_operand")
> +	      (match_operand:<VWIDE> 2 "register_operand")
> +			  (match_operand 3 "aarch64_simd_shift_imm_offset_<Vel>")]
> +			 SHRNT))]
> +  "TARGET_SVE2"
> +  ""
> +)

Each of the above two define_expands can also be removed and the "*"
removed from the define_insn.

> +;; Unpredicated integer multiply-high-with-(round-and-)scale
> +(define_expand "<su>mulh<r>s<mode>3"
> +  [(set (match_operand:SVE_BHSI 0 "register_operand")
> +	(unspec:SVE_BHSI
> +	  [(match_dup 3)
> +	   (unspec:SVE_BHSI [(match_operand:SVE_BHSI 1 "register_operand")
> +			  (match_operand:SVE_BHSI 2 "register_operand")]
> +			 MULHRS)]
> +	  UNSPEC_PRED_X))]
> +  "TARGET_SVE2"
> +  {
> +    operands[3] = force_reg (<VPRED>mode, CONSTM1_RTX (<VPRED>mode));

These days this should be:

    operands[3] = aarch64_ptrue_reg (<VPRED>mode);

There's been a bit of churn around this, sorry.

> @@ -1585,6 +1597,13 @@
>  
>  (define_int_iterator RHADD [UNSPEC_SRHADD UNSPEC_URHADD])
>  
> +(define_int_iterator MULLBT [UNSPEC_SMULLB UNSPEC_UMULLB
> +            UNSPEC_SMULLT UNSPEC_UMULLT])

Minor formatting nit: should be indented one column beyond the "[".

> +
> +(define_int_iterator SHRNB [UNSPEC_SHRNB UNSPEC_RSHRNB])
> +
> +(define_int_iterator SHRNT [UNSPEC_SHRNT UNSPEC_RSHRNT])
> +
>  (define_int_iterator DOTPROD [UNSPEC_SDOT UNSPEC_UDOT])
>  
>  (define_int_iterator ADDSUBHN [UNSPEC_ADDHN UNSPEC_RADDHN
> @@ -1604,6 +1623,9 @@
>  
>  (define_int_iterator VQDMULH [UNSPEC_SQDMULH UNSPEC_SQRDMULH])
>  
> +(define_int_iterator MULHRS [UNSPEC_SMULHS UNSPEC_UMULHS
> +            UNSPEC_SMULHRS UNSPEC_UMULHRS])
> +

Same here.

>  (define_int_iterator USSUQADD [UNSPEC_SUQADD UNSPEC_USQADD])
>  
>  (define_int_iterator SUQMOVN [UNSPEC_SQXTN UNSPEC_UQXTN])
> @@ -1866,7 +1888,11 @@
>  		     (UNSPEC_COND_FCVTZS "s")
>  		     (UNSPEC_COND_FCVTZU "u")
>  		     (UNSPEC_COND_SCVTF "s")
> -		     (UNSPEC_COND_UCVTF "u")])
> +		     (UNSPEC_COND_UCVTF "u")
> +	       (UNSPEC_SMULLB "s") (UNSPEC_UMULLB "u")
> +	       (UNSPEC_SMULLT "s") (UNSPEC_UMULLT "u")
> +	       (UNSPEC_SMULHS "s") (UNSPEC_UMULHS "u")
> +	       (UNSPEC_SMULHRS "s") (UNSPEC_UMULHRS "u")])

These should line up with the existing entries.

>  
>  (define_int_attr sur [(UNSPEC_SHADD "s") (UNSPEC_UHADD "u")
>  		      (UNSPEC_SRHADD "sr") (UNSPEC_URHADD "ur")
> @@ -1904,6 +1930,10 @@
>                      (UNSPEC_SQRSHRN "r") (UNSPEC_UQRSHRN "r")
>                      (UNSPEC_SQSHL   "")  (UNSPEC_UQSHL  "")
>                      (UNSPEC_SQRSHL   "r")(UNSPEC_UQRSHL  "r")
> +		    (UNSPEC_SHRNB "") (UNSPEC_SHRNT "")
> +		    (UNSPEC_RSHRNB "r") (UNSPEC_RSHRNT "r")
> +	       (UNSPEC_SMULHS "") (UNSPEC_UMULHS "")
> +	       (UNSPEC_SMULHRS "r") (UNSPEC_UMULHRS "r")

Same here.

>  ])
>  
>  (define_int_attr lr [(UNSPEC_SSLI  "l") (UNSPEC_USLI  "l")
> @@ -1916,6 +1946,9 @@
>  		    (UNSPEC_SHADD "") (UNSPEC_UHADD "u")
>  		    (UNSPEC_SRHADD "") (UNSPEC_URHADD "u")])
>  
> +(define_int_attr bt [(UNSPEC_SMULLB "b") (UNSPEC_UMULLB "b")
> +		    (UNSPEC_SMULLT "t") (UNSPEC_UMULLT "t")])

This line should be indented one column further.

> diff --git a/gcc/optabs.def b/gcc/optabs.def
> index 5283e6753f26eb0da08a393949128d09643e50dd..adcd9477b4c4b2e73cad7c72ee51336e98e81954 100644
> --- a/gcc/optabs.def
> +++ b/gcc/optabs.def
> @@ -342,6 +342,10 @@ OPTAB_D (udot_prod_optab, "udot_prod$I$a")
>  OPTAB_D (usum_widen_optab, "widen_usum$I$a3")
>  OPTAB_D (usad_optab, "usad$I$a")
>  OPTAB_D (ssad_optab, "ssad$I$a")
> +OPTAB_D (smulhs_optab, "smulhs$a3")
> +OPTAB_D (smulhrs_optab, "smulhrs$a3")
> +OPTAB_D (umulhs_optab, "umulhs$a3")
> +OPTAB_D (umulhrs_optab, "umulhrs$a3")
>  OPTAB_D (vec_pack_sfix_trunc_optab, "vec_pack_sfix_trunc_$a")
>  OPTAB_D (vec_pack_ssat_optab, "vec_pack_ssat_$a")
>  OPTAB_D (vec_pack_trunc_optab, "vec_pack_trunc_$a")

The new patterns need to be documented in doc/md.texi.
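A possible shape for those md.texi entries, for illustration only (the exact wording and the semantics text are up to the author; @var{N} here would be the element precision of the wide product):

```
@cindex @code{smulhs@var{m}3} instruction pattern
@item @samp{smulhs@var{m}3}, @samp{umulhs@var{m}3}
Signed/unsigned multiply high with scale.  Equivalent to the C code:
@smallexample
op0[i] = (narrow) (((wide) op1[i] * (wide) op2[i]) >> (@var{N} / 2 - 1));
@end smallexample

@cindex @code{smulhrs@var{m}3} instruction pattern
@item @samp{smulhrs@var{m}3}, @samp{umulhrs@var{m}3}
Signed/unsigned multiply high with round and scale.  Equivalent to:
@smallexample
op0[i] = (narrow) (((((wide) op1[i] * (wide) op2[i]) >> (@var{N} / 2 - 2)) + 1) >> 1);
@end smallexample
```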

> diff --git a/gcc/testsuite/gcc.target/aarch64/sve2/mulhrs_1.c b/gcc/testsuite/gcc.target/aarch64/sve2/mulhrs_1.c
> new file mode 100644
> index 0000000000000000000000000000000000000000..98cdc73d5256d712c985c541c1a585d081e5f5ea
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/sve2/mulhrs_1.c
> @@ -0,0 +1,123 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -ftree-vectorize -fdump-tree-vect-details --save-temps" } */
> +
> +#include <stdint.h>
> +
> +#define MULTHI(TYPE, BIGGER, RND)                     \
> +TYPE __attribute__ ((noinline, noclone))              \
> +mulhs_##TYPE##_##RND (TYPE *restrict x,               \
> +        TYPE *restrict y, TYPE *restrict z, int n)    \
> +{                                                     \
> +  for (int i = 0; i < n; i++)                         \
> +  {                                                   \
> +    z[i] = ((((BIGGER)x[i] * (BIGGER)y[i]) >>         \
> +            (sizeof(BIGGER)/2-(RND+1)) + RND) >> 1;   \

Missing ")" here.

> +  }                                                   \
> +}
> +
> +MULTHI (int8_t, int16_t, 0)
> +MULTHI (int16_t, int32_t, 0)
> +MULTHI (int32_t, int64_t, 0)
> +
> +MULTHI (uint8_t, uint16_t, 0)
> +MULTHI (uint16_t, uint32_t, 0)
> +MULTHI (uint32_t, uint64_t, 0)
> +
> +MULTHI (int8_t, int16_t, 1)
> +MULTHI (int16_t, int32_t, 1)
> +MULTHI (int32_t, int64_t, 1)
> +
> +MULTHI (uint8_t, uint16_t, 1)
> +MULTHI (uint16_t, uint32_t, 1)
> +MULTHI (uint32_t, uint64_t, 1)
> +
> +/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 12 "vect" } } */
> +
> +/* { dg-final { scan-assembler-times {
> +                  \tsmullb\tz[0-9]+\.h, z[0-9]+\.b, z[0-9]+\.b\n
> +                  \tsmullt\tz[0-9]+\.h, z[0-9]+\.b, z[0-9]+\.b\n
> +                  \tshrnb\tz[0-9]+\.b, z[0-9]+\.h, #7\n
> +                  \tshrnt\tz[0-9]+\.b, z[0-9]+\.h, #7\n
> +              } 
> +     1 } } */

Unfortunately, this kind of multi-line scan isn't supported in this
form: the dg-final has to be on a single line.  (Also unfortunately, you don't get an error about this; the line is simply ignored.)

With scheduling enabled, only the order of the shrnb and shrnt is guaranteed, so I think we should have separate scan-assembler-times for each line.
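For example, the first quoted group could become independent single-line directives (regexps taken from the quoted test; the counts are illustrative, and once split they must cover every occurrence in the file, e.g. the .h-sized smullb is generated for both the truncating and rounding functions):

```
/* { dg-final { scan-assembler-times {\tsmullb\tz[0-9]+\.h, z[0-9]+\.b, z[0-9]+\.b\n} 2 } } */
/* { dg-final { scan-assembler-times {\tsmullt\tz[0-9]+\.h, z[0-9]+\.b, z[0-9]+\.b\n} 2 } } */
/* { dg-final { scan-assembler-times {\tshrnb\tz[0-9]+\.b, z[0-9]+\.h, #7\n} 1 } } */
/* { dg-final { scan-assembler-times {\tshrnt\tz[0-9]+\.b, z[0-9]+\.h, #7\n} 1 } } */
```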

> +/* { dg-final { scan-assembler-times {
> +                  \tsmullb\tz[0-9]+\.s, z[0-9]+\.h, z[0-9]+\.h\n
> +                  \tsmullt\tz[0-9]+\.s, z[0-9]+\.h, z[0-9]+\.h\n
> +                  \tshrnb\tz[0-9]+\.h, z[0-9]+\.s, #15\n
> +                  \tshrnt\tz[0-9]+\.h, z[0-9]+\.s, #15\n
> +              } 
> +     1 } } */
> +/* { dg-final { scan-assembler-times {
> +                  \tsmullb\tz[0-9]+\.d, z[0-9]+\.s, z[0-9]+\.s\n
> +                  \tsmullt\tz[0-9]+\.d, z[0-9]+\.s, z[0-9]+\.s\n
> +                  \tshrnb\tz[0-9]+\.s, z[0-9]+\.d, #31\n
> +                  \tshrnt\tz[0-9]+\.s, z[0-9]+\.d, #31\n
> +              } 
> +     1 } } */
> +     
> +/* { dg-final { scan-assembler-times {
> +                  \tumullb\tz[0-9]+\.h, z[0-9]+\.b, z[0-9]+\.b\n
> +                  \tumullt\tz[0-9]+\.h, z[0-9]+\.b, z[0-9]+\.b\n
> +                  \tshrnb\tz[0-9]+\.b, z[0-9]+\.h, #7\n
> +                  \tshrnt\tz[0-9]+\.b, z[0-9]+\.h, #7\n
> +              } 
> +     1 } } */
> +/* { dg-final { scan-assembler-times {
> +                  \tumullb\tz[0-9]+\.s, z[0-9]+\.h, z[0-9]+\.h\n
> +                  \tumullt\tz[0-9]+\.s, z[0-9]+\.h, z[0-9]+\.h\n
> +                  \tshrnb\tz[0-9]+\.h, z[0-9]+\.s, #15\n
> +                  \tshrnt\tz[0-9]+\.h, z[0-9]+\.s, #15\n
> +              } 
> +     1 } } */
> +/* { dg-final { scan-assembler-times {
> +                  \tumullb\tz[0-9]+\.d, z[0-9]+\.s, z[0-9]+\.s\n
> +                  \tumullt\tz[0-9]+\.d, z[0-9]+\.s, z[0-9]+\.s\n
> +                  \tshrnb\tz[0-9]+\.s, z[0-9]+\.d, #31\n
> +                  \tshrnt\tz[0-9]+\.s, z[0-9]+\.d, #31\n
> +              } 
> +     1 } } */
> +     
> +/* { dg-final { scan-assembler-times {
> +                  \tsmullb\tz[0-9]+\.h, z[0-9]+\.b, z[0-9]+\.b\n
> +                  \tsmullt\tz[0-9]+\.h, z[0-9]+\.b, z[0-9]+\.b\n
> +                  \trshrnb\tz[0-9]+\.b, z[0-9]+\.h, #7\n
> +                  \trshrnt\tz[0-9]+\.b, z[0-9]+\.h, #7\n
> +              } 
> +     1 } } */
> +/* { dg-final { scan-assembler-times {
> +                  \tsmullb\tz[0-9]+\.s, z[0-9]+\.h, z[0-9]+\.h\n
> +                  \tsmullt\tz[0-9]+\.s, z[0-9]+\.h, z[0-9]+\.h\n
> +                  \trshrnb\tz[0-9]+\.h, z[0-9]+\.s, #15\n
> +                  \trshrnt\tz[0-9]+\.h, z[0-9]+\.s, #15\n
> +              } 
> +     1 } } */
> +/* { dg-final { scan-assembler-times {
> +                  \tsmullb\tz[0-9]+\.d, z[0-9]+\.s, z[0-9]+\.s\n
> +                  \tsmullt\tz[0-9]+\.d, z[0-9]+\.s, z[0-9]+\.s\n
> +                  \trshrnb\tz[0-9]+\.s, z[0-9]+\.d, #31\n
> +                  \trshrnt\tz[0-9]+\.s, z[0-9]+\.d, #31\n
> +              } 
> +     1 } } */
> +     
> +/* { dg-final { scan-assembler-times {
> +                  \tumullb\tz[0-9]+\.h, z[0-9]+\.b, z[0-9]+\.b\n
> +                  \tumullt\tz[0-9]+\.h, z[0-9]+\.b, z[0-9]+\.b\n
> +                  \trshrnb\tz[0-9]+\.b, z[0-9]+\.h, #7\n
> +                  \trshrnt\tz[0-9]+\.b, z[0-9]+\.h, #7\n
> +              } 
> +     1 } } */
> +/* { dg-final { scan-assembler-times {
> +                  \tumullb\tz[0-9]+\.s, z[0-9]+\.h, z[0-9]+\.h\n
> +                  \tumullt\tz[0-9]+\.s, z[0-9]+\.h, z[0-9]+\.h\n
> +                  \trshrnb\tz[0-9]+\.h, z[0-9]+\.s, #15\n
> +                  \trshrnt\tz[0-9]+\.h, z[0-9]+\.s, #15\n
> +              } 
> +     1 } } */
> +/* { dg-final { scan-assembler-times {
> +                  \tumullb\tz[0-9]+\.d, z[0-9]+\.s, z[0-9]+\.s\n
> +                  \tumullt\tz[0-9]+\.d, z[0-9]+\.s, z[0-9]+\.s\n
> +                  \trshrnb\tz[0-9]+\.s, z[0-9]+\.d, #31\n
> +                  \trshrnt\tz[0-9]+\.s, z[0-9]+\.d, #31\n
> +              } 
> +     1 } } */
> +     
> diff --git a/gcc/tree-vect-patterns.c b/gcc/tree-vect-patterns.c
> index ccb2e1edecda09db786c0d98ccd25f5be107c7e4..96e86f716da0b819640cca2ff5d01c3ce8b342e3 100644
> --- a/gcc/tree-vect-patterns.c
> +++ b/gcc/tree-vect-patterns.c
> @@ -1723,6 +1723,169 @@ vect_recog_over_widening_pattern (stmt_vec_info last_stmt_info, tree *type_out)
>    return pattern_stmt;
>  }
>  
> +/* Recognize the following patterns:
> +
> +	    ATYPE a;  // narrower than TYPE
> +	    BTYPE b;  // narrower than TYPE
> +
> +	    // multiply high with scaling
> +	    1) TYPE res = ((TYPE) a * (TYPE) b) >> c;
> +	    // or also with rounding
> +	    2) TYPE res = (((TYPE) a * (TYPE) b) >> d + 1) >> 1;
> +
> +	where only the bottom half of res is used.  */

Should be indented three spaces only.

> +
> +static gimple *
> +vect_recog_multhi_pattern (stmt_vec_info last_stmt_info, tree *type_out)

Maybe "vect_recog_mulhs_pattern", for consistency with the IFN names?

> +{
> +	vec_info *vinfo;

Might as well delay declaring this until...

> +
> +  /* Check for a right shift.  */
> +  gassign *last_stmt = dyn_cast <gassign *> (last_stmt_info->stmt);
> +  if (!last_stmt
> +      || gimple_assign_rhs_code (last_stmt) != RSHIFT_EXPR)
> +    return NULL;
> +  vinfo = last_stmt_info->vinfo;

...here.  (A lot of existing GCC code predates even C99 and so doesn't do this, but it's more usual for newer code.)

> +
> +  /* Check that the shift result is wider than the users of the
> +     result need (i.e. that narrowing would be a natural choice).  */
> +  tree lhs = gimple_assign_lhs (last_stmt);
> +  tree lhs_type = TREE_TYPE (lhs);
> +  unsigned int target_precision =
> +    vect_element_precision (last_stmt_info->min_output_precision);

GCC formatting puts operators at the start rather than the end of a line:

  unsigned int target_precision
    = vect_element_precision (last_stmt_info->min_output_precision);

Same for the rest of the function.

> +  if (!INTEGRAL_TYPE_P (lhs_type)
> +      || target_precision >= TYPE_PRECISION (lhs_type))
> +    return NULL;
> +
> +  /* Look through any change in sign on the outer shift input.  */
> +  vect_unpromoted_value unprom_input;
> +  tree rhs_rshift_inp = vect_look_through_possible_promotion (vinfo,
> +          gimple_assign_rhs1 (last_stmt), &unprom_input);

Formatting:

  tree rhs_rshift_inp = vect_look_through_possible_promotion
    (vinfo, gimple_assign_rhs1 (last_stmt), &unprom_input);

> +  if (!rhs_rshift_inp
> +      || TYPE_PRECISION (TREE_TYPE (rhs_rshift_inp))
> +         != TYPE_PRECISION (lhs_type))
> +    return NULL;
> +
> +  /* Get the definition of the shift input.  */
> +  stmt_vec_info input_stmt_info =
> +      vect_get_internal_def (vinfo, rhs_rshift_inp);
> +  if (!input_stmt_info)
> +    return NULL;
> +  gassign *input_stmt = dyn_cast <gassign *> (input_stmt_info->stmt);
> +  if (!input_stmt)
> +    return NULL;
> +
> +  stmt_vec_info mulhs_stmt_info;
> +  internal_fn ifn;
> +  unsigned int expect_offset;
> +
> +  /* Check for presence of rounding term.  */
> +  if (gimple_assign_rhs_code (input_stmt) == PLUS_EXPR)
> +  {
> +    /* Check that the outer shift was by 1.  */
> +    if (!integer_onep (gimple_assign_rhs2 (last_stmt)))
> +      return NULL;

Brace should be indented two columns beyond the "if", giving:

  if (gimple_assign_rhs_code (input_stmt) == PLUS_EXPR)
    {
      /* Check that the outer shift was by 1.  */
      if (!integer_onep (gimple_assign_rhs2 (last_stmt)))
	return NULL;

etc.

> +
> +    /* Check that one of the operands is 1.  */
> +    tree oprnds[2] = { gimple_assign_rhs1 (input_stmt),
> +                       gimple_assign_rhs2 (input_stmt) };
> +    bool isone[2] = { integer_onep (oprnds[0]),
> +                      integer_onep (oprnds[1]) };
> +    if (!(isone[0] ^ isone[1]))
> +      return NULL;

In a PLUS_EXPR, any constant operand has to come second, so this only needs to check integer_onep (oprnds[1]).

> +    tree mulhs = oprnds[(int)(isone[0])];
> +
> +    mulhs_stmt_info = vect_get_internal_def (vinfo, mulhs);
> +    if (!mulhs_stmt_info)
> +      return NULL;
> +
> +    expect_offset = target_precision + 2;
> +    ifn = IFN_MULHRS;
> +  }
> +  else
> +  {
> +    mulhs_stmt_info = last_stmt_info;
> +
> +    expect_offset = target_precision + 1;
> +    ifn = IFN_MULHS;
> +  }
> +
> +  /* Get the definition of the multiply-high-scale part.  */
> +  gassign *mulhs_stmt = dyn_cast <gassign *> (mulhs_stmt_info->stmt);
> +  if (!mulhs_stmt
> +      || gimple_assign_rhs_code (mulhs_stmt) != RSHIFT_EXPR)
> +    return NULL;

I think this should go in the PLUS_EXPR arm, since we've already checked it for the "else" path.

> +
> +  /* Get the scaling factor.  */
> +  tree rhs_scale = gimple_assign_rhs2 (mulhs_stmt);
> +  tree rhs_scale_type = TREE_TYPE (rhs_scale);
> +  if (TREE_CODE (rhs_scale) != INTEGER_CST
> +      || TREE_CODE (rhs_scale_type) != INTEGER_TYPE
> +      || !type_has_mode_precision_p (rhs_scale_type))
> +    return NULL;

Just:

  /* Get the scaling factor.  */
  tree rhs_scale = gimple_assign_rhs2 (mulhs_stmt);
  if (TREE_CODE (rhs_scale) != INTEGER_CST)
    return NULL;

would be enough.

> +  /* Get the definition of the scaling input term.  */
> +  tree rhs_mulh = gimple_assign_rhs1 (mulhs_stmt);
> +  tree rhs_mulh_type = TREE_TYPE (rhs_mulh);
> +  if (!INTEGRAL_TYPE_P (rhs_mulh_type))
> +    return NULL;

This is guaranteed to be true, no need to check.

> +  stmt_vec_info mulh_stmt_info = vect_get_internal_def (vinfo, rhs_mulh);
> +  if (!mulh_stmt_info)
> +    return NULL;
> +
> +  /* Check that the shift represents the correct scaling.  */
> +  if (wi::ne_p(wi::to_widest(rhs_scale) + expect_offset,
> +          TYPE_PRECISION (rhs_mulh_type)))
> +    return NULL;

At this point it's guaranteed that:

  TYPE_PRECISION (rhs_mulh_type) == TYPE_PRECISION (lhs_type)

I think it would be better to use lhs_type for the condition and check the value as part of the "TREE_CODE (rhs_scale) != INTEGER_CST" condition above.  You can use != instead of wi::ne_p.

> +
> +  /* Check whether the scaling input term can be seen as two widened
> +     inputs multiplied together.  */
> +  vect_unpromoted_value unprom_mult[2];
> +  tree new_type;
> +  unsigned int nops = vect_widened_op_tree (mulh_stmt_info, MULT_EXPR,
> +					    WIDEN_MULT_EXPR, false, 2,
> +					    unprom_mult, &new_type);
> +  if (nops != 2)
> +    return NULL;
> +
> +  vect_pattern_detected ("vect_recog_multhi_pattern", last_stmt);
> +
> +  /* Adjust output precision.  */
> +  if (TYPE_PRECISION (new_type) < target_precision)
> +    new_type = build_nonstandard_integer_type (target_precision,
> +					       TYPE_UNSIGNED (new_type));
> +
> +  /* Check for target support.  */
> +  tree new_vectype = get_vectype_for_scalar_type (new_type);
> +  if (!new_vectype
> +      || !direct_internal_fn_supported_p (ifn, new_vectype,
> +					  OPTIMIZE_FOR_SPEED))
> +    return NULL;
> +
> +  /* The IR requires a valid vector type for the cast result, even though
> +     it's likely to be discarded.  */
> +  *type_out = get_vectype_for_scalar_type (lhs_type);
> +  if (!*type_out)
> +    return NULL;
> +
> +  /* Generate the IFN_MULHRS call.  */
> +  tree new_var = vect_recog_temp_ssa_var (new_type, NULL);
> +  tree new_ops[2];
> +  vect_convert_inputs (last_stmt_info, 2, new_ops, new_type,
> +          unprom_mult, new_vectype);

Arguments should be indented one character beyond the "(".

> +  gcall *mulhrs_stmt = gimple_build_call_internal (ifn, 2,
> +          new_ops[0], new_ops[1]);

Same here.

> +  gimple_call_set_lhs (mulhrs_stmt, new_var);
> +  gimple_set_location (mulhrs_stmt, gimple_location (last_stmt));
> +
> +  if (dump_enabled_p ())
> +    dump_printf_loc (MSG_NOTE, vect_location,
> +		     "created pattern stmt: %G", mulhrs_stmt);
> +
> +  return vect_convert_output (last_stmt_info, lhs_type,
> +          mulhrs_stmt, new_vectype);

Here too.

It would be good to have some tests in gcc.dg/vect too, along the lines of gcc.dg/vect/vect-avg-[1-4].c.

For the record, I think in principle we could look through other sign conversions too.  E.g. we might have (pseudo-gimple):

  int _2 = _1 >> 14;
  unsigned int _3 = (unsigned int) _2;
  unsigned int _4 = _3 + 1;
  int _5 = (int) _4;
  int _6 = _5 >> 1;

This guarantees that overflow in the addition of _4 is well-defined.
Or we might have a sign change on the input to the shift.  But I think those cases should be treated as potential follow-on improvements if we see a case that needs them (probably via a new helper to make looking through sign conversions a bit easier).  It doesn't need to be part of the initial patch.

Thanks,
Richard
-------------- next part --------------
A non-text attachment was scrubbed...
Name: rb11655.patch
Type: application/octet-stream
Size: 25196 bytes
Desc: rb11655.patch
URL: <http://gcc.gnu.org/pipermail/gcc-patches/attachments/20190912/52eea83d/attachment.obj>

