[PATCH 1/5] [PR target/96342] Change field "simdlen" into poly_uint64

Richard Sandiford richard.sandiford@arm.com
Mon Nov 2 14:15:41 GMT 2020


"yangyang (ET)" <yangyang305@huawei.com> writes:
> Hi,
>
> I have revised the patch based on your suggestions, and the following are some points that I think need to be mentioned in the mail.
>
>> > @@ -502,17 +504,18 @@ simd_clone_adjust_return_type (struct cgraph_node *node)
>> >      veclen = node->simdclone->vecsize_int;
>> >    else
>> >      veclen = node->simdclone->vecsize_float;
>> > -  veclen /= GET_MODE_BITSIZE (SCALAR_TYPE_MODE (t));
>> > -  if (veclen > node->simdclone->simdlen)
>> > +  veclen = exact_div (veclen, GET_MODE_BITSIZE (SCALAR_TYPE_MODE (t)));
>> > +  if (known_gt (veclen, node->simdclone->simdlen))
>> >      veclen = node->simdclone->simdlen;
>>
>> Although again it probably doesn't make a difference in practice, the
>> known/maybe situation is similar here.  When comparing:
>>
>> - an SVE vector of 2 + 2 * (VQ - 1) doubles and
>> - an Advanced SIMD vector of 2 doubles
>>
>> the Advanced SIMD version is conceptually ordered <= the SVE one, in the
>> sense that the SVE vector always contains a whole number of Advanced SIMD
>> vectors whereas the Advanced SIMD vector might not contain a whole number
>> of SVE vectors.
>>
>> In other words, the number of lanes in the Advanced SIMD vector is known_le
>> the number of lanes in the SVE vector, and the number of lanes in the SVE
>> vector is known_ge and maybe_gt (but not known_gt) the number of lanes in
>> the Advanced SIMD vector.  So for these kinds of comparison, known_gt can
>> look a bit unexpected, even if (as here) it's probably fine in practice.
>>
>> There's currently a hard-coded assumption in this code and in the vectoriser
>> that both constant-length software vectors and constant-length hardware
>> vectors are a power of 2 in size.  This means that the > above is effectively
>> testing whether veclen contains a whole number of
>> node->simdclone->simdlens, not just whether the veclen is bigger.
>>
>> So when adding the initial autovec support for SVE, we generally rewrote these
>> kinds of test to use multiple_p instead of comparisons.  I think that makes the
>> intent more obvious (but others might disagree :-)).
>>
>> That was a long-winded way of saying that I suggest we use:
>>
>>   if (multiple_p (veclen, node->simdclone->simdlen))
>>     veclen = node->simdclone->simdlen;
>>
>> instead.
>
> As far as I understand, multiple_p (a, b) is equal to known_ge (a, b) due to the hard-coded assumption, and !multiple_p (b, a) is equal to known_gt (a, b).

Yes to the first part, but !multiple_p (b, a) is equal to known_ge (a, b)
(== known_le (b, a)) rather than known_gt (a, b).  E.g.:

  multiple_p (2, 2 + 2X)

is false (for X>0) but:

  known_gt (2 + 2X, 2)

is also false (for X==0).  In both cases, the conditions have an
implicit “for all X”.

> So it seems better to use !multiple_p (node->simdclone->simdlen, veclen) here, while using multiple_p (veclen, node->simdclone->simdlen) is also OK, since doing the assignment in the equal case won't change anything. Anyway, I used multiple_p (veclen, node->simdclone->simdlen) in the revised patch.

!multiple_p is fine with me too FWIW, if you prefer that.

>> >      }
>> >    TREE_TYPE (TREE_TYPE (fndecl)) = t;
>> >    if (!node->definition)
>> > @@ -663,8 +669,9 @@ simd_clone_adjust_argument_types (struct cgraph_node *node)
>> >     veclen = sc->vecsize_int;
>> >        else
>> >     veclen = sc->vecsize_float;
>> > -      veclen /= GET_MODE_BITSIZE (SCALAR_TYPE_MODE (base_type));
>> > -      if (veclen > sc->simdlen)
>> > +      veclen = exact_div (veclen,
>> > +                     GET_MODE_BITSIZE (SCALAR_TYPE_MODE (base_type)));
>> > +      if (known_gt (veclen, sc->simdlen))
>> >     veclen = sc->simdlen;
>>
>> Similarly here.
>
> Similarly here, use multiple_p (veclen, sc->simdlen) in the revised patch as well.

Here too !multiple_p sounds OK if you prefer that.

>> > @@ -981,8 +992,11 @@ ipa_simd_modify_function_body (struct cgraph_node *node,
>> >               iter, NULL_TREE, NULL_TREE);
>> >        adjustments->register_replacement (&(*adjustments->m_adj_params)[j], r);
>> >
>> > -      if (simd_clone_subparts (vectype) < node->simdclone->simdlen)
>> > -   j += node->simdclone->simdlen / simd_clone_subparts (vectype) - 1;
>> > +      if (known_lt (simd_clone_subparts (vectype), node->simdclone->simdlen))
>> > +   {
>> > +     j += vector_unroll_factor (node->simdclone->simdlen,
>> > +                                simd_clone_subparts (vectype)) - 1;
>> > +   }
>>
>> And similarly here:
>>
>>       if (multiple_p (node->simdclone->simdlen, simd_clone_subparts (vectype)))
>>
>
> Similarly here, use multiple_p (node->simdclone->simdlen, simd_clone_subparts (vectype)) in the revised patch as well.

To be clear, I think multiple_p is still natural here though.

>> >  /* Wrapper for poly_int arguments to target macros, so that if a target
>> >     doesn't need polynomial-sized modes, its header file can continue to
>> >     treat the argument as a normal constant.  This should go away once
>> > @@ -3902,12 +3902,12 @@ vectorizable_simd_clone_call (vec_info *vinfo, stmt_vec_info stmt_info,
>> >      n = n->simdclone->next_clone)
>> >        {
>> >     unsigned int this_badness = 0;
>> > -   if (n->simdclone->simdlen > vf
>> > +   if (known_gt (n->simdclone->simdlen, vf)
>>
>> I think !constant_multiple_p rather than known_gt here.  Until we add
>> support for vectorising simd clones with partial vectors, a whole number of
>> simdlens must fit within VF.  So maybe:
>>
>>      unsigned int num_calls;
>>      if (!constant_multiple_p (n->simdclone->simdlen, vf, &num_calls)
>>
>> and then…
>>
>> >         || n->simdclone->nargs != nargs)
>> >       continue;
>> > -   if (n->simdclone->simdlen < vf)
>> > -     this_badness += (exact_log2 (vf)
>> > -                      - exact_log2 (n->simdclone->simdlen)) * 1024;
>> > +   if (known_lt (n->simdclone->simdlen, vf))
>> > +     this_badness += exact_log2
>> > +       (vector_unroll_factor (vf, n->simdclone->simdlen)) * 1024;
>>
>> …use num_calls instead of vector_unroll_factor here.
>
> The revised patch wrote these code as:
>
> unsigned int num_calls;
> if (!constant_multiple_p (vf, n->simdclone->simdlen, &num_calls)
>   || n->simdclone->nargs != nargs)
>  continue;
> if (!multiple_p (n->simdclone->simdlen, vf))
>  this_badness += exact_log2 (num_calls) * 1024;

I think this should be:

  if (num_calls != 1)
    this_badness += exact_log2 (num_calls) * 1024;

(Or we could do the addition unconditionally, both work.)

>> >     if (n->simdclone->inbranch)
>> >       this_badness += 2048;
>> >     int target_badness = targetm.simd_clone.usable (n);
>> > @@ -3988,19 +3988,19 @@ vectorizable_simd_clone_call (vec_info *vinfo, stmt_vec_info stmt_info,
>> >     arginfo[i].vectype = get_vectype_for_scalar_type (vinfo, arg_type,
>> >                                                       slp_node);
>> >     if (arginfo[i].vectype == NULL
>> > -       || (simd_clone_subparts (arginfo[i].vectype)
>> > -           > bestn->simdclone->simdlen))
>> > +       || (known_gt (simd_clone_subparts (arginfo[i].vectype),
>> > +                     bestn->simdclone->simdlen)))
>>
>> Here too I think we want constant_multiple_p:
>>
>>           || !constant_multiple_p (bestn->simdclone->simdlen,
>>                                    simd_clone_subparts (arginfo[i].vectype))
>>
>
> Use multiple_p here since the multiple is not needed.

True, but in the case of vectorisation, we need to generate a constant
number of copies at compile time.  If we don't enforce a constant
multiple, we might end up trying to use an Advanced SIMD routine
when vectorising for SVE.

The reason we don't have a two-argument version of constant_multiple_p
is that so far nothing has needed it (at least AFAIK).  There's no
conceptual problem with adding it though.  I'm happy to do that if
it would help.

>> >       if (ratype)
>> >         new_temp = create_tmp_var (ratype);
>> >       else if (useless_type_conversion_p (vectype, rtype))
>> > @@ -4264,12 +4266,13 @@ vectorizable_simd_clone_call (vec_info *vinfo, stmt_vec_info stmt_info,
>> >
>> >        if (vec_dest)
>> >     {
>> > -     if (simd_clone_subparts (vectype) < nunits)
>> > +     if (known_lt (simd_clone_subparts (vectype), nunits))
>>
>> multiple_p (nunits, …) here too.  It would also be possible to do:
>>
>>         if (constant_multiple_p (nunits, simd_clone_subparts (vectype), &k)
>>
>> and avoid the separate vector_unroll_factor.
>
> Use !multiple_p (simd_clone_subparts (vectype), nunits) here, since it seems that the following code should not be executed in the equal case.

Ah, yes, sorry for missing that.

>>
>> >         {
>> >           unsigned int k, l;
>> >           poly_uint64 prec = GET_MODE_BITSIZE (TYPE_MODE (vectype));
>> >           poly_uint64 bytes = GET_MODE_SIZE (TYPE_MODE (vectype));
>> > -         k = nunits / simd_clone_subparts (vectype);
>> > +         k = vector_unroll_factor (nunits,
>> > +                                   simd_clone_subparts (vectype));
>> >           gcc_assert ((k & (k - 1)) == 0);
>> >           for (l = 0; l < k; l++)
>> >             {
>> > @@ -4295,16 +4298,18 @@ vectorizable_simd_clone_call (vec_info *vinfo,
>> stmt_vec_info stmt_info,
>> >             vect_clobber_variable (vinfo, stmt_info, gsi, new_temp);
>> >           continue;
>> >         }
>> > -     else if (simd_clone_subparts (vectype) > nunits)
>> > +     else if (known_gt (simd_clone_subparts (vectype), nunits))
>>
>> multiple_p here too.
>>
>
> Similarly, use !multiple_p (nunits, simd_clone_subparts (vectype)) here.

OK.

> @@ -614,8 +618,9 @@ simd_clone_adjust_argument_types (struct cgraph_node *node)
>  	    veclen = sc->vecsize_int;
>  	  else
>  	    veclen = sc->vecsize_float;
> -	  veclen /= GET_MODE_BITSIZE (SCALAR_TYPE_MODE (parm_type));
> -	  if (veclen > sc->simdlen)
> +	  veclen = exact_div (veclen,
> +			      GET_MODE_BITSIZE (SCALAR_TYPE_MODE (parm_type)));
> +	  if (known_gt (veclen, sc->simdlen))
>  	    veclen = sc->simdlen;
>  	  adj.op = IPA_PARAM_OP_NEW;
>  	  adj.param_prefix_index = IPA_PARAM_PREFIX_SIMD;

I think this should use multiple_p (or !multiple_p) too.

> @@ -981,8 +992,11 @@ ipa_simd_modify_function_body (struct cgraph_node *node,
>  		  iter, NULL_TREE, NULL_TREE);
>        adjustments->register_replacement (&(*adjustments->m_adj_params)[j], r);
 
> -      if (simd_clone_subparts (vectype) < node->simdclone->simdlen)
> -	j += node->simdclone->simdlen / simd_clone_subparts (vectype) - 1;
> +      if (multiple_p (node->simdclone->simdlen, simd_clone_subparts (vectype)))
> +	{
> +	  j += vector_unroll_factor (node->simdclone->simdlen,
> +				     simd_clone_subparts (vectype)) - 1;
> +	}
>      }
> 
>    tree name;

Minor nit, but: the original formatting without braces is preferred
for single statements, even if the statement spans multiple lines.

Otherwise the patch looks good, thanks.

Richard

