[aarch64][vect] Support V8QI->V8HI WIDEN_ patterns

Wed Feb 10 19:46:20 GMT 2021

Joel Hutton <Joel.Hutton@arm.com> writes:
> @@ -277,6 +277,81 @@ optab_for_tree_code (enum tree_code code, const_tree type,
>      }
>  }
>  
> +/* Function supportable_half_widening_operation
> +

I realise existing (related) functions do have a “Function foo” line,
but it's not generally the GCC style, so I think it would be better
to leave it out.

> +   Check whether an operation represented by code is a 'half' widening operation

Nit: CODE should be in caps since it's a parameter name.

> +   using only half of the full vector width for the inputs. This allows the

I think this could be misread as saying that we only look at half the
input vector.  Maybe it would be clearer to say something like “in which
the input vectors contain half as many bits as the output vectors”
instead of “using…”.

> +   output to be stored in a single vector, as opposed to using the full width
> +   for input, where half the elements must be processed at a time into
> +   respective full-width vector outputs i.e. a 'hi'/'lo' pair using codes such
> +   as VEC_WIDEN_MINUS_HI/LO. This is handled by widening the inputs and using
> +   non-widening instructions. RTL fusing converts these to the widening hardware
> +   instructions if supported.

And here the contrast is with cases in which the input vectors have
the same number of bits as the output vectors, meaning that we need
a lo/hi pair to generate twice the amount of output data.

Might be clearer to make “This is handled…” a new paragraph.

> +
> +   Supported widening operations:
> +    WIDEN_MINUS_EXPR
> +    WIDEN_PLUS_EXPR
> +    WIDEN_MULT_EXPR
> +    WIDEN_LSHIFT_EXPR
> +
> +   Output:
> +   - CODE1 - The non-widened code, which will be used after the inputs are
> +     converted to the wide type.
> +   */

Formatting nit, */ should be on the previous line, i.e. “.” + two spaces
+ “*/”.  (Sorry for all the formatting comments.)

> +bool
> +supportable_half_widening_operation (enum tree_code code,
> +			       tree vectype_out, tree vectype_in,
> +			       enum tree_code *code1)
> +{
> +  machine_mode m1,m2;
> +  bool truncp;
> +  enum tree_code dummy_code;
> +  optab op;
> +
> +  gcc_assert (VECTOR_TYPE_P (vectype_out) && VECTOR_TYPE_P (vectype_in));
> +
> +  m1 = TYPE_MODE (vectype_out);
> +  m2 = TYPE_MODE (vectype_in);
> +
> +  if (!VECTOR_MODE_P (m1) || !VECTOR_MODE_P (m2))
> +    return false;
> +
> +  if(!known_eq (TYPE_VECTOR_SUBPARTS (vectype_in),
> +		  TYPE_VECTOR_SUBPARTS (vectype_out)))
> +      return false;

Formatting nit: should be a space between “if” and “(”.  IMO
maybe_ne is more readable than !known_eq.  The return statement
should only be indented by 4 columns.

> +  if(!(code == WIDEN_MINUS_EXPR
> +       || code == WIDEN_PLUS_EXPR
> +       || code == WIDEN_MULT_EXPR
> +       || code == WIDEN_LSHIFT_EXPR))
> +    return false;
> +
> +  switch (code)
> +  {
> +    case WIDEN_LSHIFT_EXPR:
> +      *code1 = LSHIFT_EXPR;
> +      break;
> +    case WIDEN_MINUS_EXPR:
> +      *code1 = MINUS_EXPR;
> +      break;
> +    case WIDEN_PLUS_EXPR:
> +      *code1 = PLUS_EXPR;
> +      break;
> +    case WIDEN_MULT_EXPR:
> +      *code1 = MULT_EXPR;
> +      break;
> +    default:
> +      gcc_unreachable ();

I think it would be better to return false in the default case and remove
the “if” above, so that we only have one copy of the list of codes.

> +  }

Formatting nit: the { and } lines should be indented two spaces more.
(The contents of the switch are OK as-is.)

> +
> +  if (!supportable_convert_operation(NOP_EXPR, vectype_out, vectype_in,
> +				     &dummy_code))

Nit: should be a space between the function name and ”(”.

> +    return false;
> +
> +  op = optab_for_tree_code (*code1, vectype_out, optab_vector);
> +  return (optab_handler (op, TYPE_MODE (vectype_out)) != CODE_FOR_nothing);
> +}
> +
>  /* Function supportable_convert_operation
>  
>     Check whether an operation represented by the code CODE is a

> […]
> @@ -4697,7 +4755,13 @@ vectorizable_conversion (vec_info *vinfo,
>    nunits_in = TYPE_VECTOR_SUBPARTS (vectype_in);
>    nunits_out = TYPE_VECTOR_SUBPARTS (vectype_out);
>    if (known_eq (nunits_out, nunits_in))
> -    modifier = NONE;
> +    if (code == WIDEN_MINUS_EXPR
> +	|| code == WIDEN_PLUS_EXPR
> +	|| code == WIDEN_LSHIFT_EXPR
> +	|| code == WIDEN_MULT_EXPR)
> +	modifier = WIDEN;

The “modifier = WIDEN;” line is indented by a tab (i.e. 8 columns),
but it should only be indented by 6 spaces.

> +    else
> +      modifier = NONE;
>    else if (multiple_p (nunits_out, nunits_in))
>      modifier = NARROW;
>    else
> @@ -4743,9 +4807,16 @@ vectorizable_conversion (vec_info *vinfo,
>        return false;
>  
>      case WIDEN:
> -      if (supportable_widening_operation (vinfo, code, stmt_info, vectype_out,
> -					  vectype_in, &code1, &code2,
> -					  &multi_step_cvt, &interm_types))
> +      if (supportable_half_widening_operation (code, vectype_out, vectype_in,
> +					    &code1))
> +	{
> +	  gcc_assert (!(multi_step_cvt && op_type == binary_op));
> +	  break;
> +	}

I think here we should keep the known_eq test from earlier and not
go into the code below for that case.  I.e. something like:

       if (known_eq (…))
         {
           if (!supportable_half_widening_operation (code, vectype_out,
                                                     vectype_in, &code1))
             goto unsupported;

           gcc_assert (!(multi_step_cvt && op_type == binary_op));
           break;
         }

This is because the fallback code later in the case statement doesn't
really apply here.  It also makes it easier to correlate this test
with the transform code below.

> +      else if (supportable_widening_operation (vinfo, code, stmt_info,
> +					       vectype_out, vectype_in, &code1,
> +					       &code2, &multi_step_cvt,
> +					       &interm_types))

After the change above, this would stay as a plain “if”.

>  	{
>  	  /* Binary widening operation can only be supported directly by the
>  	     architecture.  */
> @@ -4981,10 +5052,16 @@ vectorizable_conversion (vec_info *vinfo,
>  	      c1 = codecvt1;
>  	      c2 = codecvt2;
>  	    }
> -	  vect_create_vectorized_promotion_stmts (vinfo, &vec_oprnds0,
> -						  &vec_oprnds1, stmt_info,
> -						  this_dest, gsi,
> -						  c1, c2, op_type);
> +	  if (known_eq(nunits_out, nunits_in))

Should be a space between “known_eq” and “(”.

> +	    vect_create_vectorized_promotion_stmts (vinfo, &vec_oprnds0,
> +						    &vec_oprnds1, stmt_info,
> +						    this_dest, gsi,
> +						    c1, op_type);

Maybe it would be clearer to call this vect_create_half_widening_stmts,
to match the new query function.

Thanks,
Richard

> +	  else
> +	    vect_create_vectorized_promotion_stmts (vinfo, &vec_oprnds0,
> +						    &vec_oprnds1, stmt_info,
> +						    this_dest, gsi,
> +						    c1, c2, op_type);
>  	}
>  
>        FOR_EACH_VEC_ELT (vec_oprnds0, i, vop0)