[aarch64][vect] Support V8QI->V8HI WIDEN_ patterns

Wed Feb 10 14:42:05 GMT 2021

Thanks for the quick review.

Updated patch attached. I've addressed your comments below.

Tests are still running; OK for trunk assuming they come out clean?

[aarch64][vect] Support V8QI->V8HI WIDEN_ patterns

When 8 out of every 16 elements are widened using a widening pattern and
the next 8 are skipped, the patterns are not recognized. This is because
they are normally used in pairs, such as VEC_WIDEN_MINUS_HI/LO, to achieve,
for example, a V16QI->V16HI conversion. This patch adds support for
V8QI->V8HI patterns.
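To make the access shape concrete, here is a hypothetical scalar loop (not the contents of gcc.target/aarch64/pr98772.c). Each row widens 8 uint8_t elements into uint16_t results, then the source pointers advance by 16, so the other 8 elements of each 16-byte row are skipped. Whether a given compiler vectorizes it through the V8QI->V8HI path depends on the target and options; the sketch only shows the pattern and its scalar semantics. The names wminus, ROWS, WIDTH and STRIDE are illustrative.

```c
#include <stdint.h>

#define ROWS 4
#define WIDTH 8   /* elements widened per row */
#define STRIDE 16 /* input row stride: the next 8 elements are skipped */

/* Hypothetical example: widening subtraction of 8 out of every 16 bytes.
   The subtraction is done in int and truncated to 16 bits, which is the
   same result as widening both inputs to uint16_t and subtracting.  */
void
wminus (uint16_t *restrict d, const uint8_t *restrict pix1,
        const uint8_t *restrict pix2)
{
  for (int y = 0; y < ROWS; y++)
    {
      for (int x = 0; x < WIDTH; x++)
        d[x + y * WIDTH] = (uint16_t) (pix1[x] - pix2[x]);
      pix1 += STRIDE;
      pix2 += STRIDE;
    }
}
```

Each row reads a single 8-byte chunk, which fits a V8QI input and a V8HI output exactly, rather than the 16-byte V16QI input that the hi/lo pair expects.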

gcc/ChangeLog:

    PR tree-optimization/98772
    * optabs-tree.c (supportable_half_widening_operation): New function
    to check for supportable V8QI->V8HI widening patterns.
    * optabs-tree.h (supportable_half_widening_operation): New function.
    * tree-vect-stmts.c (vect_create_vectorized_promotion_stmts): New
    function to create promotion stmts for V8QI->V8HI widening patterns.
    (vectorizable_conversion): Add case for V8QI->V8HI.

gcc/testsuite/ChangeLog:

    PR tree-optimization/98772
    * gcc.target/aarch64/pr98772.c: New test.

>> +  /* The case where a widening operation is not making use of the full width of
>> +     of the input vector, but using the full width of the output vector.
>> +     Return the non-wided code, which will be used after the inputs are
>
>non-widened
Done.

>> +     converted to the wide type.  */
>> +  if ((code == WIDEN_MINUS_EXPR
>> +      || code == WIDEN_PLUS_EXPR
>> +      || code == WIDEN_MULT_EXPR
>> +      || code == WIDEN_LSHIFT_EXPR)
>
>Minor formatting nit, but the ||s should be indented one space more.
Done.

>> +      && known_eq (TYPE_VECTOR_SUBPARTS (vectype_in),
>> +               TYPE_VECTOR_SUBPARTS (vectype_out)))
>> +  {
>> +    switch (code)
>> +    {
>> +      case WIDEN_LSHIFT_EXPR:
>> +     *code1 = LSHIFT_EXPR;
>> +     return true;
>> +     break;
>> +      case WIDEN_MINUS_EXPR:
>> +     *code1 = MINUS_EXPR;
>> +     return true;
>> +     break;
>> +      case WIDEN_PLUS_EXPR:
>> +     *code1 = PLUS_EXPR;
>> +     return true;
>> +     break;
>> +      case WIDEN_MULT_EXPR:
>> +     *code1 = MULT_EXPR;
>> +     return true;
>> +     break;
>> +      default:
>> +     gcc_unreachable ();
>> +    }
>> +  }
>
>Rather than return true, I think we should do something like:
>
>      if (!supportable_convert_operation (NOP_EXPR, vectype_out,
>                                          vectype_in, &dummy_code))
>        return false;
>
>      optab = optab_for_tree_code (*code1, vectype_out, optab_default);
>      return (optab_handler (optab, TYPE_MODE (vectype_out))
>              != CODE_FOR_nothing);
>
>to make sure that the target really does support this.
Done. I used 'optab_vector' rather than 'optab_default': I think it is the right subtype here, and with 'optab_default', 'optab_for_tree_code' fails an assertion for 'LSHIFT_EXPR'.

> AFAICT the caller always knows when it wants the “if” statement above
> to be used.  What it's doing is a bit different from what
> supportable_convert_operation normally does, so it might be better
> to put it into a separate function that tests whether the target
> supports the non-widening form of a widening operation.
Done.
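As a standalone model of what the separate helper checks, the sketch below mirrors the WIDEN_* to non-widened mapping using a hypothetical enum (model_code is not GCC's tree_code). It then checks on scalars that converting both QI inputs to HI (the NOP_EXPR step) and applying the non-widened code gives the widened result. The target-support query (supportable_convert_operation plus optab_handler) is not modelled, and returning false for other codes is a choice of this model.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the relevant GCC tree codes.  */
enum model_code
{
  M_PLUS, M_MINUS, M_MULT, M_LSHIFT,
  M_WIDEN_PLUS, M_WIDEN_MINUS, M_WIDEN_MULT, M_WIDEN_LSHIFT
};

/* Map a widening code to its non-widened form; false for anything else.  */
static bool
model_half_widening_code (enum model_code code, enum model_code *code1)
{
  switch (code)
    {
    case M_WIDEN_PLUS:   *code1 = M_PLUS;   return true;
    case M_WIDEN_MINUS:  *code1 = M_MINUS;  return true;
    case M_WIDEN_MULT:   *code1 = M_MULT;   return true;
    case M_WIDEN_LSHIFT: *code1 = M_LSHIFT; return true;
    default:             return false;
    }
}

/* Apply a non-widened code to one 16-bit lane.  Arithmetic is done in
   uint32_t to avoid signed overflow; shifts of 16 or more yield 0 here.  */
static uint16_t
model_apply (enum model_code code, uint16_t a, uint16_t b)
{
  switch (code)
    {
    case M_PLUS:   return (uint16_t) ((uint32_t) a + b);
    case M_MINUS:  return (uint16_t) ((uint32_t) a - b);
    case M_MULT:   return (uint16_t) ((uint32_t) a * b);
    case M_LSHIFT: return b < 16 ? (uint16_t) ((uint32_t) a << b) : 0;
    default:       return 0;
    }
}

/* Half-widening lowering for 8 lanes: widen each input, then apply CODE1.  */
static void
model_half_widen (enum model_code code, const uint8_t a[8],
                  const uint8_t b[8], uint16_t out[8])
{
  enum model_code code1;
  if (!model_half_widening_code (code, &code1))
    return;
  for (int i = 0; i < 8; i++)
    out[i] = model_apply (code1, (uint16_t) a[i], (uint16_t) b[i]);
}
```

For example, 255 + 255 gives 510 in a 16-bit lane, whereas an 8-bit lane would wrap to 254; this is why the inputs must be converted before the non-widened operation runs.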

>> +
>> +  vec_tmp.create (vec_oprnds0->length () * 2);
>
>It looks like the * 2 isn't needed.
Done.

>> +      if (is_gimple_call (new_stmt3))
>> +     {
>> +       new_tmp = gimple_call_lhs (new_stmt3);
>> +     }
>> +      else
>> +     {
>> +       new_tmp = gimple_assign_lhs (new_stmt3);
>> +     }
>
>The lhs is always new_tmp3, so it's not necessary to read it back.
Done.

>> +
>> +      /* Store the results for the next step.  */
>> +      vec_tmp.quick_push (new_tmp);
>
>FWIW, you could just assign to vec_oprnds[i] and not have vec_tmp,
>but I don't know whether that's more or less confusing.  Either way's
>fine with me.
I chose to keep vec_tmp, but I don't feel strongly about it.

>> +    }
>> +
>> +  vec_oprnds0->release ();
>> +  *vec_oprnds0 = vec_tmp;
>> +}
>> +
>>  
>>  /* Check if STMT_INFO performs a conversion operation that can be vectorized.
>>     If VEC_STMT is also passed, vectorize STMT_INFO: create a vectorized
>> @@ -4697,7 +4763,13 @@ vectorizable_conversion (vec_info *vinfo,
>>    nunits_in = TYPE_VECTOR_SUBPARTS (vectype_in);
>>    nunits_out = TYPE_VECTOR_SUBPARTS (vectype_out);
>>    if (known_eq (nunits_out, nunits_in))
>> -    modifier = NONE;
>> +    if (code == WIDEN_MINUS_EXPR
>> +     || code == WIDEN_PLUS_EXPR
>> +     || code == WIDEN_LSHIFT_EXPR
>> +     || code == WIDEN_MULT_EXPR)
>> +     modifier = WIDEN;
>
>Formatting nit: the last line should be indented by 6 spaces rather than 8.
Done.

>> @@ -4743,9 +4815,21 @@ vectorizable_conversion (vec_info *vinfo,
>>        return false;
>> 
>>      case WIDEN:
>> -      if (supportable_widening_operation (vinfo, code, stmt_info, vectype_out,
>> -                                       vectype_in, &code1, &code2,
>> -                                       &multi_step_cvt, &interm_types))
>> +      if (known_eq (nunits_out, nunits_in)
>> +       && (code == WIDEN_MINUS_EXPR
>> +           || code == WIDEN_LSHIFT_EXPR
>> +           || code == WIDEN_PLUS_EXPR
>> +           || code == WIDEN_MULT_EXPR)
>> +       && supportable_convert_operation (code, vectype_out, vectype_in,
>> +                                         &code1))
>
>Guess this is personal taste, sorry, since it's clearly right both ways,
>but IMO it'd be better to drop the code test.  We can only get here
>with nunits_out==nunits_in if we're converting a widening operation into
>a non-widening operation.  If we do end up calling a separate function
>(as per the comment above), then it would abort in a meaningful place
>if something unexpected slips through.
Done.
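The argument for dropping the code test rests on lane counts. On AArch64 Advanced SIMD, V8QI is a 64-bit vector of eight 8-bit lanes and V8HI is a 128-bit vector of eight 16-bit lanes, so nunits_in equals nunits_out. The usual V16QI->V8HI hi/lo pair has 16 input lanes against 8 output lanes. A small sketch of that arithmetic (lanes and is_half_widening are hypothetical helpers):

```c
#include <stdbool.h>

/* Number of lanes in a VEC_BITS-bit vector of ELT_BITS-bit elements.  */
static unsigned
lanes (unsigned vec_bits, unsigned elt_bits)
{
  return vec_bits / elt_bits;
}

/* A widening conversion whose input and output have the same lane
   count can only be the half-widening (e.g. V8QI->V8HI) case.  */
static bool
is_half_widening (unsigned in_vec_bits, unsigned in_elt_bits,
                  unsigned out_vec_bits, unsigned out_elt_bits)
{
  return out_elt_bits > in_elt_bits
         && lanes (in_vec_bits, in_elt_bits)
            == lanes (out_vec_bits, out_elt_bits);
}
```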

>> +     {
>> +       gcc_assert (!(multi_step_cvt && op_type == binary_op));
>> +       break;
>> +     }
>> +      else if (supportable_widening_operation (vinfo, code, stmt_info,
>> +                                            vectype_out, vectype_in, &code1,
>> +                                            &code2, &multi_step_cvt,
>> +                                            &interm_types))
>>        {
>>          /* Binary widening operation can only be supported directly by the
>>             architecture.  */
>> @@ -4981,10 +5065,20 @@ vectorizable_conversion (vec_info *vinfo,
>>              c1 = codecvt1;
>>              c2 = codecvt2;
>>            }
>> -       vect_create_vectorized_promotion_stmts (vinfo, &vec_oprnds0,
>> -                                               &vec_oprnds1, stmt_info,
>> -                                               this_dest, gsi,
>> -                                               c1, c2, op_type);
>> +       if ((code == WIDEN_MINUS_EXPR
>> +            || code == WIDEN_PLUS_EXPR
>> +            || code == WIDEN_LSHIFT_EXPR
>> +            || code == WIDEN_MULT_EXPR)
>> +           && known_eq (nunits_in, nunits_out))
>
>Same comment here about dropping the code tests.
Done.
Attached: 0001-aarch64-vect-Support-V8QI-V8HI-WIDEN_-patterns.patch (text/x-patch, 12651 bytes)
URL: <https://gcc.gnu.org/pipermail/gcc-patches/attachments/20210210/6de36f0f/attachment-0001.bin>