[RFC, vectorizer] Allow half_type for left shift in vect_operation_fits_smaller_type?

Thu Sep 21 14:49:00 GMT 2017

On Thu, 21 Sep 2017, Jon Beniston wrote:

> Hi,
> 
> The GCC vectorizer can't vectorize the following loop even though the target
> supports 2-lane SIMD left shift.
> 
>     short a[256], b[256];
>     void foo (void)
>     {
>       int i;
>       for (i=0; i<256; i++)
>         { a[i] = b[i] << 4; }
>     }
> 
> The reason seems to be that GCC promotes the source from short to int,
> performs the left shift in int, and finally demotes the result back to
> short. Below is the related tree dump:
> 
>      _2 = (intD.1) _1;
>      # RANGE [-524288, 524272] NONZERO 4294967280
>      _3 = _2 << 4;
>      # RANGE [-32768, 32767] NONZERO 65520
>      _4 = (short intD.10) _3;
>      # .MEM_8 = VDEF <.MEM_14>
>      aD.1888[i_13] = _4;
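
The RANGE and NONZERO annotations in this dump follow from plain arithmetic: a 16-bit short spans [-32768, 32767], shifting left by 4 multiplies by 16, and the low 4 bits of the shifted value are always zero. A small C sketch that recomputes them (the helper names are illustrative only, not GCC functions):

```c
#include <stdint.h>

/* Illustrative helpers (not GCC API) that recompute the annotations on
   _3 = _2 << 4 and _4 = (short int) _3 from the dump.  */

/* Lower bound of _3: the smallest short (-32768) times 16.  */
static int32_t shifted_min (void) { return INT16_MIN * 16; }

/* Upper bound of _3: the largest short (32767) times 16.  */
static int32_t shifted_max (void) { return INT16_MAX * 16; }

/* Bits of _3 that may be nonzero: every bit except the low 4.  */
static uint32_t nonzero_int (void) { return (uint32_t) (UINT32_MAX << 4); }

/* The same mask restricted to the 16 bits kept by the (short) conversion.  */
static uint16_t nonzero_short (void)
{
  return (uint16_t) (nonzero_int () & 0xFFFFu);
}
```

The results are -524288, 524272, 4294967280 (0xFFFFFFF0) and 65520 (0xFFF0), matching the values printed for _3 and _4.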
> 
> I checked tree-vect-patterns.c and found there is a pattern recognizer
> "vect_recog_over_widening_pattern" to recognize such sequences already.
>     
> But vect_operation_fits_smaller_type only recognizes such sequences when
> the promoted type is at least 4 times wider than the original type. The
> reason seems to be that the original proposal at:
> 
>       https://gcc.gnu.org/ml/gcc-patches/2011-07/msg01472.html
> 
> was to handle the following sequence, where three types are involved and
> the widths satisfy T_PROMOTED = 2 * T_INTER = 4 * T_ORIG.
> 
>       T_ORIG a;
>       T_PROMOTED b, c;
>       T_INTER d;
> 
>       b = (T_PROMOTED) a;
>       c = b << 2;
>       d = (T_INTER) c;
> 
> We could also handle the following sequence, where only two types are
> involved and T_PROMOTED = 2 * T_ORIG:
> 
>       T_ORIG a, d;
>       T_PROMOTED b, c;
> 
>       b = (T_PROMOTED) a;
>       c = b << 2;
>       d = (T_ORIG) c;
> 
> Performing the left shift directly in T_ORIG should give the same result
> as performing it in T_PROMOTED and then converting back to T_ORIG.
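
The equivalence rests on the fact that the low N bits of a left shift depend only on the low N bits of its operand. A minimal exhaustive check for a 16-bit T_ORIG and a 32-bit T_PROMOTED, written with unsigned arithmetic so that every step is well defined in ISO C (the function names are hypothetical):

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: shift in the promoted 32-bit type, then keep the
   low 16 bits, mirroring _2 = (int) _1; _3 = _2 << n; _4 = (short) _3.
   Sign extension is done by hand to avoid implementation-defined
   signed conversions.  */
static uint16_t shift_promoted (uint16_t x, unsigned n)
{
  uint32_t wide = (x & 0x8000u) ? (0xFFFF0000u | x) : x;
  return (uint16_t) (wide << n);
}

/* Hypothetical helper: shift directly in 16 bits, as a 16-bit vector
   lane would, discarding the bits shifted out of the lane.  */
static uint16_t shift_narrow (uint16_t x, unsigned n)
{
  return (uint16_t) ((uint32_t) x << n);
}

/* True if both forms agree for every 16-bit input and shift count 0..15.  */
static bool narrow_shift_matches_promoted (void)
{
  for (uint32_t v = 0; v <= UINT16_MAX; v++)
    for (unsigned n = 0; n < 16; n++)
      if (shift_promoted ((uint16_t) v, n) != shift_narrow ((uint16_t) v, n))
        return false;
  return true;
}
```

The check covers only a left shift whose result goes straight into the truncation back to T_ORIG.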
> 
> Bootstrapped on x86-64, AArch64 and PPC64 (on the GCC Compile Farm) with
> no regressions in check-gcc and check-g++.
> 
> gcc/
> 2017-09-21  Jon Beniston <jon@beniston.com>
> 
>         * tree-vect-patterns.c (vect_operation_fits_smaller_type): Allow
>         half_type for LSHIFT_EXPR.
> 
> diff --git a/gcc/tree-vect-patterns.c b/gcc/tree-vect-patterns.c
> index cdad261..0abf37c 100644
> --- a/gcc/tree-vect-patterns.c
> +++ b/gcc/tree-vect-patterns.c
> @@ -1318,7 +1318,12 @@ vect_operation_fits_smaller_type (gimple *stmt, tree def, tree *new_type,
>          break;
>  
>        case LSHIFT_EXPR:
> -        /* Try intermediate type - HALF_TYPE is not enough for sure.  */
> +        /* Try half_type.  */
> +        if (TYPE_PRECISION (type) == TYPE_PRECISION (half_type) * 2
> +	    && vect_supportable_shift (code, half_type))
> +          break;
> +
> +        /* Try intermediate type.  */
>          if (TYPE_PRECISION (type) < (TYPE_PRECISION (half_type) * 4))
>            return false;
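
A hypothetical model of only the LSHIFT_EXPR precision checks visible in this hunk, reduced to the two precisions plus a flag standing in for vect_supportable_shift (code, half_type). It is not GCC code and ignores everything else in vect_operation_fits_smaller_type:

```c
#include <stdbool.h>

/* Hypothetical outcome of the LSHIFT_EXPR case, before any further
   checks on the intermediate type run.  */
enum lshift_choice { USE_HALF_TYPE, TRY_INTERMEDIATE_TYPE, GIVE_UP };

/* type_prec: precision of the promoted TYPE; half_prec: precision of
   HALF_TYPE; half_shift_ok: assumed result of vect_supportable_shift on
   HALF_TYPE; patched: whether the proposed half_type check is applied.  */
static enum lshift_choice
classify_lshift (unsigned type_prec, unsigned half_prec,
                 bool half_shift_ok, bool patched)
{
  /* Proposed check: TYPE exactly twice as wide and the target can shift
     in HALF_TYPE.  */
  if (patched && type_prec == half_prec * 2 && half_shift_ok)
    return USE_HALF_TYPE;

  /* Existing check: the intermediate path needs TYPE to be at least
     four times as wide as HALF_TYPE.  */
  if (type_prec < half_prec * 4)
    return GIVE_UP;

  return TRY_INTERMEDIATE_TYPE;
}
```

For the short/int loop above (32 and 16 bits) the unpatched logic gives up, while a char/int pair (32 and 8 bits) still reaches the intermediate-type path.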

I haven't dug deeply into this "interesting" function, but this case is
only valid if type == final type and if the result is not shifted
back.  vect_recog_over_widening_pattern works on a whole sequence
of stmts after all, so

  b = (T_PROMOTED) a;
  c = b << 2;
  d = c >> 2;
  e = (T_ORIG) d;

would be miscompiled by your new case.
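
A concrete input makes the problem visible. The sketch below takes the variant in which the left-shifted value c is shifted back (d = c >> 2; e = (T_ORIG) d), assumes a 16-bit T_ORIG and a 32-bit T_PROMOTED, and uses unsigned types so that every step is well defined in ISO C; the function names are hypothetical:

```c
#include <stdint.h>

/* Hypothetical helper: the sequence evaluated in the promoted type,
   b = (T_PROMOTED) a; c = b << 2; d = c >> 2; e = (T_ORIG) d;  */
static uint16_t shift_back_promoted (uint16_t a)
{
  uint32_t b = a;
  uint32_t c = b << 2;
  uint32_t d = c >> 2;
  return (uint16_t) d;
}

/* Hypothetical helper: the same sequence with the left shift narrowed
   to 16 bits, so bits shifted out of the lane are lost before the
   right shift could bring them back.  */
static uint16_t shift_back_narrowed (uint16_t a)
{
  uint16_t c = (uint16_t) (a << 2);  /* a promotes to int; 0xFFFF << 2 fits */
  uint16_t d = (uint16_t) (c >> 2);
  return d;
}
```

The two agree while the top two bits of a are clear and diverge as soon as either is set: for a = 0x4000 the promoted form yields 0x4000 and the narrowed form yields 0.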

Richard.