[PATCH 2/2] AArch64: Add better costing for vector constants and operations

Richard Sandiford richard.sandiford@arm.com
Tue Aug 31 15:13:53 GMT 2021


Tamar Christina <tamar.christina@arm.com> writes:
> @@ -13936,8 +13937,65 @@ cost_plus:
>  			     mode, MULT, 1, speed);
>            return true;
>          }
> +	break;
> +    case PARALLEL:
> +      /* Fall through */

Which code paths lead to getting a PARALLEL here?

> +    case CONST_VECTOR:
> +	{
> +	  rtx gen_insn = aarch64_simd_make_constant (x, true);
> +	  /* Not a valid const vector.  */
> +	  if (!gen_insn)
> +	    break;
>  
> -      /* Fall through.  */
> +	  switch (GET_CODE (gen_insn))
> +	  {
> +	  case CONST_VECTOR:
> +	    /* Load using MOVI/MVNI.  */
> +	    if (aarch64_simd_valid_immediate (x, NULL))
> +	      *cost += extra_cost->vect.movi;
> +	    else /* Load using constant pool.  */
> +	      *cost += extra_cost->ldst.load;
> +	    break;
> +	  /* Load using a DUP.  */
> +	  case VEC_DUPLICATE:
> +	    *cost += extra_cost->vect.dup;
> +	    break;

Does this trigger in practice?  The new check==true path (rightly) stops
the duplicated element from being forced into a register, but then
I would have expected:

rtx
gen_vec_duplicate (machine_mode mode, rtx x)
{
  if (valid_for_const_vector_p (mode, x))
    return gen_const_vec_duplicate (mode, x);
  return gen_rtx_VEC_DUPLICATE (mode, x);
}

to generate the original CONST_VECTOR again.

> +	  default:
> +	    *cost += extra_cost->ldst.load;
> +	    break;
> +	  }
> +	  return true;
> +	}
> +    case VEC_CONCAT:
> +	/* Depending on the operation, either DUP or INS.
> +	   For now, keep default costing.  */
> +	break;
> +    case VEC_DUPLICATE:
> +	*cost += extra_cost->vect.dup;
> +	return true;
> +    case VEC_SELECT:
> +	{
> +	  /* Cost a subreg of element 0 as free, otherwise as a DUP.  */
> +	  rtx op1 = XEXP (x, 1);
> +	  int nelts;
> +	  if ((op1 == const0_rtx && !BYTES_BIG_ENDIAN)
> +	      || (BYTES_BIG_ENDIAN
> +		  && GET_MODE_NUNITS (mode).is_constant(&nelts)
> +		  && INTVAL (op1) == nelts - 1))
> +	    ;
> +	  else if (vec_series_lowpart_p (mode, GET_MODE (op1), op1))
> +	    ;
> +	  else if (vec_series_highpart_p (mode, GET_MODE (op1), op1))
> +	  /* Selecting the high part is not technically free, but we lack
> +	     enough information to decide that here.  For instance, selecting
> +	     the high part of a vec_dup *is* free, as is feeding it into any
> +	     _high instruction, but we cannot tell either case apart here.
> +	     That said, we have a better chance of optimizing a dup than
> +	     multiple constants.  */
> +	    ;

Not sure about this.  We already try to detect the latter case
(_high instructions) via aarch64_strip_extend_vec_half.  We might
be missing some cases, but that still feels like the right way
to go IMO.

Selecting the high part of a vec_dup should get folded into
another vec_dup.

The lowpart bits look OK, but which paths call this function
without first simplifying the select to a subreg?  The subreg
is now the canonical form (thanks to r12-2288).

Thanks,
Richard
