This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.



Re: Vector shuffling


Only looking closely at the i386 changes for now.

On 08/15/2010 07:30 AM, Artem Shinkarov wrote:
> +  /* Recursively grab the definition of the variable.  */
> +  while (TREE_CODE (mask) == SSA_NAME)
> +    {
> +      gimple maskdef = SSA_NAME_DEF_STMT (mask);
> +      if (gimple_assign_single_p (maskdef))
> +        mask = gimple_assign_rhs1 (maskdef);
> +      else
> +        break;
> +    }
> +

Err, surely copy-propagation has happened and this loop isn't needed.
In particular, I'd hope that MASK is *already* a VECTOR_CST if it is
in fact constant.
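
I.e., assuming the pass runs after copy-prop, a plain test should be
all you need (untested sketch):

  /* A constant mask will already have been propagated into the
     statement as a VECTOR_CST; no def-chain walking needed.  */
  if (TREE_CODE (mask) == VECTOR_CST)
    ...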

> +          t = ix86_vectorize_builtin_vec_perm (TREE_TYPE (vec0), &m_type);
> +          
> +          if (t != NULL_TREE)
> +            {
> +              gimple c = gimple_build_call (t, 3, vec0, vec0, mask);
> +              gimple stmt = gsi_stmt (*gsi);
> +              gimple_call_set_lhs (c, gimple_call_lhs (stmt));
> +              gsi_replace (gsi, c, false);

You need to use m_type here when casting the arguments.
In particular your existing mask could be V4SI with unsigned
elements, whereas the builtin takes V4SI with signed elements.
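
Something along these lines (sketch, untested):

  /* The builtin's mask argument has type M_TYPE, which may differ
     from TREE_TYPE (mask) only in signedness; reinterpret in place.  */
  if (!useless_type_conversion_p (m_type, TREE_TYPE (mask)))
    {
      mask = fold_build1 (VIEW_CONVERT_EXPR, m_type, mask);
      mask = force_gimple_operand_gsi (gsi, mask, true, NULL_TREE,
                                       true, GSI_SAME_STMT);
    }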

> +  /* If we cannot expand it via vec_perm, we will try to expand it 
> +     via PSHUFB instruction.  */
> +    {

Indentation is off.  Although it wouldn't be if you moved the
TARGET_SSE3 || TARGET_AVX test up to be an IF protecting this block,
which would also help readability.
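
I.e., something like (sketch):

  if (TARGET_SSE3 || TARGET_AVX)
    {
      /* We could not expand via vec_perm; try the PSHUFB sequence.  */
      ...
    }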

> +          /* m1var = mm1var << 8*i */
> +          m1 = build2 (LSHIFT_EXPR, mtype, m1var, 
> +                        build_int_cst (TREE_TYPE (mtype), 8*i));
> +          t = force_gimple_operand_gsi (gsi, m1,
> +                            true, NULL_TREE, true, GSI_SAME_STMT);
> +          asgn = gimple_build_assign (m1var, t);
> +          gsi_insert_before (gsi, asgn , GSI_SAME_STMT);
> +
> +          /* mvar = mvar | m1var */
> +          m1 = build2 (BIT_IOR_EXPR, mtype, mvar, m1var);
> +          t = force_gimple_operand_gsi (gsi, m1,
> +                            true, NULL_TREE, true, GSI_SAME_STMT);
> +          asgn = gimple_build_assign (mvar, t);
> +          gsi_insert_before (gsi, asgn, GSI_SAME_STMT);

I don't believe this computes the proper values.  Suppose we're
trying to generate a permutation for a V4SI.  Suppose that [A-D]
are already masked to [0-3].  As far as I can see you'll produce

  t0                  = (v16qi){ A,0,0,0, B,0,0,0, C,0,0,0, D,0,0,0 }
  t1 = t0 | t0 << 8;  = (v16qi){ A,A,0,0, B,B,0,0, C,C,0,0, D,D,0,0 }
  t2 = t1 | t1 << 16; = (v16qi){ A,A,A,A, B,B,B,B, C,C,C,C, D,D,D,D }

when what you really want is

  t2 = (v16qi){ A*4, A*4+1, A*4+2, A*4+3,
                B*4, B*4+1, B*4+2, B*4+3,
                C*4, C*4+1, C*4+2, C*4+3,
                D*4, D*4+1, D*4+2, D*4+3 };

So:

  t0 = mask & (v4si){ 3, 3, 3, 3 };
  t1 = t0 * 4;

You ought to perform the permutation into T2 above in one step, not
with explicit shifts.  You know this will succeed because you're
already assuming pshufb.

  t2 = __builtin_ia32_vec_perm_v16qi_u
	((v16qi)t1,
	 (v16qi){ 0,0,0,0, 4,4,4,4, 8,8,8,8, 12,12,12,12 });

You need to add an offset compensation vector dependent on the source
vector element size, e.g.

  t3 = t2 + (v16qi){ 0,1,2,3, 0,1,2,3, 0,1,2,3, 0,1,2,3 }
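
Putting the whole control-vector computation together, here's a
standalone sketch using the generic vector extensions (compile with
-mssse3; the helper name is invented for illustration):

  typedef int v4si __attribute__ ((vector_size (16)));
  typedef char v16qi __attribute__ ((vector_size (16)));

  /* Expand a V4SI element-selector MASK into the V16QI byte
     selector that pshufb wants.  */
  v16qi
  v4si_sel_to_bytes (v4si mask)
  {
    /* Clamp each selector to 0..3, then scale to a byte index.  */
    v4si t0 = mask & (v4si) { 3, 3, 3, 3 };
    v4si t1 = t0 * (v4si) { 4, 4, 4, 4 };

    /* Broadcast the low byte of each element into all four bytes.  */
    v16qi t2 = __builtin_ia32_pshufb128
      ((v16qi) t1, (v16qi) { 0,0,0,0, 4,4,4,4, 8,8,8,8, 12,12,12,12 });

    /* Add the within-element byte offsets.  */
    return t2 + (v16qi) { 0,1,2,3, 0,1,2,3, 0,1,2,3, 0,1,2,3 };
  }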

> +      if (fntype != NULL_TREE)

You already checked this above, although given that you tested for
SSE3, I don't think even that is needed.  You can assume it.

> +ix86_vectorize_builtin_shuffle2 (gimple_stmt_iterator *gsi, 
> +                                tree vec0, tree vec1, tree mask)

You ought to extract all of the pshufb code from above so that
you can re-use it here for the TARGET_XOP vpperm instruction,
which does in fact perform the two-operand shuffle.
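
E.g. factor it into a helper along these lines (sketch; the name is
invented):

  /* Emit a byte shuffle of VEC0/VEC1 according to the V16QI byte
     selector BMASK, using pshufb, or vpperm for TARGET_XOP.  */
  static bool
  ix86_expand_vshuffle_pshufb (gimple_stmt_iterator *gsi, tree lhs,
                               tree vec0, tree vec1, tree bmask);

and call it from both the one- and two-operand paths.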

> -  for (i = 0; i < nelt; ++i, list = TREE_CHAIN (list))
> +  for (i = 0; i < nelt; ++i, list = 
> +                        (list == NULL_TREE ? NULL_TREE : TREE_CHAIN (list)))
>      {
>        unsigned HOST_WIDE_INT e;
> +      tree value;
> +
> +      if (list != NULL_TREE)
> +        value = TREE_VALUE (list);
> +      else
> +          value = fold_convert (TREE_TYPE (TREE_TYPE (cst)), 
> +                                integer_zero_node);

Um, when are you EVER going to see a VECTOR_CST of the wrong size,
with the wrong number of elements?  This is most certainly a bug
elsewhere.

> -  /* This hook is cannot be called in response to something that the
> -     user does (unlike the builtin expander) so we shouldn't ever see
> -     an error generated from the extract.  */
> -  gcc_assert (vec_mask > 0 && vec_mask <= 3);
> +  /* Check whether the mask can be applied to the vector type.  */
> +  if (vec_mask < 0 || vec_mask > 3)
> +    return false;

I'd very much prefer this check to be left in place.  Indeed, this
points to the fact that you've incorrectly interpreted the spec for
the OpenCL shuffle builtins: if the user writes { 9,15,33,101 } as
a literal, you should interpret this with the n-1 mask in place,
i.e.  { 1, 3, 1, 1 }.
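
In generic code that's just a matter of wrapping each selector, e.g.
(sketch, using the TREE_LIST representation of VECTOR_CST):

  unsigned HOST_WIDE_INT nelt = TYPE_VECTOR_SUBPARTS (vtype);
  for (list = TREE_VECTOR_CST_ELTS (mask); list; list = TREE_CHAIN (list))
    {
      unsigned HOST_WIDE_INT e = TREE_INT_CST_LOW (TREE_VALUE (list));
      e &= nelt - 1;    /* 9 -> 1, 15 -> 3, 33 -> 1, 101 -> 1.  */
      ...
    }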

Which suggests that you shouldn't be handling VECTOR_CST in the new
hooks at all.  You should handle that in generic code and call into
the existing TARGET_VECTORIZE_BUILTIN_VEC_PERM hook.

The new hooks should *only* handle the variable permutation case.
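
I.e., roughly:

  if (TREE_CODE (mask) == VECTOR_CST)
    {
      /* Wrap the selectors as above, then use the existing hook.  */
      tree fn = targetm.vectorize.builtin_vec_perm (TREE_TYPE (vec0),
                                                    &m_type);
      ...
    }
  else
    {
      /* Variable mask: the new hook.  */
      ...
    }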


r~

