This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.
Re: Vector shuffling
- From: Richard Henderson <rth at redhat dot com>
- To: Artem Shinkarov <artyom dot shinkaroff at gmail dot com>
- Cc: gcc-patches at gcc dot gnu dot org, Richard Guenther <richard dot guenther at gmail dot com>
- Date: Mon, 16 Aug 2010 10:42:46 -0700
- Subject: Re: Vector shuffling
- References: <AANLkTi=LX863_tHsONocW2arP1b6ax9JCwxvuzcEPmr-@mail.gmail.com>
Only looking closely at the i386 changes for now.
On 08/15/2010 07:30 AM, Artem Shinkarov wrote:
> +  /* Recursively grab the definition of the variable.  */
> +  while (TREE_CODE (mask) == SSA_NAME)
> +    {
> +      gimple maskdef = SSA_NAME_DEF_STMT (mask);
> +      if (gimple_assign_single_p (maskdef))
> +        mask = gimple_assign_rhs1 (maskdef);
> +      else
> +        break;
> +    }
> +
Err, surely copy-propagation has happened and this loop isn't needed.
In particular, I'd hope that MASK is *already* VECTOR_CST if it is
in fact constant.
> +  t = ix86_vectorize_builtin_vec_perm (TREE_TYPE (vec0), &m_type);
> +
> +  if (t != NULL_TREE)
> +    {
> +      gimple c = gimple_build_call (t, 3, vec0, vec0, mask);
> +      gimple stmt = gsi_stmt (*gsi);
> +      gimple_call_set_lhs (c, gimple_call_lhs (stmt));
> +      gsi_replace (gsi, c, false);
You need to use m_type here to cast the arguments.
In particular, your existing mask could be V4SI with unsigned
elements, whereas the builtin takes V4SI with signed elements.
> + /* If we cannot expand it via vec_perm, we will try to expand it
> + via PSHUFB instruction. */
> + {
Indentation is off. Although it wouldn't be if you moved the
TARGET_SSE3 || TARGET_AVX test up to be an IF protecting this block,
which would also help readability.
> + /* m1var = mm1var << 8*i */
> +  m1 = build2 (LSHIFT_EXPR, mtype, m1var,
> +               build_int_cst (TREE_TYPE (mtype), 8*i));
> +  t = force_gimple_operand_gsi (gsi, m1, true, NULL_TREE,
> +                                true, GSI_SAME_STMT);
> +  asgn = gimple_build_assign (m1var, t);
> +  gsi_insert_before (gsi, asgn, GSI_SAME_STMT);
> +
> +  /* mvar = mvar | m1var */
> +  m1 = build2 (BIT_IOR_EXPR, mtype, mvar, m1var);
> +  t = force_gimple_operand_gsi (gsi, m1, true, NULL_TREE,
> +                                true, GSI_SAME_STMT);
> +  asgn = gimple_build_assign (mvar, t);
> +  gsi_insert_before (gsi, asgn, GSI_SAME_STMT);
I don't believe this computes the proper values. Suppose we're
trying to generate a permutation for a V4SI. Suppose that [A-D]
are already masked to [0-3]. As far as I can see you'll produce

  t0 = (v16qi){ A,0,0,0, B,0,0,0, C,0,0,0, D,0,0,0 }
  t1 = t0 << 8;  = (v16qi){ A,A,0,0, B,B,0,0, C,C,0,0, D,D,0,0 }
  t2 = t1 << 16; = (v16qi){ A,A,A,A, B,B,B,B, C,C,C,C, D,D,D,D }
when what you really want is

  t2 = (v16qi){ A*4, A*4+1, A*4+2, A*4+3,
                B*4, B*4+1, B*4+2, B*4+3,
                C*4, C*4+1, C*4+2, C*4+3,
                D*4, D*4+1, D*4+2, D*4+3 };
So:

  t0 = mask & (v4si){ 3, 3, 3, 3 };
  t1 = t0 * 4;
You ought to perform the permutation into T2 above in one step, not
with explicit shifts. You know this will succeed because you're
already assuming pshufb.

  t2 = __builtin_ia32_vec_perm_v16qi_u
         ((v16qi)t1,
          (v16qi){ 0,0,0,0, 4,4,4,4, 8,8,8,8, 12,12,12,12 });
You need to add an offset compensation vector dependent on the source
vector element size, e.g.

  t3 = t2 + (v16qi){ 0,1,2,3, 0,1,2,3, 0,1,2,3, 0,1,2,3 };
> + if (fntype != NULL_TREE)
You already checked this above, although given that you tested for
SSE3, I don't think even that is needed. You can assume it.
> +ix86_vectorize_builtin_shuffle2 (gimple_stmt_iterator *gsi,
> +                                 tree vec0, tree vec1, tree mask)
You ought to extract all of the pshufb code from above so that
you can re-use it here for the TARGET_XOP vpperm instruction,
which does in fact perform the two operand shuffle.
> -  for (i = 0; i < nelt; ++i, list = TREE_CHAIN (list))
> +  for (i = 0; i < nelt; ++i,
> +       list = (list == NULL_TREE ? NULL_TREE : TREE_CHAIN (list)))
>      {
>        unsigned HOST_WIDE_INT e;
> +      tree value;
> +
> +      if (list != NULL_TREE)
> +        value = TREE_VALUE (list);
> +      else
> +        value = fold_convert (TREE_TYPE (TREE_TYPE (cst)),
> +                              integer_zero_node);
Um, when are you EVER going to see a VECTOR_CST of the wrong size,
with the wrong number of elements? This is most certainly a
bug elsewhere.
> -  /* This hook is cannot be called in response to something that the
> -     user does (unlike the builtin expander) so we shouldn't ever see
> -     an error generated from the extract.  */
> -  gcc_assert (vec_mask > 0 && vec_mask <= 3);
> +  /* Check whether the mask can be applied to the vector type.  */
> +  if (vec_mask < 0 || vec_mask > 3)
> +    return false;
I'd very much prefer this check to be left in place. Indeed, this
points to the fact that you've incorrectly interpreted the spec for
the OpenCL shuffle builtins: if the user writes { 9,15,33,101 } as
a literal, you should interpret this with the n-1 mask in place,
i.e. { 1, 3, 1, 1 }.
Which suggests that you shouldn't be handling VECTOR_CST in the new
hooks at all. You should handle that in generic code and call into
the existing TARGET_VECTORIZE_BUILTIN_VEC_PERM hook.
The new hooks should *only* handle the variable permutation case.
r~