This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: Vector shuffling


On Mon, Aug 16, 2010 at 6:42 PM, Richard Henderson <rth@redhat.com> wrote:
> Only looking closely at the i386 changes for now.
>
> On 08/15/2010 07:30 AM, Artem Shinkarov wrote:
>> + ?/* Recursively grab the definition of the variable. ?*/
>> + ?while (TREE_CODE (mask) == SSA_NAME)
>> + ? ?{
>> + ? ? ?gimple maskdef = SSA_NAME_DEF_STMT (mask);
>> + ? ? ?if (gimple_assign_single_p (maskdef))
>> + ? ? ? ?mask = gimple_assign_rhs1 (maskdef);
>> + ? ? ?else
>> + ? ? ? ?break;
>> + ? ?}
>> +
>
> Err, surely copy-propagation has happened and this loop isn't needed.
> In particular, I'd hope that MASK is *already* VECTOR_CST if it is
> in fact constant.

MASK could be anything and I wanted to get to constructor, but looking
over the patch again may be this is a deprecated part.

>
>> + ? ? ? ? ?t = ix86_vectorize_builtin_vec_perm (TREE_TYPE (vec0), &m_type);
>> +
>> + ? ? ? ? ?if (t != NULL_TREE)
>> + ? ? ? ? ? ?{
>> + ? ? ? ? ? ? ?gimple c = gimple_build_call (t, 3, vec0, vec0, mask);
>> + ? ? ? ? ? ? ?gimple stmt = gsi_stmt (*gsi);
>> + ? ? ? ? ? ? ?gimple_call_set_lhs (c, gimple_call_lhs (stmt));
>> + ? ? ? ? ? ? ?gsi_replace (gsi, c, false);
>
> You should need to use m_type here in casting the arguments.
> In particular your existing mask could be V4SI with unsigned
> elements, whereas the builtin takes V4SI with signed elements.

Looking into the code of ix86_vectorize_builtin_vec_perm, I see that
the type returned via m_type is almost always the same to TREE_TYPE
(TREE_TYPE (vec0)) and in your case for V4SI mode there are two
functions: IX86_BUILTIN_VEC_PERM_V4SI_U and IX86_BUILTIN_VEC_PERM_V4SI
for unsigned and signed mask. The only change is happening in
V2DFmode. Could there be any problems with that?

And mask_type really returns mask element type.

>
>> + ?/* If we cannot expand it via vec_perm, we will try to expand it
>> + ? ? via PSHUFB instruction. ?*/
>> + ? ?{
>
> Indentation is off. ?Although it wouldn't be if you moved the
> TARGET_SSE3 || TARGET_AVX test up to be an IF protecting this block,
> which would also help readability.
>
>> + ? ? ? ? ?/* m1var = mm1var << 8*i */
>> + ? ? ? ? ?m1 = build2 (LSHIFT_EXPR, mtype, m1var,
>> + ? ? ? ? ? ? ? ? ? ? ? ?build_int_cst (TREE_TYPE (mtype), 8*i));
>> + ? ? ? ? ?t = force_gimple_operand_gsi (gsi, m1,
>> + ? ? ? ? ? ? ? ? ? ? ? ? ? ?true, NULL_TREE, true, GSI_SAME_STMT);
>> + ? ? ? ? ?asgn = gimple_build_assign (m1var, t);
>> + ? ? ? ? ?gsi_insert_before (gsi, asgn , GSI_SAME_STMT);
>> +
>> + ? ? ? ? ?/* mvar = mvar | m1var */
>> + ? ? ? ? ?m1 = build2 (BIT_IOR_EXPR, mtype, mvar, m1var);
>> + ? ? ? ? ?t = force_gimple_operand_gsi (gsi, m1,
>> + ? ? ? ? ? ? ? ? ? ? ? ? ? ?true, NULL_TREE, true, GSI_SAME_STMT);
>> + ? ? ? ? ?asgn = gimple_build_assign (mvar, t);
>> + ? ? ? ? ?gsi_insert_before (gsi, asgn, GSI_SAME_STMT);
>
> I don't believe this computes the proper values. ?Suppose we're
> trying to generate a permuation for a V4SI. ?Suppose that [A-D]
> are already masked to [0-3]. ?As far as I can see you'll produce
>
> ?t0 ? ? ? ? ? ? ? ? ? ?= (v16qi){ A,0,0,0,B,0,0,0,C,0,0,0,D,0,0,0 }
> ?t1 = t0 << 8; ? ? ? ? = (v16qi){ A,A,0,0,B,B,0,0,C,C,0,0,D,D,0,0 }
> ?t2 = t1 << 16; ? ? ? ?= (v16qi){ A,A,A,A,B,B,B,B,C,C,C,C,D,D,D,D }
>
> when what you really want is
>
> ?t2 = (v16qi){ A*4, A*4+1, A*4+2, A*4+3,
> ? ? ? ? ? ? ? ?B*4, B*4+1, B*4+2, B*4+3,
> ? ? ? ? ? ? ? ?C*4, C*4+1, C*4+2, C*4+3,
> ? ? ? ? ? ? ? ?D*4, D*4+1, D*4+2, D*4+3 };
>
> So:
>
> ?t0 = mask & (v4si){ 3, 3, 3, 3 };
> ?t1 = t0 * 4;
>
> You ought to perform the permutation into T2 above in one step, not
> explicit shifts. ?You know this will succeed because you're already
> assuming pshufb.
>
> ?t2 = __builtin_ia32_vec_perm_v16qi_u
> ? ? ? ?((v16qi)t1,
> ? ? ? ? (v16qi){ 0,0,0,0, 4,4,4,4, 8,8,8,8, 12,12,12,12 });
>
> You need to add an offset compensation vector dependent on the source
> vector element size, e.g.
>
> ?t3 = t2 + (v16qi){ 0,1,2,3, 0,1,2,3, 0,1,2,3, 0,1,2,3 }
>
>> + ? ? ?if (fntype != NULL_TREE)
>
> You already checked this above, although given that you tested for
> SSE3, I don't think even that is needed. ?You can assume it.
>
>> +ix86_vectorize_builtin_shuffle2 (gimple_stmt_iterator *gsi,
>> + ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?tree vec0, tree vec1, tree mask)
>
> You ought to extract all of the pshufb code from above so that
> you can re-use it here for the TARGET_XOP vpperm instruction,
> which does in fact perform the two operand shuffle.

That is *very* useful, thank you. I missed this conversion issues.

>
>> - ?for (i = 0; i < nelt; ++i, list = TREE_CHAIN (list))
>> + ?for (i = 0; i < nelt; ++i, list =
>> + ? ? ? ? ? ? ? ? ? ? ? ?(list == NULL_TREE ? NULL_TREE : TREE_CHAIN (list)))
>> ? ? ?{
>> ? ? ? ?unsigned HOST_WIDE_INT e;
>> + ? ? ?tree value;
>> +
>> + ? ? ?if (list != NULL_TREE)
>> + ? ? ? ?value = TREE_VALUE (list);
>> + ? ? ?else
>> + ? ? ? ? ?value = fold_convert (TREE_TYPE (TREE_TYPE (cst)),
>> + ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?integer_zero_node);
>
> Um, when are you EVER going to see a VECTOR_CST of the wrong size,
> with the wrong number of elements. ?This is very most certainly a
> bug elsewhere.

I noticed it quite recently, that nobody ever checked VECTOR_CST for
consistency, it is fixed about a week ago. But anyway, we have a loop
over dynamic list, where exit depends on a number, looks suspicious to
me. And still you can construct such a VECTOR_CST. So why should we
segfault, when we can just handle it...

>
>> - ?/* This hook is cannot be called in response to something that the
>> - ? ? user does (unlike the builtin expander) so we shouldn't ever see
>> - ? ? an error generated from the extract. ?*/
>> - ?gcc_assert (vec_mask > 0 && vec_mask <= 3);
>> + ?/* Check whether the mask can be applied to the vector type. ?*/
>> + ?if (vec_mask < 0 || vec_mask > 3)
>> + ? ?return false;
>
> I'd very much prefer this check to be left in place. ?Indeed, this
> points to the fact that you've incorrectly interpreted the spec for
> the OpenCL shuffle builtins: if the user writes { 9,15,33,101 } as
> a literal, you should interpret this with the n-1 mask in place,
> i.e. ?{ 1, 3, 1, 1 }.

The problem was different. vec_perm_ok can't handle every mask, for
instance mask with repeating element. And when I used it for the sake
of checking whether I can use a built-in permutation or not, I
received a false assertion as the answer. I think that this function
was never used in that way before, but what's wrong with just
returning false instead of false assertion?

>
> Which suggests that you shouldn't be handling VECTOR_CST in the new
> hooks at all. ?You should handle that in generic code and call into
> the existing TARGET_VECTORIZE_BUILTIN_VEC_PERM hook.

Well, the shuffle functionality I introduce is more generic that
vec_perm we have. So I don't see any problem that expanding
builtin_shuffle I use vec_perm? Why should user check against vec_perm
and shuffle if he just wants to shuffle?

>
> The new hooks should *only* handle the variable permutation case.

Well, again, I thought that it would be nice to bring some
generalisation of these functions. I can't understand why it is so
bad?


Artem.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]