combine permutations in gimple
Richard Guenther
richard.guenther@gmail.com
Mon Aug 13 13:10:00 GMT 2012
On Mon, Aug 13, 2012 at 3:03 PM, Marc Glisse <marc.glisse@inria.fr> wrote:
> On Mon, 13 Aug 2012, Richard Guenther wrote:
>
>> On Sat, Aug 11, 2012 at 2:25 PM, Marc Glisse <marc.glisse@inria.fr> wrote:
>>>
>>> On Fri, 10 Aug 2012, Marc Glisse wrote:
>>>
>>>> 1) I am not sure we always want to combine permutations. Indeed, someone
>>>> (user? vectorizer?) may have written 2 permutations to help the backend
>>>> generate optimal code, whereas it will do a bad job on the complicated
>>>> combined permutation. At tree level, I am not sure what can be done
>>>> except
>>>> possibly restrict the optimization to the case where the combined
>>>> permutation is the identity. At the RTL level, we could compare what
>>>> insns
>>>> are generated, but that's going to be painful (doing anything with
>>>> permutations at the RTL level is painful).
>
>
> I guess people will complain soon enough if this causes horrible performance
> regressions in vectorized code.
>
>
>>> +/* Return true if VAR has no nondebug use but in stmt. */
>>> +static bool
>>> +is_only_used_in (const_tree var, const_gimple stmt)
>>> +{
>>> + const ssa_use_operand_t *const ptr0 = &(SSA_NAME_IMM_USE_NODE (var));
>>> + const ssa_use_operand_t *ptr = ptr0->next;
>>> +
>>> + for (; ptr != ptr0; ptr = ptr->next)
>>> + {
>>> + const_gimple use_stmt = USE_STMT (ptr);
>>> + if (is_gimple_debug (use_stmt))
>>> + continue;
>>> + if (use_stmt != stmt)
>>> + return false;
>>> + }
>>> + return true;
>>> +}
>>
>>
>> if (!single_imm_use (var, &use, &use_stmt) || use_stmt != stmt)
>>
>> instead of the above.
>
>
> I don't think that works with statements that use a variable twice.
>
>
>>> +/* Combine two shuffles in a row. Returns 1 if there were any changes
>>> + made, 2 if cfg-cleanup needs to run. Else it returns 0. */
>>> +
>>> +static int
>>> +simplify_permutation (gimple_stmt_iterator *gsi)
>>> +{
>>> + gimple stmt = gsi_stmt (*gsi);
>>> + gimple def_stmt;
>>> + tree op0, op1, op2, op3, mask;
>>> + enum tree_code code = gimple_assign_rhs_code (stmt);
>>> + enum tree_code code2;
>>> + location_t loc = gimple_location (stmt);
>>> +
>>> + gcc_checking_assert (code == VEC_PERM_EXPR);
>>> +
>>> + op0 = gimple_assign_rhs1 (stmt);
>>> + op1 = gimple_assign_rhs2 (stmt);
>>> + op2 = gimple_assign_rhs3 (stmt);
>>> +
>>> + if (TREE_CODE (op0) != SSA_NAME)
>>> + return 0;
>>> +
>>> + if (TREE_CODE (op2) != VECTOR_CST)
>>> + return 0;
>>> +
>>> + def_stmt = SSA_NAME_DEF_STMT (op0);
>>> + if (!def_stmt || !is_gimple_assign (def_stmt)
>>> + || !can_propagate_from (def_stmt)
>>> + || !is_only_used_in (op0, stmt))
>>
>>
>> Or rather than the above, simply check
>>
>> || !has_single_use (op0)
>>
>> here.
>
>
> Then there was my previous (non-working) patch that used
> get_prop_source_stmt.
>
>
>>> + return 0;
>>> +
>>> + /* Check that it is only used here. We cannot use has_single_use
>>> + since the expression is using it twice itself... */
>>
>>
>> Ah ... so then
>>
>> || num_imm_uses (op0) != 2
>
>
> Ah, ok, that's simpler indeed, but there were such dire warnings to never
> use that evil function unless absolutely necessary that I didn't dare use
> it... Thanks for the permission.
If your new predicate would match more places (can you do a quick search?)
then we want it in tree-flow-inline.h instead of in
tree-ssa-forwprop.c. But yes,
num_imm_uses can be expensive. For now just stick with the above.
>>> + code2 = gimple_assign_rhs_code (def_stmt);
>>> +
>>> + /* Two consecutive shuffles. */
>>> + if (code2 == VEC_PERM_EXPR)
>>> + {
>>> + op3 = gimple_assign_rhs3 (def_stmt);
>>> + if (TREE_CODE (op3) != VECTOR_CST)
>>> + return 0;
>>> + mask = fold_build3_loc (loc, VEC_PERM_EXPR, TREE_TYPE (op3),
>>> + op3, op3, op2);
>>> + gcc_assert (TREE_CODE (mask) == VECTOR_CST);
>>
>>
>> which means you can use
>>
>> mask = fold_ternary (loc, ...);
>>
>> Ok with that changes.
>
>
> Thanks a lot, I'll do that.
>
> Next step should be either BIT_FIELD_REF(VEC_PERM_EXPR) or
> VEC_PERM_EXPR(CONSTRUCTOR). Is there a good way to determine whether this
> kind of combination should be done forwards or backwards (i.e. start from
> VEC_PERM_EXPR and look if its single use is in BIT_FIELD_REF, or start from
> BIT_FIELD_REF and look if its argument is a VEC_PERM_EXPR only used here?
The natural SSA (and forwprop) way is to look for BIT_FIELD_REF and see
if the def stmt is a VEC_PERM_EXPR.
Richard.
> --
> Marc Glisse
More information about the Gcc-patches
mailing list