combine permutations in gimple

Mon Aug 13 13:10:00 GMT 2012

On Mon, Aug 13, 2012 at 3:03 PM, Marc Glisse <marc.glisse@inria.fr> wrote:
> On Mon, 13 Aug 2012, Richard Guenther wrote:
>
>> On Sat, Aug 11, 2012 at 2:25 PM, Marc Glisse <marc.glisse@inria.fr> wrote:
>>>
>>> On Fri, 10 Aug 2012, Marc Glisse wrote:
>>>
>>>> 1) I am not sure we always want to combine permutations. Indeed, someone
>>>> (user? vectorizer?) may have written 2 permutations to help the backend
>>>> generate optimal code, whereas it will do a bad job on the complicated
>>>> combined permutation. At tree level, I am not sure what can be done
>>>> except
>>>> possibly restrict the optimization to the case where the combined
>>>> permutation is the identity. At the RTL level, we could compare what
>>>> insns
>>>> are generated, but that's going to be painful (doing anything with
>>>> permutations at the RTL level is painful).
>
>
> I guess people will complain soon enough if this causes horrible performance
> regressions in vectorized code.
>
>
>>> +/* Return true if VAR has no nondebug use but in stmt.  */
>>> +static bool
>>> +is_only_used_in (const_tree var, const_gimple stmt)
>>> +{
>>> +  const ssa_use_operand_t *const ptr0 = &(SSA_NAME_IMM_USE_NODE (var));
>>> +  const ssa_use_operand_t *ptr = ptr0->next;
>>> +
>>> +  for (; ptr != ptr0; ptr = ptr->next)
>>> +    {
>>> +      const_gimple use_stmt = USE_STMT (ptr);
>>> +      if (is_gimple_debug (use_stmt))
>>> +       continue;
>>> +      if (use_stmt != stmt)
>>> +       return false;
>>> +    }
>>> +  return true;
>>> +}
>>
>>
>> if (!single_imm_use (var, &use, &use_stmt) || use_stmt != stmt)
>>
>> instead of the above.
>
>
> I don't think that works with statements that use a variable twice.
>
>
>>> +/* Combine two shuffles in a row.  Returns 1 if there were any changes
>>> +   made, 2 if cfg-cleanup needs to run.  Else it returns 0.  */
>>> +
>>> +static int
>>> +simplify_permutation (gimple_stmt_iterator *gsi)
>>> +{
>>> +  gimple stmt = gsi_stmt (*gsi);
>>> +  gimple def_stmt;
>>> +  tree op0, op1, op2, op3, mask;
>>> +  enum tree_code code = gimple_assign_rhs_code (stmt);
>>> +  enum tree_code code2;
>>> +  location_t loc = gimple_location (stmt);
>>> +
>>> +  gcc_checking_assert (code == VEC_PERM_EXPR);
>>> +
>>> +  op0 = gimple_assign_rhs1 (stmt);
>>> +  op1 = gimple_assign_rhs2 (stmt);
>>> +  op2 = gimple_assign_rhs3 (stmt);
>>> +
>>> +  if (TREE_CODE (op0) != SSA_NAME)
>>> +    return 0;
>>> +
>>> +  if (TREE_CODE (op2) != VECTOR_CST)
>>> +    return 0;
>>> +
>>> +  def_stmt = SSA_NAME_DEF_STMT (op0);
>>> +  if (!def_stmt || !is_gimple_assign (def_stmt)
>>> +      || !can_propagate_from (def_stmt)
>>> +      || !is_only_used_in (op0, stmt))
>>
>>
>> Or rather than the above, simply check
>>
>>           || !has_single_use (op0)
>>
>> here.
>
>
> Then there was my previous (non-working) patch that used
> get_prop_source_stmt.
>
>
>>> +    return 0;
>>> +
>>> +  /* Check that it is only used here. We cannot use has_single_use
>>> +     since the expression is using it twice itself...  */
>>
>>
>> Ah ... so then
>>
>>           || num_imm_uses (op0) != 2
>
>
> Ah, ok, that's simpler indeed, but there were such dire warnings to never
> use that evil function unless absolutely necessary that I didn't dare use
> it... Thanks for the permission.

If your new predicate would match more places (can you do a quick search?)
then we want it in tree-flow-inline.h instead of in
tree-ssa-forwprop.c.  But yes,
num_imm_uses can be expensive.   For now just stick with the above.

>>> +  code2 = gimple_assign_rhs_code (def_stmt);
>>> +
>>> +  /* Two consecutive shuffles.  */
>>> +  if (code2 == VEC_PERM_EXPR)
>>> +    {
>>> +      op3 = gimple_assign_rhs3 (def_stmt);
>>> +      if (TREE_CODE (op3) != VECTOR_CST)
>>> +       return 0;
>>> +      mask = fold_build3_loc (loc, VEC_PERM_EXPR, TREE_TYPE (op3),
>>> +                             op3, op3, op2);
>>> +      gcc_assert (TREE_CODE (mask) == VECTOR_CST);
>>
>>
>> which means you can use
>>
>>          mask = fold_ternary (loc, ...);
>>
>> Ok with that changes.
>
>
> Thanks a lot, I'll do that.
>
> Next step should be either BIT_FIELD_REF(VEC_PERM_EXPR) or
> VEC_PERM_EXPR(CONSTRUCTOR). Is there a good way to determine whether this
> kind of combination should be done forwards or backwards (i.e. start from
> VEC_PERM_EXPR and look if its single use is in BIT_FIELD_REF, or start from
> BIT_FIELD_REF and look if its argument is a VEC_PERM_EXPR only used here?

The natural SSA (and forwprop) way is to look for BIT_FIELD_REF and see
if the def stmt is a VEC_PERM_EXPR.

Richard.

> --
> Marc Glisse