[patch tree-ssa-forwprop]: Add type raising in shift-operations

Richard Biener <richard.guenther@gmail.com>
Wed Nov 27 12:32:00 GMT 2013


On Tue, Nov 26, 2013 at 10:42 PM, Jeff Law <law@redhat.com> wrote:
> On 11/26/13 02:58, Richard Biener wrote:
>>>
>>> What is the rationale behind one form over the other?  Or is it arbitrary?
>>> I
>>> can easily make a case for either form based on what we're trying to
>>> optimize.  In general, it seems to me that....
>>
>>
>> When (Type) is a widening cast then you have to be careful not to
>> introduce undefined behavior when y is larger than the number of bits
>> in X.  When (Type) is a shortening cast then we lose information,
>> because the range we can then determine for y from the shift operation
>> will be larger.
>>
>> So I'm not sure I follow the reasoning for moving the cast out when
>> possible.
>
> Assume that we're not introducing undefined behaviour.  Obviously if we're
> introducing undefined behaviour, then the transformation is wrong, plain and
> simple.
>
> If by moving the cast we can eliminate an ALU operation because the shift
> (in this example) combines with something earlier or later in the IL, then
> we win.  Similarly if by moving the cast we expose redundant casting, then
> we win.
>
> A narrower range is obviously good, but often just having a narrower range
> won't in and of itself provide an optimization opportunity.
>
>
>
>
>> The goal for canonicalization should be to shorten operations where
>> possible.  Thus both cases, sinking and hoisting the cast should
>> be applied dependent on validity and whether the shift will be performed
>> in a smaller type in the end.
>
> I disagree; there are going to be times when sinking or hoisting the cast
> allows the shift to combine with other instructions in the IL.  There are
> times when the ability to combine with other instructions in the stream
> should be driving these decisions.
>
> I would buy that as a guiding principle that applies in cases where we don't
> have other optimization opportunities we can expose by hoisting/sinking the
> type casts.  I would even buy that we prefer the narrowest range through
> some point in the compiler (say vrp2), then we allow wider ranges as needed
> to facilitate other optimizations.
>
>
>
>
>>
>>> If we're more interested in combining the shift with statements that
>>> define
>>> the shift's operands, then the former is going to be a better
>>> representation
>>> because we're typecasting the result and thus the typecast doesn't
>>> interfere
>>> with the desired optimization.
>>>
>>> In contrast if we are more interested in cases where the shift defines
>>> operands for some other statement, then the latter form is more
>>> beneficial
>>> because we're typecasting the inputs and thus the typecast doesn't
>>> interfere
>>> with the desired optimization.
>>
>>
>> Always a trade-off which is why combining passes always have to
>> consider casts.
>
> Actually, I would say quite the opposite here.  There's a very clear line
> when we want to go from one representation to the other within the forwprop
> passes.  By doing so, we relieve the pattern matching aspects of that pass
> from needing to concern themselves with the typecasting issues.
>
>
>>
>> I think the goal of "canonicalizing" operations to work on the smallest
>> mode possible during early GIMPLE passes makes the most sense
>> (even if then a cast is not always outside or inside an operation).
>>
>> Later (currently during RTL expansion) we can widen operations again
>> according to target needs.
>
> But what you end up missing here is optimization opportunities that expose
> themselves when the casts are moved from inputs to outputs or vice-versa.

Only if the "optimization opportunities" do not handle the casts themselves.
For example, we have the get_unwidened () helper in fold that helps
transforms look through some casts.  I think we want similar helpers
for the GIMPLE-level combining - the combiner knows whether, for
an operand with type T, it's ok to have a wider or narrower operand
or whether it wants an exact T2 operand instead.  Helpers should exist
that produce the operand in that form, if possible.

Currently the issue would be how to represent the result of such a helper,
but I'll work on removing the tree-building-and-re-gimplifying way of
middle-end expression creation for the next stage1 so that we can easily
represent multiple-stmt intermediate "expressions".  It'll be a
gimple_seq (maybe wrapped in some C++ sugar, maybe then with
non-GC allocation).

So you do

  op = get_me_the_op_in_type_X (X, gimple_assign_rhs2 (stmt));
  if (!op)
    return false;

and now op is just a regular operand you can work with, possibly pointing
to a temporary sequence that you'd then need to insert.

Note that forwprop's combining should really work like fold - combine
from the expression leaves to the expression roots (it currently works
by visiting stmts in order).  Now consider (T2)(T1)(a << 5) and
((T1)(a << 5)) >> 5.  The second form would prefer that we had "combined"
the leaf (T1)(a << 5) into ((T1)a << 5), while the first would be
pessimized by that.  So - which form do you prefer?  I'd say you can't
tell from the use(s); you have to be able to decide without them, and
thus cannot decide based on "might I be better able to combine this".
Instead the combining code needs to deal with both forms.

Richard.

>
>
>>
>> Shorter operations simply carry more implicit information on value ranges
>> (and are usually easier to vectorize for example).
>
> Yes.  There's no argument about that.
>
> jeff
>


