[RFC] PR tree-optimization/67628: Make tree ifcombine more symmetric and interactions with dom

Kyrill Tkachov kyrylo.tkachov@arm.com
Wed Sep 23 10:10:00 GMT 2015


On 23/09/15 10:09, Pinski, Andrew wrote:
>> On Sep 23, 2015, at 1:59 AM, Kyrill Tkachov <kyrylo.tkachov@arm.com> wrote:
>>
>>
>>> On 22/09/15 20:31, Jeff Law wrote:
>>>> On 09/22/2015 07:36 AM, Kyrill Tkachov wrote:
>>>> Hi all,
>>>> Unfortunately, I see a testsuite regression with this patch:
>>>> FAIL: gcc.dg/pr66299-2.c scan-tree-dump-not optimized "<<"
>>>>
>>>> The reduced part of that test is:
>>>> void
>>>> test1 (int x, unsigned u)
>>>> {
>>>>     if ((1U << x) != 64
>>>>         || (2 << x) != u
>>>>         || (x << x) != 384
>>>>         || (3 << x) == 9
>>>>         || (x << 14) != 98304U
>>>>         || (1 << x) == 14
>>>>         || (3 << 2) != 12)
>>>>       __builtin_abort ();
>>>> }
>>>>
>>>> The patched ifcombine pass works more or less as expected and produces
>>>> fewer basic blocks.
>>>> Before this patch a relevant part of the ifcombine dump for test1 is:
>>>> ;;   basic block 2, loop depth 0, count 0, freq 10000, maybe hot
>>>>     if (x_1(D) != 6)
>>>>       goto <bb 6>;
>>>>     else
>>>>       goto <bb 3>;
>>>>
>>>> ;;   basic block 3, loop depth 0, count 0, freq 9996, maybe hot
>>>>     _2 = 2 << x_1(D);
>>>>     _3 = (unsigned intD.10) _2;
>>>>     if (_3 != u_4(D))
>>>>       goto <bb 6>;
>>>>     else
>>>>       goto <bb 4>;
>>>>
>>>>
>>>> After this patch it is:
>>>> ;;   basic block 2, loop depth 0, count 0, freq 10000, maybe hot
>>>>     _2 = 2 << x_1(D);
>>>>     _3 = (unsigned intD.10) _2;
>>>>     _9 = _3 != u_4(D);
>>>>     _10 = x_1(D) != 6;
>>>>     _11 = _9 | _10;
>>>>     if (_11 != 0)
>>>>       goto <bb 5>;
>>>>     else
>>>>       goto <bb 3>;
>>>>
>>>> The second form ends up generating worse codegen however, and the
>>>> badness starts with the dom1 pass.
>>>> In the unpatched case it manages to deduce that x must be 6 by the time
>>>> it reaches basic block 3 and
>>>> uses that information to eliminate the shift in "_2 = 2 << x_1(D)" from
>>>> basic block 3
>>>> In the patched case it is unable to make that call, I think because the
>>>> x != 6 condition is IORed
>>>> with another test.
>>>>
>>>> I'm not familiar with the internals of the dom pass, so I'm not sure
>>>> where to go looking for a fix for this.
>>>> Is the ifcombine change a step in the right direction? If so, what would
>>>> need to be done to fix the issue with
>>>> the dom pass?
>>> I don't see how you can reasonably fix this in DOM.  if _9 or _10 is
>>> true, then _11 is true.  But we can't reasonably record any kind of
>>> equivalence for _9 or _10 individually.
>>>
>>> If the statement
>>> _11 = _9 | _10;
>>>
>>> Were changed to
>>>
>>> _11 = _9 & _10;
>>>
>>> Then we could record something useful about _9 and _10.
>>>
>>>
>>>> I suppose what we want is to not combine basic blocks if the sequence
>>>> and conditions of the basic blocks are
>>>> such that dom can potentially exploit them, but how do we express that?
>>> I don't think there's going to be a way to directly express that.  You
>>> could essentially claim that TRUTH_OR is more expensive than TRUTH_AND
>>> because of the impact on DOM, but that in and of itself may not resolve
>>> the situation either.
>>>
>>> I think the question we need to answer is whether or not your changes
>>> are generally better, even if there's specific instances where they make
>>> things worse.  If the benefits outweigh the negatives then we can xfail
>>> that test.
>> Ok, I'll investigate and benchmark some more.
>> Andrew, this transformation to ifcombine (together with the restriction that the inner condition block
>> has to be a single comparison) was added by you with r204194.
>> Is there a particular reason for that restriction and why it is applied to the inner block and not either?
> My reasoning at the time was there might be an "expensive" instruction or one that might trap (I did not check to see if the other part of the code was detecting that).
> The outer block did not need any checks as we have something like
> ...
> If (a)
>    If (b)
>
> Or
> ....
> If (a)
>    Goto f
> else if (b)
>   ....
> Else
> {
> F:
> ....
> }
>
> And there was no need to check what was before the if (a) part just what is in between the two ifs.

Ah, because the code in outer_cond_bb would have to be executed anyway whether
we perform the conversion or not, right?

Thanks,
Kyrill

>
> What I mean by expensive for an example is division or some function call.
>
> Thanks,
> Andrew
>
>
>> Thanks,
>> Kyrill
>>
>>
>>
>>> jeff



More information about the Gcc-patches mailing list