This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [RFC] PR tree-optimization/67628: Make tree ifcombine more symmetric and interactions with dom


On Wed, 23 Sep 2015, Kyrill Tkachov wrote:

> 
> On 23/09/15 11:10, Richard Biener wrote:
> > On Wed, 23 Sep 2015, Kyrill Tkachov wrote:
> > 
> > > On 23/09/15 10:09, Pinski, Andrew wrote:
> > > > > On Sep 23, 2015, at 1:59 AM, Kyrill Tkachov <kyrylo.tkachov@arm.com>
> > > > > wrote:
> > > > > 
> > > > > 
> > > > > > On 22/09/15 20:31, Jeff Law wrote:
> > > > > > > On 09/22/2015 07:36 AM, Kyrill Tkachov wrote:
> > > > > > > Hi all,
> > > > > > > Unfortunately, I see a testsuite regression with this patch:
> > > > > > > FAIL: gcc.dg/pr66299-2.c scan-tree-dump-not optimized "<<"
> > > > > > > 
> > > > > > > The reduced part of that test is:
> > > > > > > void
> > > > > > > test1 (int x, unsigned u)
> > > > > > > {
> > > > > > >      if ((1U << x) != 64
> > > > > > >          || (2 << x) != u
> > > > > > >          || (x << x) != 384
> > > > > > >          || (3 << x) == 9
> > > > > > >          || (x << 14) != 98304U
> > > > > > >          || (1 << x) == 14
> > > > > > >          || (3 << 2) != 12)
> > > > > > >        __builtin_abort ();
> > > > > > > }
> > > > > > > 
> > > > > > > The patched ifcombine pass works more or less as expected and
> > > > > > > produces
> > > > > > > fewer basic blocks.
> > > > > > > Before this patch a relevant part of the ifcombine dump for test1
> > > > > > > is:
> > > > > > > ;;   basic block 2, loop depth 0, count 0, freq 10000, maybe hot
> > > > > > >      if (x_1(D) != 6)
> > > > > > >        goto <bb 6>;
> > > > > > >      else
> > > > > > >        goto <bb 3>;
> > > > > > > 
> > > > > > > ;;   basic block 3, loop depth 0, count 0, freq 9996, maybe hot
> > > > > > >      _2 = 2 << x_1(D);
> > > > > > >      _3 = (unsigned intD.10) _2;
> > > > > > >      if (_3 != u_4(D))
> > > > > > >        goto <bb 6>;
> > > > > > >      else
> > > > > > >        goto <bb 4>;
> > > > > > > 
> > > > > > > 
> > > > > > > After this patch it is:
> > > > > > > ;;   basic block 2, loop depth 0, count 0, freq 10000, maybe hot
> > > > > > >      _2 = 2 << x_1(D);
> > > > > > >      _3 = (unsigned intD.10) _2;
> > > > > > >      _9 = _3 != u_4(D);
> > > > > > >      _10 = x_1(D) != 6;
> > > > > > >      _11 = _9 | _10;
> > > > > > >      if (_11 != 0)
> > > > > > >        goto <bb 5>;
> > > > > > >      else
> > > > > > >        goto <bb 3>;
> > > > > > > 
> > > > > > > The second form ends up generating worse codegen however, and the
> > > > > > > badness starts with the dom1 pass.
> > > > > > > In the unpatched case it manages to deduce that x must be 6 by the
> > > > > > > time
> > > > > > > it reaches basic block 3 and
> > > > > > > uses that information to eliminate the shift in "_2 = 2 << x_1(D)"
> > > > > > > from
> > > > > > > basic block 3
> > > > > > > In the patched case it is unable to make that call, I think
> > > > > > > because
> > > > > > > the
> > > > > > > x != 6 condition is IORed
> > > > > > > with another test.
> > > > > > > 
> > > > > > > I'm not familiar with the internals of the dom pass, so I'm not
> > > > > > > sure
> > > > > > > where to go looking for a fix for this.
> > > > > > > Is the ifcombine change a step in the right direction? If so, what
> > > > > > > would
> > > > > > > need to be done to fix the issue with
> > > > > > > the dom pass?
> > > > > > I don't see how you can reasonably fix this in DOM.  if _9 or _10 is
> > > > > > true, then _11 is true.  But we can't reasonably record any kind of
> > > > > > equivalence for _9 or _10 individually.
> > > > > > 
> > > > > > If the statement
> > > > > > _11 = _9 | _10;
> > > > > > 
> > > > > > Were changed to
> > > > > > 
> > > > > > _11 = _9 & _10;
> > > > > > 
> > > > > > Then we could record something useful about _9 and _10.
> > > > > > 
> > > > > > 
> > > > > > > I suppose what we want is to not combine basic blocks if the
> > > > > > > sequence
> > > > > > > and conditions of the basic blocks are
> > > > > > > such that dom can potentially exploit them, but how do we express
> > > > > > > that?
> > > > > > I don't think there's going to be a way to directly express that.
> > > > > > You
> > > > > > could essentially claim that TRUTH_OR is more expensive than
> > > > > > TRUTH_AND
> > > > > > because of the impact on DOM, but that in and of itself may not
> > > > > > resolve
> > > > > > the situation either.
> > > > > > 
> > > > > > I think the question we need to answer is whether or not your
> > > > > > changes
> > > > > > are generally better, even if there's specific instances where they
> > > > > > make
> > > > > > things worse.  If the benefits outweigh the negatives then we can
> > > > > > xfail
> > > > > > that test.
> > > > > Ok, I'll investigate and benchmark some more.
> > > > > Andrew, this transformation to ifcombine (together with the
> > > > > restriction
> > > > > that the inner condition block
> > > > > has to be a single comparison) was added by you with r204194.
> > > > > Is there a particular reason for that restriction and why it is
> > > > > applied to
> > > > > the inner block and not either?
> > > > My reasoning at the time was there might be an "expensive" instruction
> > > > or
> > > > one that might trap (I did not check to see if the other part of the
> > > > code
> > > > was detecting that).
> > > > The outer block did not need any checks as we have something like
> > > > ...
> > > > If (a)
> > > >     If (b)
> > > > 
> > > > Or
> > > > ....
> > > > If (a)
> > > >     Goto f
> > > > else if (b)
> > > >    ....
> > > > Else
> > > > {
> > > > F:
> > > > ....
> > > > }
> > > > 
> > > > And there was no need to check what was before the if (a) part just what
> > > > is
> > > > in between the two ifs.
> > > Ah, because the code in outer_cond_bb would have to be executed anyway
> > > whether
> > > we perform the conversion or not, right?
> > All ifcombine transforms make the outer condition unconditionally
> > true/false thus the check should have been on whether the outer
> > cond BB is "empty".  Which would solve your problem, right?
> 
> I'm not sure I follow. Why does cond bb has to be empty?

Sorry, I mixed it up.  Of course the cond is at the end of cond bb...

> > 
> > Note that other transforms (bit test recognition) don't care (sth
> > we might want to fix?).
> > 
> > In general this needs a better cost function, maybe simply use
> > estimate_num_insns with speed estimates and compare against a
> > new --param.
> 
> Thanks, that looks like a starting point.
> If we were add some kind of costing check here, would we even need
> the checks mentioned above? I don't think it will affect correctness
> (the inner cond bb is checked for no side-effects before entering this
> function).

No, we'd replace the check with the cost function.

Richard.

> Thanks,
> Kyrill
> 
> > 
> > Thanks,
> > Richard.
> > 
> > > Thanks,
> > > Kyrill
> > > 
> > > > What I mean by expensive for an example is division or some function
> > > > call.
> > > > 
> > > > Thanks,
> > > > Andrew
> > > > 
> > > > 
> > > > > Thanks,
> > > > > Kyrill
> > > > > 
> > > > > 
> > > > > 
> > > > > > jeff
> > > 
> 
> 

-- 
Richard Biener <rguenther@suse.de>
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg)


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]