This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: A question about redundant PHI expression stmt


On Tue, 2012-02-28 at 11:21 +0100, Richard Guenther wrote:
> On Tue, Feb 28, 2012 at 9:33 AM, Jiangning Liu <jiangning.liu@arm.com> wrote:
> >
> >
> >> -----Original Message-----
> >> From: Jiangning Liu
> >> Sent: Tuesday, February 28, 2012 4:07 PM
> >> To: Jiangning Liu; 'William J. Schmidt'
> >> Cc: gcc@gcc.gnu.org; wschmidt@gcc.gnu.org
> >> Subject: RE: A question about redundant PHI expression stmt
> >>
> >>
> >>
> >> > -----Original Message-----
> >> > From: gcc-owner@gcc.gnu.org [mailto:gcc-owner@gcc.gnu.org] On Behalf
> >> Of
> >> > Jiangning Liu
> >> > Sent: Tuesday, February 28, 2012 11:19 AM
> >> > To: 'William J. Schmidt'
> >> > Cc: gcc@gcc.gnu.org; wschmidt@gcc.gnu.org
> >> > Subject: RE: A question about redundant PHI expression stmt
> >> >
> >> >
> >> >
> >> > > -----Original Message-----
> >> > > From: gcc-owner@gcc.gnu.org [mailto:gcc-owner@gcc.gnu.org] On
> >> Behalf
> >> > Of
> >> > > William J. Schmidt
> >> > > Sent: Monday, February 27, 2012 11:32 PM
> >> > > To: Jiangning Liu
> >> > > Cc: gcc@gcc.gnu.org; wschmidt@gcc.gnu.org
> >> > > Subject: Re: A question about redundant PHI expression stmt
> >> > >
> >> > > On Fri, 2012-02-24 at 16:07 +0800, Jiangning Liu wrote:
> >> > > > Hi,
> >> > > >
> >> > > > For the small case below, there are some redundant PHI expression
> >> > > stmt
> >> > > > generated, and finally cause back-end generates redundant copy
> >> > > instructions
> >> > > > due to some reasons around IRA.
> >> > > >
> >> > > > int *l, *r, *g;
> >> > > > void test_func(int n)
> >> > > > {
> >> > > >         int i;
> >> > > >         static int j;
> >> > > >         static int pos, direction, direction_pre;
> >> > > >
> >> > > >         pos = 0;
> >> > > >         direction = 1;
> >> > > >
> >> > > >         for ( i = 0; i < n; i++ )
> >> > > >         {
> >> > > >                 direction_pre = direction;
> >> > > >
> >> > > >                 for ( j = 0; j <= 400; j++ )
> >> > > >                 {
> >> > > >
> >> > > >                         if ( g[pos] == 200 )
> >> > > >                                 break;
> >> > > >
> >> > > >                         if ( direction == 0 )
> >> > > >                                 pos = l[pos];
> >> > > >                         else
> >> > > >                                 pos = r[pos];
> >> > > >
> >> > > >                         if ( pos == -1 )
> >> > > >                         {
> >> > > >                                 if ( direction_pre != direction )
> >> > > >                                         break;
> >> > > >
> >> > > >                                 pos = 0;
> >> > > >                                 direction = !direction;
> >> > > >                         }
> >> > > >
> >> > > >                 }
> >> > > >
> >> > > >                 f(g[pos]);
> >> > > >         }
> >> > > > }
> >> > > >
> >> > > > I know PR39976 has something to do with this case, and check-in
> >> > > r182140
> >> > > > caused big degradation on the real benchmark, but I'm still
> >> > confusing
> >> > > around
> >> > > > this issue.
> >> > > >
> >> > > > The procedure is like this,
> >> > > >
> >> > > > Loop store motion generates code below,
> >> > > >
> >> > > > <bb 6>:
> >> > > >   D.4084_17 = l.4_13 + D.4077_70;
> >> > > >   pos.5_18 = *D.4084_17;
> >> > > >   pos_lsm.34_103 = pos.5_18;
> >> > > >   goto <bb 8>;
> >> > > >
> >> > > > <bb 7>:
> >> > > >   D.4088_23 = r.6_19 + D.4077_70;
> >> > > >   pos.7_24 = *D.4088_23;
> >> > > >   pos_lsm.34_104 = pos.7_24;
> >> > > >
> >> > > > <bb 8>:
> >> > > >   # prephitmp.29_89 = PHI <pos.5_18(6), pos.7_24(7)>
> >> > > >   # pos_lsm.34_53 = PHI <pos_lsm.34_103(6), pos_lsm.34_104(7)>
> >> > > >   pos.2_25 = prephitmp.29_89;
> >> > > >   if (pos.2_25 == -1)
> >> > > >     goto <bb 9>;
> >> > > >   else
> >> > > >     goto <bb 20>;
> >> > > >
> >> > > > And then, copy propagation transforms block 8 it into
> >> > > >
> >> > > > <bb 8>:
> >> > > >   # prephitmp.29_89 = PHI <pos.5_18(11), pos.7_24(12)>
> >> > > >   # pos_lsm.34_53 = PHI <pos.5_18(11), pos.7_24(12)>
> >> > > >   ...
> >> > > >
> >> > > > And then, these two duplicated PHI stmts have been kept until
> >> > expand
> >> > > pass.
> >> > > >
> >> > > > In dom2, a stmt like below
> >> > > >
> >> > > >   # pos_lsm.34_54 = PHI <pos_lsm.34_53(13), 0(16)>
> >> > > >
> >> > > > is transformed into
> >> > > >
> >> > > >   # pos_lsm.34_54 = PHI <prephitmp.29_89(13), 0(16)>
> >> > > >
> >> > > > just because they are equal.
> >> > > >
> >> > > > Unfortunately, this transformation changed back-end behavior to
> >> > > generate
> >> > > > redundant copy instructions and hurt performance. This case is
> >> from
> >> > a
> >> > > real
> >> > > > benchmark and hurt performance a lot.
> >> > > >
> >> > > > Does the fix in r182140 intend to totally clean up this kind of
> >> > > redundancy?
> >> > > > Where should we get it fixed?
> >> > > >
> >> > >
> >> > > Hi, sorry not to have responded sooner -- I just now got some time
> >> to
> >> > > look at this.
> >> > >
> >> > > I compiled your code with -O3 for revisions 182139 and 182140 of
> >> GCC
> >> > > trunk, and found no difference between the code produced by the
> >> > middle
> >> > > end for the two versions.  So something else has apparently come
> >> > along
> >> > > since then that helped produce the problematic code generation for
> >> > you.
> >> > > Either that or I need to use different compile flags; you didn't
> >> > > specify
> >> > > what you used.
> >> > >
> >> > > The fix in r182140 does just what you saw in dom2:  identifies
> >> > > duplicate
> >> > > PHIs in the same block and ensures only one of them is used.  This
> >> > > actually avoids inserting extra blocks during expand in certain
> >> loop
> >> > > cases.  I am not sure why you are getting redundant copies as a
> >> > result,
> >> > > but it sounds from your comments like IRA didn't coalesce a
> >> register
> >> > > copy or something like that.  You may want to bisect revisions on
> >> the
> >> > > trunk to see where your bad code generation started to occur to get
> >> a
> >> > > better handle on what happened.
> >> > >
> >> > > As Richard said, the dom pass is likely to be removed someday,
> >> > whenever
> >> > > someone can get around to it.  My redundant-phi band-aid in dom
> >> would
> >> > > go
> >> > > away then as well, but presumably an extra pass of PRE would
> >> replace
> >> > it,
> >> > > and redundant PHIs would still be removed.
> >> > >
> >> >
> >> > Bill,
> >> >
> >> > Thanks a lot for your explanation!
> >> >
> >> > The bug could be exposed with -O2 if you apply the check-in r184435
> >> > made by Richard.
> >> >
> >> > BTW, at present is there anybody working on the NEW pass replacing
> >> dom?
> >> > If no, in short term, it would be good to still get it fixed just
> >> > because it is hurting benchmark. If Yes, may I know what that NEW
> >> pass
> >> > will be focusing on?
> >> >
> >>
> >> With dump enabled, I do see a PHI copy is identified by
> >> tree_ssa_dominator_optimize as below,
> >>
> >> Optimizing block #12
> >>
> >> LKUP STMT prephitmp.29_89 = PHI <pos.5_18, pos.7_24>
> >>           prephitmp.29_89 = PHI <pos.5_18(10), pos.7_24(11)>
> >> 2>>> STMT prephitmp.29_89 = PHI <pos.5_18, pos.7_24>
> >>           prephitmp.29_89 = PHI <pos.5_18(10), pos.7_24(11)>
> >> LKUP STMT pos_lsm.34_53 = PHI <pos.5_18, pos.7_24>
> >>           pos_lsm.34_53 = PHI <pos.5_18(10), pos.7_24(11)>
> >> FIND: prephitmp.29_89
> >> 0>>> COPY pos_lsm.34_53 = prephitmp.29_89
> >> <<<< STMT prephitmp.29_89 = PHI <pos.5_18, pos.7_24>
> >>           prephitmp.29_89 = PHI <pos.5_18(10), pos.7_24(11)>
> >> Optimizing statement if (prephitmp.29_89 == -1)
> >> LKUP STMT prephitmp.29_89 eq_expr -1
> >>           if (prephitmp.29_89 == -1)
> >>
> >> So does it mean phicprop (eliminate_degenerate_phis) needs to be
> >> improved as well?
> >>
> >
> > One more clue I find is, after dom2, prephitmp.29_89 isn't really propagated and pos_lsm.34_53 is still being used by block 15 as below, although block 15 is dominated by block 12.
> >
> > <bb 15>:
> >  # direction_lsm.33_121 = PHI <direction_lsm.33_51(14)>
> >  # j_lsm.32_122 = PHI <j_lsm.32_52(14)>
> >  # pos_lsm.34_123 = PHI <pos_lsm.34_53(14)>
> >
> > Is this the root cause that redundant PHI stmt "pos_lsm.34_53 = PHI <pos.5_18(10), pos.7_24(11)>" can't be removed?
> 
> Well, check why cprop_into_stmt does nothing here.  It should already
> work.

I think this is probably a problem with how cprop_into_successor_phis
works.  It only propagates into immediate successors of a block.  In
this case copies are propagated from bb12 into phis in bb13 and bb14 (of
which there are none).  Seems like it could propagate into phis of all
dominator children, which would catch bb15.  I haven't looked at this
part of the code in too much detail, but I'm guessing since bb15's
immediate dominator is bb12, this is the last chance to get the value
propagated into the phi.

When I get more time, I'll walk through the logic for this test to
confirm, and post here with the results.

If this is the problem, then I doubt anything will be fixed for this in
the limited time left for 4.7.  You'd want to open a missed-optimization
PR for it.

Thanks,
Bill

> 
> Richard.
> 


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]