replacing the backwards threader and more

Richard Biener richard.guenther@gmail.com
Fri Jun 25 07:58:09 GMT 2021


On Fri, Jun 25, 2021 at 9:54 AM Richard Biener
<richard.guenther@gmail.com> wrote:
>
> On Thu, Jun 24, 2021 at 6:14 PM Jeff Law <jeffreyalaw@gmail.com> wrote:
> >
> >
> >
> > On 6/21/2021 8:40 AM, Aldy Hernandez wrote:
> > >
> > >
> > > On 6/9/21 2:09 PM, Richard Biener wrote:
> > >> On Wed, Jun 9, 2021 at 1:50 PM Aldy Hernandez via Gcc
> > >> <gcc@gcc.gnu.org> wrote:
> > >>>
> > >>> Hi Jeff.  Hi folks.
> > >>>
> > >>> What started as a foray into severing the old (forward) threader's
> > >>> dependency on evrp, turned into a rewrite of the backwards threader
> > >>> code.  I'd like to discuss the possibility of replacing the current
> > >>> backwards threader with a new one that gets far more threads and can
> > >>> potentially subsume all threaders in the future.
> > >>>
> > >>> I won't include code here, as it will just detract from the high level
> > >>> discussion.  But if it helps, I could post what I have, which just
> > >>> needs
> > >>> some cleanups and porting to the latest trunk changes Andrew has made.
> > >>>
> > >>> Currently the backwards threader works by traversing DEF chains through
> > >>> PHIs leading to possible paths that start in a constant.  When such a
> > >>> path is found, it is checked to see if it is profitable, and if so, the
> > >>> constant path is threaded.  The current implementation is rather
> > >>> limited
> > >>> since backwards paths must end in a constant.  For example, the
> > >>> backwards threader can't get any of the tests in
> > >>> gcc.dg/tree-ssa/ssa-thread-14.c:
> > >>>
> > >>>     if (a && b)
> > >>>       foo ();
> > >>>     if (!b && c)
> > >>>       bar ();
> > >>>
> > >>> etc.
> > >>>
> > >>> After my refactoring patches to the threading code, it is now possible
> > >>> to drop in an alternate implementation that shares the profitability
> > >>> code (is this path profitable?), the jump registry, and the actual jump
> > >>> threading code.  I have leveraged this to write a ranger-based threader
> > >>> that gets every single thread the current code gets, plus 90-130% more.
> > >>>
> > >>> Here are the details from the branch, which should be very similar to
> > >>> trunk.  I'm presenting the branch numbers because they contain Andrew's
> > >>> upcoming relational query which significantly juices up the results.
> > >>>
> > >>> New threader:
> > >>>            ethread:65043    (+3.06%)
> > >>>            dom:32450      (-13.3%)
> > >>>            backwards threader:72482   (+89.6%)
> > >>>            vrp:40532      (-30.7%)
> > >>>     Total threaded:  210507 (+6.70%)
> > >>>
> > >>> This means that the new code gets 89.6% more jump threading
> > >>> opportunities than the code I want to replace.  In doing so, it reduces
> > >>> the amount of DOM threading opportunities by 13.3% and by 30.7% from
> > >>> the
> > >>> VRP jump threader.  The total  improvement across the jump threading
> > >>> opportunities in the compiler is 6.70%.
> > >>>
> > >>> However, these are pessimistic numbers...
> > >>>
> > >>> I have noticed that some of the threading opportunities that DOM and
> > >>> VRP
> > >>> now get are not because they're smarter, but because they're picking up
> > >>> opportunities that the new code exposes.  I experimented with
> > >>> running an
> > >>> iterative threader, and then seeing what VRP and DOM could actually
> > >>> get.
> > >>>    This is too expensive to do in real life, but it at least shows what
> > >>> the effect of the new code is on DOM/VRP's abilities:
> > >>>
> > >>>     Iterative threader:
> > >>>       ethread:65043    (+3.06%)
> > >>>       dom:31170    (-16.7%)
> > >>>           thread:86717    (+127%)
> > >>>           vrp:33851    (-42.2%)
> > >>>     Total threaded:  216781 (+9.90%)
> > >>>
> > >>> This means that the new code not only gets 127% more cases, but it
> > >>> reduces the DOM and VRP opportunities considerably (16.7% and 42.2%
> > >>> respectively).   The end result is that we have the possibility of
> > >>> getting almost 10% more jump threading opportunities in the entire
> > >>> compilation run.
> > >>
> > >> Yeah, DOM once was iterating ...
> > >>
> > >> You probably have noticed that we have very man (way too many)
> > >> 'thread' passes, often in close succession with each other or
> > >> DOM or VRP.  So in the above numbers I wonder if you can break
> > >> down the numbers individually for the actual passes (in their order)?
> > >
> > > As promised.
> > >
> > > *** LEGACY:
> > > ethread42:61152 30.1369% (61152 threads for 30.1% of total)
> > > thread117:29646 14.6101%
> > > vrp118:62088 30.5982%
> > > thread132:2232 1.09997%
> > > dom133:31116 15.3346%
> > > thread197:1950 0.960998%
> > > dom198:10661 5.25395%
> > > thread200:587 0.289285%
> > > vrp201:3482 1.716%
> > > Total:  202914
> >
> > > The above is from current trunk with my patches applied, defaulting to
> > > legacy mode.  It follows the pass number nomenclature in the
> > > *.statistics files.
> > >
> > > New threader code (This is what I envision current trunk to look with
> > > my patchset):
> > >
> > > *** RANGER:
> > > ethread42:64389 30.2242%
> > > thread117:49449 23.2114%
> > > vrp118:46118 21.6478%
> > > thread132:8153 3.82702%
> > > dom133:27168 12.7527%
> > > thread197:5542 2.60141%
> > > dom198:8191 3.84485%
> > > thread200:1038 0.487237%
> > > vrp201:2990 1.40351%
> > > Total:  213038
> > So this makes me think we should focus on dropping thread197, thread200,
> > & vrp201 and I'd probably focus on vrp201 first since we know we want to
> > get rid of it anyway and that may change the data for thread???.  Then
> > I'd be looking at thread200 and thread197 in that order.  I suspect that
> > at least some of the cases in thread200 and vrp201 are exposed by dom198.
>
> It might be interesting to see replacing vrp201 with evrp and run
> thread200 after
> that or alternatively move dom198 after vrp201 and then replace vrp201 with
> evrp and kill off thread200 completely.  Supposedly we will for now keep the
> forward threading in DOM.  Since we have FRE right before DOM it's
> likely only threading that DOM performs now.  (but of course testcases
> will need adjustment)

So instead of

      NEXT_PASS (pass_fre, false /* may_iterate */);
      /* After late FRE we rewrite no longer addressed locals into SSA
         form if possible.  */
      NEXT_PASS (pass_thread_jumps);
      NEXT_PASS (pass_dominator, false /* may_peel_loop_headers_p */);
      NEXT_PASS (pass_strlen);
      NEXT_PASS (pass_thread_jumps);
      NEXT_PASS (pass_vrp, false /* warn_array_bounds_p */);
      /* Threading can leave many const/copy propagations in the IL.
         Clean them up.  Instead of just copy_prop, we use ccp to
         compute alignment and nonzero bits.  */
      NEXT_PASS (pass_ccp, true /* nonzero_p */);

do

      NEXT_PASS (pass_fre, false /* may_iterate */);
      NEXT_PASS (pass_early_vrp);
      NEXT_PASS (pass_thread_jumps);
      NEXT_PASS (pass_strlen); /// ???
      /* Threading can leave many const/copy propagations in the IL.
         Clean them up.  Instead of just copy_prop, we use ccp to
         compute alignment and nonzero bits.  */
      NEXT_PASS (pass_ccp, true /* nonzero_p */);

where I only wonder about pass_strlen placement and how much it depends on
earlier threading (I'd just run it after FRE?)

> Richard.
>
> >
> > Jeff


More information about the Gcc mailing list