[PATCH] Replace EVRP in DOM with ranger.

Aldy Hernandez aldyh@redhat.com
Fri Apr 29 09:52:56 GMT 2022


On Fri, Apr 29, 2022 at 9:09 AM Richard Biener
<richard.guenther@gmail.com> wrote:
>
> On Thu, Apr 28, 2022 at 8:44 PM Jeff Law via Gcc-patches
> <gcc-patches@gcc.gnu.org> wrote:
> >
> >
> >
> > On 4/28/2022 10:30 AM, Aldy Hernandez wrote:
> > > [Jeff, this is the same patch I sent you last week with minor tweaks
> > > to the commit message.]
> > And I just dropped it into the tester.  We should have the vast majority
> > of targets tested by this time tomorrow.
> >
> > >
> > > [Despite the verbosity of the message, this is actually a pretty
> > > straightforward patch.  It should've gone in last cycle, but there
> > > was a nagging regression I couldn't get to until after stage1
> > > had closed.]
> > You had other, non-work/gcc priorities.  No worries at missing gcc-12.
> >
> >
> > >
> > > There are 3 uses of EVRP in DOM that must be converted.
> > > Unfortunately, they need to be converted in one go, so further
> > > splitting of this patch would be problematic.
> > ACK
> >
> >
> > > There's nothing here earth shattering.  It's all pretty obvious in
> > > retrospect, but I've added a short description of each use to aid in
> > > reviewing:
> > >
> > > * Convert evrp use in cprop to ranger.
> > >
> > >    This is easy, as cprop in DOM was converted to the ranger API last
> > >    cycle, so this is just a matter of using a ranger instead of an
> > >    evrp_range_analyzer.
> > Yea.  We may have covered this before, but what does ranger do with a
> > conditional equivalence?
> >
> > ie if we have something like
> >
> > if (a == b)
> >    {
> >      blah blah
> >    }
> >
> > Within the true arm we know there's an equivalence.  *But* we don't want
> > to just blindly replace a with b or vice-versa.  The equivalence is
> > primarily useful for simplification rather than propagation.
> >
> > In fact, we every much do not want to cprop in these cases.   It rarely
> > helps in a meaningful way and there's no good way to know which is the
> > better value to use.  Canonicalization on SSA_NAMEs doesn't  really help
> > due to SSA_NAME recycling.
> >
> >
> > >
> > > * Convert evrp use in threader to ranger.
> > >
> > >    The idea here is to use the hybrid approach we used for the initial
> > >    VRP threader conversion last cycle.  The DOM threader will continue
> > >    using the forward threader infrastructure while continuing to query
> > >    DOM data structures, and only if the conditional does not relsolve,
> > >    using the ranger.  This gives us the best of both worlds, and is a
> > >    proven approach.
> > >
> > >    Furthermore, as frange and prange come live in the next cycle, we
> > >    can move away from the forward threader altogether, and just add
> > >    another backward threader.  This will not only remove the last use
> > >    of the forward threader, but will allow us to remove at least 1 or 2
> > >    threader instances.
> > Excellent.
> >
> > >
> > > * Convert conditional folding to use the method used by the ranger and
> > >    evrp.  Previously DOM was calling into the guts of
> > >    simplify_using_ranges::vrp_visit_cond_stmt.  The blessed way now is
> > >    using fold_cond() which rewrites the conditional and edges
> > >    automatically.
> > >
> > >    When legacy is removed, simplify_using_ranges will be further
> > >    cleaned up, and there will only be one entry point into simplifying
> > >    a statement.
> > Sure.
> >
> > >
> > > * DOM was setting global ranges determined from unreachable edges as a
> > >    side-effect of using the evrp engine.  We must handle these cases
> > >    before nuking evrp, and DOM seems like a good fit.  I've just moved
> > >    the snippet to DOM, but it could live anywhere else we do a DOM
> > >    walk.
> >   It was a semi-desirable to record/refine global ranges in DOM, but I'd
> > be surprised if too much code depended on that.  Presumably there's a
> > testcase in the suite which we don't handle well if we don't let DOM
> > refine/record global ranges?
> >
> > >
> > >    For the record, this is the case *vrp handled:
> > >
> > >       <bb C>:
> > >       ...
> > >       if (c_5(D) != 5)
> > >       goto <bb N>;
> > >       else
> > >       goto <bb M>;
> > >       <bb N>:
> > >       __builtin_unreachable ();
> > >       <bb M>:
> > >
> > >    If M dominates all uses of c_5, we can set the global range of c_5
> > >    to [5,5].
> > And that will only happen if M eventually loops back to the conditional,
> > right?  Otherwise all the uses wouldn't be dominated. I suspect this is
> > exceedingly rare in practice an it looks really familiar.  This is
> > coming from a testcase, right?
>
> This is how we make use of assert() when it uses __builtin_unreachable.
>
> Note the difficulty here is that we have to do this at a very specific point
> in the pass pipeline (setting the global range), since the first time we'll
> fold the if (c_5(D) != 5) it will be folded to false and the IL representation
> of the assert() is gone.
>
> So the idea was that with the IL present value-range analysis can determine
> the range omf c_5(D) on the false path but once we remove it (and we do
> want to remove it) we want to put a global range on c_5(D).
>
> That means the best of both worlds would be to detect this pattern at
> a specific point in the pass pipeline (somewhen before loop opts for sure)
> and do both - remove the IL _and_ set the global range.  The fear of
> doing this too early is of course that we lose the range by copy propagation
> or similar transforms.  The fear of doing this change at a single place only
> is that we miss it because there's a stray use before the conditional.
>
> Note DOM is also run in pass_ipa_oacc_kernels which may be too early.
>
> In the end we want to perform dead code elimination here so maybe
> DCE is the best fit to do this (but as said we want to avoid doing this too
> early).

I agree that doing both would be the ideal scenario.  However, I would
prefer this be done in a follow-up patch to minimize the amount of
stuff being changed in one go.

To provide additional background, this was being done in evrp which
runs at -O2, but was also being done at -O1 as a side-effect of DOM
using the evrp engine and exporting global ranges.  For -O2, I don't
know how much DOM was originally (prior to ranger) contributing to
these assert global ranges,  since evrp was probably picking them up
first.  However, since evrp now runs in ranger mode, this
functionality has silently moved to DOM (without us realizing) for all
compilation levels.

Do we care about this functionality at -O1?  ISTR at least one -O1
testcase failing in RTL land if exporting global ranges isn't done in
DOM.  I don't remember if it was a global range due to a
__builtin_unreachable or otherwise.

One more thing, we have batted around the idea of running a fast evrp
even at -O1 (ranger without looking at back edges, IIRC).  This would
mean that when the DOM threader becomes disentangled from DOM (when we
have frange and prange), there's very little DOM will be doing that
for instance PRE can't get.  So perhaps, if we remove DOM, the place
to put this optimization is in the fast evrp, and fold the conditional
as has been suggested?  Anyways... I'm digressing.

Doing this optimization in DOM at least keeps with GCC 12, but I'm
happy to move it to whatever is recommended.

The test case is gcc.dg/builtin-unreachable-6a.c by the way.

>
> > This is the only part of the patch that makes me go "ewwww", so I would
> > like to at least explore if we can rethink the value of that test.
> >
> > > There is a a regression on Wstringop-overflow-4.C which I'm planning
> > > on XFAILing.  It's another variant of the usual middle-end false
> > > positives: having no ranges produces no warnings, but slightly refined
> > > ranges, or worse-- isolating specific problematic cases in the
> > > threader causes flare-ups.
> > ACK.
> >
> >
> > >
> > > As an aside, as Richi has suggested, I think we should discuss
> > > restricting the threader's ability to thread highly unlikely paths.
> > > These cause no end of pain for middle-end warnings.  However,
> > > I don't know if this would conflict with path isolation for
> > > things like null dereferencing.  ISTR you were interested in this.
> > I've never though too  much about it.  You're thinking of Richi :-)
> >
> > It's a balance.  I bet if we stop threading those paths we're going to
> > see other issues start popping up like uninitialized warnings.
> >
> > It may be the case that it's really the constant propagation on an
> > isolated path that's more problematical for those warnings.  But I would
> > probably contend that what isolation is doing is showing how certain
> > constant values can flow into places where they're not expected and
> > could cause problems.  So I wouldn't necessary suggest throttling it.
> >
> > Happy to be proven wrong here!
>
> I think we need to revisit the whole costing of threading which is of course
> best done when we only have the backwards threader left.  I also always
> wanted to play with some kind of optimistic code generation & rollback
> on things like threading and unrolling, basically value-number the
> duplicated stmts on-the-fly and make it "stop" when we reach a defined
> threshold (and then throw away the result).  I never went around to
> do this for unrolling since there we can end up duplicating diverging
> control flow, but that at least doesn't happen with threading(?)

WRT stopping after a threshold is reached,
back_threader_profitablity::profitable_path_p() bails after a number
of statements is reached.  If so, the path is not even put into the
queue.  So, no rolling back would be necessary.  Is this what you
meant, or something else?  The threaders (all variants) queue things
up and modify the IL at the end of the pass.

>
> And just a note - please wait with installing this change - if approved - until
> after GCC 12 is actually released.

Indeed.  This is far too invasive for the interim.  I just sent the
patch early so Jeff could get a chance to review it before I leave on
paternity leave.  But he's also volunteered to kick it over the goal
line if we can't get it in before May 9.

Aldy



More information about the Gcc-patches mailing list