[RFA][tree-optimization/90037] Cleanup const/copies between DOM and erroneous path isolation

Richard Biener richard.guenther@gmail.com
Thu Apr 25 06:58:00 GMT 2019

On April 24, 2019 8:31:44 PM GMT+02:00, Jeff Law <law@redhat.com> wrote:
>On 4/24/19 4:44 AM, Richard Biener wrote:
>> On Tue, Apr 23, 2019 at 4:29 PM Jeff Law <law@redhat.com> wrote:
>>> As discussed in the BZ, this patch addresses the false positive
>>> by cleaning up the const/copy propagations left in the IL between
>>> jump threading and erroneous path isolation.
>>> In the past we'd been handling this stuff with phi only cprop.  To
>>> phi only cprop work in this case we'd have to change it to scan
>>> statements within some subset of blocks where it had previously only
>>> scanned PHIs.
>>> I was concerned about the compile-time cost of the additional
>>> plus the extra pass.  So I compared that against using the lattice
>>> const/copy propagation as well as against the RPO VN bits.
>>> It turns out that the lattice based const/copy propagation does the
>>> least amount of work, closely followed by a limited RPO VN.  The
>>> improved "phi only" copy propagator being the worst.
>> Interesting.
>Definitely.  I didn't dig further than what's mentioned in the BZ.
>>> Given that we can use the lattice copy propagator by just adding the
>>> pass to passes.def whereas using the RPN VN actually requires a
>>> bit of real code (to set up the entry/exits for the relevant SEME
>>> regions), I went with the lattice copy propagator.
>>> This change adds around .4% instruction executions to my testbed of
>>> files.  It has no significant impact on the resulting code -- I see
>>> different register allocation decisions in a lot of places which
>seem to
>>> primarily result in reversing arguments to comparisons.
>> Was there a need to have two copy-prop passes in the early
>> DOM/errorneous-path removal where we previously only had
>> a single phi-only-prop pass?  Is the testcase fixed also when
>> doing copy-prop only a single time?
>The testcase is fixed with a single copyprop (lattice or RPO VN) after
>DOM but before erroneous path isolation.   I seriously considered just
>dropping the copyprop pass after erroneous path isolation.  I'm pretty
>sure it'll regress codegen quality, but it may not be too bad.
>> Also the late pass you replace should be right after VRP, not
>> after warn_restrict (that was a mistake done when adding the
>> warn_restrict pass).
>Agreed.  But ISTM we should make that an independent change.
>> The main reason I dislike this is that it is an unconditional cleanup
>> pass run even when we didn't perform any jump threading.  That's
>> of course the same case as with the existing phi-only-prop passes
>> but would have been my major argument for the SEME VN
>> (where at some cut-off of course doing a single whole-function VN
>> is cheaper than doing N SEME VNs, and I can even think of doing
>> N SEME regions at once in exchange for doing a whole-function
>> RPO order compute).
>Yea.  We're certainly doing more work in the cases where we didn't
>thread anything.  I've always wanted better ways to indicate what
>actions a pass did and using that to bypass subsequent passes if they
>weren't going to be profitable.
>>> FWIW I also considered delaying the erroneous path isolation pass. 
>>> ultimately decided against that.  My recollection is that its
>>> came from the desire to clean up those paths early enough in the
>>> pipeline so that other optimizers could do a better job.
>>> We could consider an early/late split here.  Essentially we do the
>>> optimization early, leaving enough data lying around somewhere for a
>>> late pass to look for and issue the warning.  Something like a
>>> __builtin_warning that we leave in the IL, then just before
>expanding to
>>> RTL, we scan the IL once and issue the __builtin_warnings.
>>> In this specific case the __builtin_warning would be on a path that
>>> eventually would determine was unreachable at compile time and the
>>> would be removed.  I suspect we could do something similar for other
>>> warnings coming out of the middle end.
>>> Anyway, this has been bootstrapped and regression tested on x86_64,
>>> ppc64, i686, sparc64, & ppc64le.  It's also been bootstrapped on
>>> ppc (32 bit), armeb, m68k, riscv64, mipsisa32r4, arm, sh4, & sh4eb.
>>> It's also built and regression tested all the *-elf targets in my
>>> OK for the trunk, or do we want to defer to gcc-10?
>> I like the pass removal and would say OK if you manage with
>> a 1:1 replacement (thus get rid of that extra copy-prop between
>> DOM and pass_isolate_erroneous_paths).
>That's the one we need to fix the regression.  We might be able to drop
>the one after erroneous path isolation which would keep us at the 1:1
>replacement.  I'll poke at that.

Works for me as well if testing succeeds. 



More information about the Gcc-patches mailing list