This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
Re: [committed][PR rtl-optimization/87761] Limited iteration in regcprop to pick up secondary opportunities
- From: Jakub Jelinek <jakub at redhat dot com>
- To: Richard Biener <richard dot guenther at gmail dot com>
- Cc: Jeff Law <law at redhat dot com>, gcc-patches <gcc-patches at gcc dot gnu dot org>
- Date: Thu, 28 Mar 2019 14:47:08 +0100
- Subject: Re: [committed][PR rtl-optimization/87761] Limited iteration in regcprop to pick up secondary opportunities
- References: <0e195e6c-5e41-34c8-dd6c-b6722845a3a3@redhat.com> <20190327143603.GH7611@tucnak> <22b7b7a5-9b83-d806-39a6-f67ae0cd6bff@redhat.com> <CAFiYyc3XS2a6XNnufoa8dcgqKt=XjPDkU5Td6tXEW+iENoF41w@mail.gmail.com>
- Reply-to: Jakub Jelinek <jakub at redhat dot com>
On Thu, Mar 28, 2019 at 09:55:46AM +0100, Richard Biener wrote:
> On Wed, Mar 27, 2019 at 4:26 PM Jeff Law <law@redhat.com> wrote:
> >
> > On 3/27/19 8:36 AM, Jakub Jelinek wrote:
> > > On Sun, Mar 24, 2019 at 09:20:07AM -0600, Jeff Law wrote:
> > >> However, I'm increasingly of the opinion that MIPS targets need to drop
> > >> off the priority platform list. Given the trajectory I see for MIPS
> > >> based processors in industry, it's really hard to justify spending this
> > >> much time on them, particularly for low priority code quality issues.
> > >
> > > Besides what has been discussed on IRC for the PR89826 fix, that we really
> > > need a df_analyze before processing the first block, because otherwise we
> > > can't rely on the REG_UNUSED notes in the IL, I see some other issues, but I
> > > admit I don't know much about df nor regcprop.
> > Right. I plan to commit that today along with the test reordering you
> > pointed out.
> >
> > >
> > > 1) the df_analyze () after every (successful) processing of a basic block
> > > is IMHO way too expensive, I would be very surprised if df_analyze () isn't
> > > quadratic in number of basic blocks and so one could construct testcases
> > > with millions of basic blocks and at least one regcprop change in each bb
> > > and get cubic complexity (correct me if I'm wrong, and I'm aware of the
> > > 95% bbs you said won't have any changes at all)
> > I'm going to look into this further today.
>
> Look at https://gcc.opensuse.org/gcc-old/c++bench-czerny/random/random-performance-latest
> and you'll see multiple testcases with 'hard reg cprop' >10% compile-time.
> It's indeed a hog for no good reason.
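The complexity concern quoted above can be illustrated with a toy cost model. This is only a sketch and not GCC code: every function name is hypothetical, and the assumption that one whole-function analysis costs n^2 units for n basic blocks is the guess from the quoted text, not a measured property of df_analyze ().

```python
# Toy cost model, not GCC code: all names here are hypothetical.

def full_analysis_cost(num_blocks, exponent=2):
    """Assumed cost of one whole-function analysis, in abstract units."""
    return num_blocks ** exponent


def cost_reanalyze_after_each_change(num_blocks, changed_blocks, exponent=2):
    """One analysis up front plus one full re-analysis per changed block."""
    return (1 + changed_blocks) * full_analysis_cost(num_blocks, exponent)


def cost_analyze_once(num_blocks, exponent=2):
    """A single analysis for the whole function."""
    return full_analysis_cost(num_blocks, exponent)


# Worst case from the quoted text: at least one change in every block.
for n in (10, 100, 1000):
    per_change = cost_reanalyze_after_each_change(n, changed_blocks=n)
    once = cost_analyze_once(n)
    print(f"{n:>5} blocks: once={once:>8} per-change={per_change:>11} "
          f"ratio={per_change // once}")
```

Under these assumptions the per-change strategy costs (n + 1) times a single analysis, which is where the cubic worst case comes from when the analysis itself is quadratic.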
I've tried the testcase from https://gcc.gnu.org/bugzilla/show_bug.cgi?id=28071#c1
with a bootstrapped cc1 configured with --enable-checking=yes,rtl,extra, at -O2,
both without and with the patch.
The important times in -ftime-report with vanilla trunk:
Time variable                   usr           sys          wall               GGC
phase opt and generate   : 250.76 (100%)   2.00 ( 96%) 253.36 (100%)  768860 kB ( 99%)
df live regs             :  19.95 (  8%)   0.03 (  1%)  19.39 (  8%)       0 kB (  0%)
df live&initialized regs :  20.29 (  8%)   0.05 (  2%)  19.73 (  8%)       0 kB (  0%)
df reg dead/unused notes : 158.66 ( 63%)   0.02 (  1%) 160.12 ( 63%)    4665 kB (  1%)
hard reg cprop           :  21.03 (  8%)   0.01 (  0%)  21.39 (  8%)     509 kB (  0%)
TOTAL                    : 250.85          2.09        253.57          776940 kB
(ignoring everything <2% in the first % column).
Configure with --enable-checking=release to disable checks.
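As a rough cross-check of where the vanilla-trunk time goes, the usr column of the rows listed above can be summed. This is a sketch, not part of the measurement: the variable and function names are made up for illustration, and it assumes the listed rows do not overlap in time.

```python
# usr seconds copied from the vanilla-trunk -ftime-report rows above.
vanilla_usr = {
    "df live regs": 19.95,
    "df live&initialized regs": 20.29,
    "df reg dead/unused notes": 158.66,
    "hard reg cprop": 21.03,
}
vanilla_total_usr = 250.85  # TOTAL row, usr column


def share_of_total(seconds, total):
    """Percentage of the total that `seconds` accounts for."""
    return 100.0 * seconds / total


# Assumption: rows are disjoint, so their times can simply be added.
df_seconds = sum(t for name, t in vanilla_usr.items() if name.startswith("df "))
listed_seconds = sum(vanilla_usr.values())

print(f"df rows:         {df_seconds:7.2f}s "
      f"({share_of_total(df_seconds, vanilla_total_usr):.1f}% of usr)")
print(f"all listed rows: {listed_seconds:7.2f}s "
      f"({share_of_total(listed_seconds, vanilla_total_usr):.1f}% of usr)")
```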
With the https://gcc.gnu.org/ml/gcc-patches/2019-03/msg01335.html patch the
same testcase with -O2 -ftime-report results in identical assembly, but:
Time variable                   usr           sys          wall               GGC
phase opt and generate   :  28.92 (100%)   1.82 ( 95%)  30.85 ( 99%)  768882 kB ( 99%)
CFG verifier             :   1.66 (  6%)   0.02 (  1%)   1.69 (  5%)       0 kB (  0%)
df live regs             :   0.63 (  2%)   0.00 (  0%)   0.61 (  2%)       0 kB (  0%)
df live&initialized regs :   1.01 (  3%)   0.03 (  2%)   1.00 (  3%)       0 kB (  0%)
df must-initialized regs :   1.51 (  5%)   0.93 ( 48%)   2.46 (  8%)       0 kB (  0%)
tree SSA verifier        :   2.79 ( 10%)   0.01 (  1%)   2.78 (  9%)       0 kB (  0%)
tree STMT verifier       :   2.00 (  7%)   0.00 (  0%)   1.99 (  6%)       0 kB (  0%)
dominance computation    :   0.61 (  2%)   0.00 (  0%)   0.59 (  2%)       0 kB (  0%)
out of ssa               :   0.61 (  2%)   0.04 (  2%)   0.65 (  2%)       1 kB (  0%)
loop init                :   0.58 (  2%)   0.00 (  0%)   0.63 (  2%)      38 kB (  0%)
combiner                 :   0.44 (  2%)   0.02 (  1%)   0.47 (  2%)   17926 kB (  2%)
integrated RA            :   2.24 (  8%)   0.08 (  4%)   2.35 (  8%)  205177 kB ( 26%)
LRA non-specific         :   1.46 (  5%)   0.05 (  3%)   1.50 (  5%)   19172 kB (  2%)
LRA create live ranges   :   1.23 (  4%)   0.00 (  0%)   1.23 (  4%)    2589 kB (  0%)
reload CSE regs          :   0.54 (  2%)   0.00 (  0%)   0.51 (  2%)    8456 kB (  1%)
scheduling 2             :   0.73 (  3%)   0.09 (  5%)   0.81 (  3%)    2715 kB (  0%)
verify RTL sharing       :   1.19 (  4%)   0.00 (  0%)   1.15 (  4%)       0 kB (  0%)
TOTAL                    :  29.02          1.92         31.07          776962 kB
So that is roughly an 8.6x usr time speedup (250.85s down to 29.02s) with that patch.
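The ratio can be rechecked from the usr column of the two reports. The snippet below is only a sketch with made-up names. It covers just the rows that appear in both listings, since rows under 2% were dropped from each.

```python
# usr seconds for rows present in both -ftime-report listings above.
usr_before = {
    "TOTAL": 250.85,
    "phase opt and generate": 250.76,
    "df live regs": 19.95,
    "df live&initialized regs": 20.29,
}
usr_after = {
    "TOTAL": 29.02,
    "phase opt and generate": 28.92,
    "df live regs": 0.63,
    "df live&initialized regs": 1.01,
}


def speedup(before, after):
    """How many times faster the patched run is for one row."""
    return before / after


speedups = {name: speedup(usr_before[name], usr_after[name])
            for name in usr_before}

for name, factor in speedups.items():
    print(f"{name:<26} {usr_before[name]:7.2f}s -> "
          f"{usr_after[name]:6.2f}s  ({factor:.1f}x)")
```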
Jakub