This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.



Re: [committed][PR rtl-optimization/87761] Limited iteration in regcprop to pick up secondary opportunities


On Thu, Mar 28, 2019 at 09:55:46AM +0100, Richard Biener wrote:
> On Wed, Mar 27, 2019 at 4:26 PM Jeff Law <law@redhat.com> wrote:
> >
> > On 3/27/19 8:36 AM, Jakub Jelinek wrote:
> > > On Sun, Mar 24, 2019 at 09:20:07AM -0600, Jeff Law wrote:
> > >> However, I'm increasingly of the opinion that MIPS targets need to drop
> > >> off the priority platform list.  Given the trajectory I see for MIPS
> > >> based processors in industry, it's really hard to justify spending this
> > >> much time on them, particularly for low priority code quality issues.
> > >
> > > Besides what has been discussed on IRC for the PR89826 fix, that we really
> > > need a df_analyze before processing the first block, because otherwise we
> > > can't rely on the REG_UNUSED notes in the IL, I see some other issues, but I
> > > admit I don't know much about df nor regcprop.
> > Right.  I plan to commit that today along with the test reordering you
> > pointed out.
> >
> > >
> > > 1) the df_analyze () after every (successful) processing of a basic block
> > > is IMHO way too expensive, I would be very surprised if df_analyze () isn't
> > > quadratic in number of basic blocks and so one could construct testcases
> > > with millions of basic blocks and at least one regcprop change in each bb
> > > and get at cubic complexity (correct me if I'm wrong, and I'm aware of the
> > > 95% bbs you said won't have any changes at all)
> > I'm going to look into this further today.
> 
> Look at https://gcc.opensuse.org/gcc-old/c++bench-czerny/random/random-performance-latest
> and you'll see multiple testcases with 'hard reg cprop' >10% compile-time.
> It's indeed a hog for no good reason.
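
To make the complexity concern quoted above concrete, here is a toy cost
model. It is only a sketch under stated assumptions: the function names,
the exponent and the changed-block fraction are hypothetical, and nothing
here measures what df_analyze () really costs.

```python
def cost_with_reanalysis(num_blocks, changed_fraction=1.0, analysis_exponent=2):
    """Toy model: rerun a whole-function analysis after every changed block.

    analysis_exponent=2 encodes the *assumption* that one analysis run is
    quadratic in the number of basic blocks; it is not a measured value.
    """
    cost_per_analysis = num_blocks ** analysis_exponent
    changed_blocks = round(num_blocks * changed_fraction)
    return changed_blocks * cost_per_analysis


def cost_single_analysis(num_blocks, analysis_exponent=2):
    """Toy model: run the whole-function analysis exactly once."""
    return num_blocks ** analysis_exponent


if __name__ == "__main__":
    for n in (10, 100, 1000):
        every_block = cost_with_reanalysis(n)          # a change in every bb
        few_blocks = cost_with_reanalysis(n, 0.05)     # changes in 5% of bbs
        print(n, cost_single_analysis(n), few_blocks, every_block)
```

Under that assumption, a change in every block grows with the cube of the
block count, and even the 5% case only lowers the constant factor, not the
growth rate.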

I've tried the testcase from https://gcc.gnu.org/bugzilla/show_bug.cgi?id=28071#c1
with a cc1 bootstrapped with --enable-checking=yes,rtl,extra, at -O2, without
and with the patch.
The important times in -ftime-report with vanilla trunk (the columns are
usr, sys, wall and GGC memory):
 phase opt and generate             : 250.76 (100%)   2.00 ( 96%) 253.36 (100%)  768860 kB ( 99%)
 df live regs                       :  19.95 (  8%)   0.03 (  1%)  19.39 (  8%)       0 kB (  0%)
 df live&initialized regs           :  20.29 (  8%)   0.05 (  2%)  19.73 (  8%)       0 kB (  0%)
 df reg dead/unused notes           : 158.66 ( 63%)   0.02 (  1%) 160.12 ( 63%)    4665 kB (  1%)
 hard reg cprop                     :  21.03 (  8%)   0.01 (  0%)  21.39 (  8%)     509 kB (  0%)
 TOTAL                              : 250.85          2.09        253.57         776940 kB
(ignoring everything <2% in the first % column).
Configure with --enable-checking=release to disable checks.
With the https://gcc.gnu.org/ml/gcc-patches/2019-03/msg01335.html patch the
same testcase with -O2 -ftime-report results in identical assembly, but:
 phase opt and generate             :  28.92 (100%)   1.82 ( 95%)  30.85 ( 99%)  768882 kB ( 99%)
 CFG verifier                       :   1.66 (  6%)   0.02 (  1%)   1.69 (  5%)       0 kB (  0%)
 df live regs                       :   0.63 (  2%)   0.00 (  0%)   0.61 (  2%)       0 kB (  0%)
 df live&initialized regs           :   1.01 (  3%)   0.03 (  2%)   1.00 (  3%)       0 kB (  0%)
 df must-initialized regs           :   1.51 (  5%)   0.93 ( 48%)   2.46 (  8%)       0 kB (  0%)
 tree SSA verifier                  :   2.79 ( 10%)   0.01 (  1%)   2.78 (  9%)       0 kB (  0%)
 tree STMT verifier                 :   2.00 (  7%)   0.00 (  0%)   1.99 (  6%)       0 kB (  0%)
 dominance computation              :   0.61 (  2%)   0.00 (  0%)   0.59 (  2%)       0 kB (  0%)
 out of ssa                         :   0.61 (  2%)   0.04 (  2%)   0.65 (  2%)       1 kB (  0%)
 loop init                          :   0.58 (  2%)   0.00 (  0%)   0.63 (  2%)      38 kB (  0%)
 combiner                           :   0.44 (  2%)   0.02 (  1%)   0.47 (  2%)   17926 kB (  2%)
 integrated RA                      :   2.24 (  8%)   0.08 (  4%)   2.35 (  8%)  205177 kB ( 26%)
 LRA non-specific                   :   1.46 (  5%)   0.05 (  3%)   1.50 (  5%)   19172 kB (  2%)
 LRA create live ranges             :   1.23 (  4%)   0.00 (  0%)   1.23 (  4%)    2589 kB (  0%)
 reload CSE regs                    :   0.54 (  2%)   0.00 (  0%)   0.51 (  2%)    8456 kB (  1%)
 scheduling 2                       :   0.73 (  3%)   0.09 (  5%)   0.81 (  3%)    2715 kB (  0%)
 verify RTL sharing                 :   1.19 (  4%)   0.00 (  0%)   1.15 (  4%)       0 kB (  0%)
 TOTAL                              :  29.02          1.92         31.07         776962 kB
So roughly an 8.6x usr time speedup with that patch (250.85 vs. 29.02 in the
TOTAL row).
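
The ratio can be rechecked from the two TOTAL rows above. This is a
minimal sketch: the helper name usr_seconds is made up for illustration,
and it assumes the first number after the colon is the usr column.

```python
import re


def usr_seconds(report_line):
    """Return the first time value after the colon in a -ftime-report row.

    Assumes (as in the rows quoted above) that this first value is usr time.
    """
    _, _, values = report_line.partition(":")
    match = re.search(r"\d+\.\d+", values)
    if match is None:
        raise ValueError("no time value found in: %r" % report_line)
    return float(match.group())


# TOTAL rows copied from the two reports above.
before = " TOTAL                              : 250.85          2.09        253.57         776940 kB"
after = " TOTAL                              :  29.02          1.92         31.07         776962 kB"

speedup = usr_seconds(before) / usr_seconds(after)

if __name__ == "__main__":
    print("usr speedup: %.2fx" % speedup)
```

The same helper works on any other row, e.g. the "hard reg cprop" line, as
long as the row keeps the name-colon-values layout.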

	Jakub

