This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
Re: PATCH: [gcc3.5 improvement branch] Very Simple constant propagation
- From: Roger Sayle <roger at eyesopen dot com>
- To: Caroline Tice <ctice at apple dot com>
- Cc: gcc-patches at gcc dot gnu dot org, geoffrey Keating <geoffk at apple dot com>
- Date: Mon, 2 Feb 2004 14:03:17 -0700 (MST)
- Subject: Re: PATCH: [gcc3.5 improvement branch] Very Simple constant propagation
Hi Caroline,
> However, I have discovered that there seems to be a major difference in
> compile-time performance between them. All of these measurements were
> taken using the "time" command, and done at least three times. I list
> here the compile times for directly compiling the SPECInt 2000
> benchmarks I measured. I compiled each benchmark
> directly, using "-O3 -ffast-math -save-temps", and timed the
> compilation with the 'time" command (i.e. I did NOT use
> the SPEC harness to do the compilation). In addition, I passed the
> "-fss-const-prop" flag to mine to turn on my version (yours is on be
> default).
>
> Benchmark My Patch Your Patch
>
> gzip 6.65 12.74
> vpr 20.83 36.98
> gcc 244.13 399.41
> mcf 1.75 3.28
> crafty 31.95 61.89
> parser 15.61 31.29
>
>
> All compile times are in seconds, averaged over three runs. As you
> can see, in every case your version takes significantly longer than my
> version.
Could you double check these results? I've just timed 3-stage bootstraps
of GCC, all languages except treelang, including building the libraries,
and I see no appreciable difference in compilation times.
Without my patch:
real 58m45.660s
user 42m24.880s
sys 8m49.510s
With it:
real 56m39.610s
user 42m50.360s
sys 8m39.570s
Even considering the 26 second increase in user-time, which I believe
is in the noise, accounts for only one 1.02% increase in compile-time.
And I think some of this could be addressed using rtx_equal_p to ignore
isns's whose REG_EQUAL note it the same as the SET_SRC.
But this does nothing to explain the 100+% slowdowns you're seeing
with your benchmarks.
Combine has multiple passes, firstly an O(N) inspection of earlier
instructions to combine two instructions together, and if that doesn't
help two O(N^2) passes, to combine three instructions, the current one
and two earlier ones. My patch adds an additional O(N) pass to this
already O(N^2) algorithm, where N here is the number of operands in an
instruction, typically two or three. HAVE_cc0 targets for example,
have two additional O(N) passes in combine in addition to those on
non-HAVE_cc0 targets.
Note that the running time of combine is independent of the size of
basic blocks, as LOG_LINKS only point to the two or three "feeding"
instructions, independent of the maximum size basic block in the CFG.
To summarise, its impossible that my changes could have such a severe
slow-down, even if the quadratic part of combine previously accounted
for the majority of CPU cycles in a normal compile.
Is there any chance you analysed one of the trees with checking enabled
and the other with checking disabled? Or one of the patches against the
FSF tree and the other against Apple's tree? Or perhaps you misapplied
my patch, inserting my O(N) loop inside one of the existing O(N^2) loops
creating an O(N^3) loop. Have you mislabeled the columns?
Roger
--