This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: PATCH: [gcc3.5 improvement branch] Very Simple constant propagation



On Feb 2, 2004, at 1:03 PM, Roger Sayle wrote:



Hi Caroline,
However, I have discovered that there seems to be a major difference in
compile-time performance between them. All of these measurements were
taken using the "time" command, and done at least three times. I list
here the compile times for directly compiling the SPECInt 2000
benchmarks I measured. I compiled each benchmark
directly, using "-O3 -ffast-math -save-temps", and timed the
compilation with the 'time" command (i.e. I did NOT use
the SPEC harness to do the compilation). In addition, I passed the
"-fss-const-prop" flag to mine to turn on my version (yours is on be
default).


Benchmark My Patch Your Patch

gzip                6.65            12.74
vpr                20.83            36.98
gcc               244.13           399.41
mcf                 1.75             3.28
crafty             31.95            61.89
parser             15.61            31.29


All compile times are in seconds, averaged over three runs. As you can see, in every case your version takes significantly longer than my version.

Could you double check these results? I've just timed 3-stage bootstraps
of GCC, all languages except treelang, including building the libraries,
and I see no appreciable difference in compilation times.

To match Caroline's results, you should use --enable-intermodule and -O3, not just a regular bootstrap.


The concern is compile-time performance on large-to-very-large functions, like those created by intermodule inlining in SPEC.

Without my patch:
real    58m45.660s
user    42m24.880s
sys     8m49.510s

With it:
real    56m39.610s
user    42m50.360s
sys     8m39.570s

Even considering the 26 second increase in user-time, which I believe
is in the noise, accounts for only one 1.02% increase in compile-time.
And I think some of this could be addressed using rtx_equal_p to ignore
isns's whose REG_EQUAL note it the same as the SET_SRC.

rth suggested to me privately that it might be a good idea to also ignore REG_EQUAL notes that only specify that a register is equal to some constant, since combine already handles this.


But this does nothing to explain the 100+% slowdowns you're seeing
with your benchmarks.

Combine has multiple passes, firstly an O(N) inspection of earlier
instructions to combine two instructions together, and if that doesn't
help two O(N^2) passes, to combine three instructions, the current one
and two earlier ones.  My patch adds an additional O(N) pass to this
already O(N^2) algorithm, where N here is the number of operands in an
instruction, typically two or three.   HAVE_cc0 targets for example,
have two additional O(N) passes in combine in addition to those on
non-HAVE_cc0 targets.

This factor may make a difference; powerpc, which is what Caroline used, is not a CC0 target, and it's quite possible that combine is a significant fraction of compile time (the usual optimizer suspects are combine and gcse).


--
Geoff Keating <geoffk@apple.com>

Attachment: smime.p7s
Description: S/MIME cryptographic signature


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]