This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: PATCH: [gcc3.5 improvement branch] Very Simple constant propagation


Hi Caroline,
> However, I have discovered that there seems to be a major difference in
> compile-time performance between them.  All of these measurements were
> taken using the "time" command, and done at least three times.  I list
> here the compile times for directly compiling the SPECInt 2000
> benchmarks I measured.  I compiled each benchmark
> directly, using "-O3 -ffast-math  -save-temps", and timed the
> compilation with the 'time" command (i.e. I did NOT use
> the SPEC harness to do the compilation).   In addition, I passed the
> "-fss-const-prop" flag to mine to turn on my version (yours is on be
> default).
>
> Benchmark        My Patch          Your Patch
>
> gzip                6.65            12.74
> vpr                20.83            36.98
> gcc               244.13           399.41
> mcf                 1.75             3.28
> crafty             31.95            61.89
> parser             15.61            31.29
>
>
> All compile times are in seconds, averaged over three runs.    As you
> can see, in every case your version takes significantly longer than my
> version.

Could you double check these results?  I've just timed 3-stage bootstraps
of GCC, all languages except treelang, including building the libraries,
and I see no appreciable difference in compilation times.

Without my patch:
real    58m45.660s
user    42m24.880s
sys     8m49.510s

With it:
real    56m39.610s
user    42m50.360s
sys     8m39.570s

Even considering the 26 second increase in user-time, which I believe
is in the noise, accounts for only one 1.02% increase in compile-time.
And I think some of this could be addressed using rtx_equal_p to ignore
isns's whose REG_EQUAL note it the same as the SET_SRC.


But this does nothing to explain the 100+% slowdowns you're seeing
with your benchmarks.

Combine has multiple passes, firstly an O(N) inspection of earlier
instructions to combine two instructions together, and if that doesn't
help two O(N^2) passes, to combine three instructions, the current one
and two earlier ones.  My patch adds an additional O(N) pass to this
already O(N^2) algorithm, where N here is the number of operands in an
instruction, typically two or three.   HAVE_cc0 targets for example,
have two additional O(N) passes in combine in addition to those on
non-HAVE_cc0 targets.

Note that the running time of combine is independent of the size of
basic blocks, as LOG_LINKS only point to the two or three "feeding"
instructions, independent of the maximum size basic block in the CFG.

To summarise, its impossible that my changes could have such a severe
slow-down, even if the quadratic part of combine previously accounted
for the majority of CPU cycles in a normal compile.


Is there any chance you analysed one of the trees with checking enabled
and the other with checking disabled?  Or one of the patches against the
FSF tree and the other against Apple's tree?  Or perhaps you misapplied
my patch, inserting my O(N) loop inside one of the existing O(N^2) loops
creating an O(N^3) loop.  Have you mislabeled the columns?

Roger
--


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]