This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.
Re: 3.2 vs 3.1 speed
Jan Hubicka <jh@suse.cz> writes:
> > In message <200206051353.aa01413@gremlin-relay.ics.uci.edu>, Dan Nicolaescu writes:
> > >
> > > One of the reasons that 3.2 is slower than 3.1 is that 3.2 just
> > > generates a lot more RTL (also see the huge increase in time spent in
> > > expand).
> > Which is rather curious. What would be even more interesting would be to know
> > how much of that RTL is due to the RTL inliner and how much is due to the
> > tree inliner.
>
> Is the RTL inliner still used? I don't think so.
That was my impression too. (As a reminder: these results are for C++ code.)
> There have been changes to the inlining heuristics, so I guess that will
> explain the degradation.
It looks like that change has increased the amount of generated code a
lot. Has it been benchmarked?
Maybe we should ask the C++ people to run some performance tests to
see if it was worthwhile...
> Would it be possible to know the full -fno-inline comparisons?
Sure.
Again, the 3.2 used here was last updated on: Sat May 25 07:22:53 GMT 2002
Execution times (seconds)
                      3.2 -O2 -fno-inline   3.1 -O2 -fno-inline
garbage collection  :  29.76 (26%)           29.73 (26%)
cfg construction    :   0.43 ( 0%)            0.69 ( 1%)
cfg cleanup         :   0.56 ( 0%)            0.52 ( 0%)
trivially dead code :   0.81 ( 1%)               -
life analysis       :   6.27 ( 6%)            6.09 ( 5%)
life info update    :   1.72 ( 2%)            1.62 ( 1%)
preprocessing       :   1.00 ( 1%)            1.13 ( 1%)
lexical analysis    :   1.51 ( 1%)            1.71 ( 1%)
parser              :  36.07 (32%)           37.67 (33%)
expand              :   1.61 ( 1%)            1.49 ( 1%)
varconst            :   0.62 ( 1%)            0.70 ( 1%)
integration         :   0.13 ( 0%)            0.21 ( 0%)
jump                :   0.64 ( 1%)            0.74 ( 1%)
CSE                 :   7.79 ( 7%)            8.68 ( 8%)
global CSE          :   1.07 ( 1%)            1.09 ( 1%)
loop analysis       :   0.32 ( 0%)            0.40 ( 0%)
CSE 2               :   3.39 ( 3%)            3.81 ( 3%)
branch prediction   :   1.50 ( 1%)               -
flow analysis       :   0.33 ( 0%)            0.42 ( 0%)
combiner            :   0.77 ( 1%)            0.93 ( 1%)
if-conversion       :   0.03 ( 0%)            0.06 ( 0%)
regmove             :   0.38 ( 0%)            0.45 ( 0%)
scheduling          :   3.45 ( 3%)            3.29 ( 3%)
local alloc         :   1.26 ( 1%)            1.30 ( 1%)
global alloc        :   2.45 ( 2%)            2.49 ( 2%)
reload CSE regs     :   1.97 ( 2%)            3.22 ( 3%)
flow 2              :   0.31 ( 0%)            0.20 ( 0%)
if-conversion 2     :   0.04 ( 0%)            0.03 ( 0%)
peephole 2          :   0.66 ( 1%)            0.61 ( 1%)
rename registers    :   1.98 ( 2%)            1.94 ( 2%)
scheduling 2        :   1.46 ( 1%)            1.58 ( 1%)
delay branch sched  :   0.52 ( 0%)            0.59 ( 1%)
reorder blocks      :   0.06 ( 0%)            0.03 ( 0%)
shorten branches    :   0.10 ( 0%)            0.09 ( 0%)
final               :   0.66 ( 1%)            0.60 ( 1%)
symout              :   0.06 ( 0%)            0.06 ( 0%)
rest of compilation :   0.85 ( 1%)            0.90 ( 1%)
TOTAL               : 112.58                115.11
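
For anyone who wants to see at a glance where the two compilers differ, here is a minimal Python sketch that ranks per-pass differences. The figures are copied by hand from the timing report above and only a few passes are included; the dictionary layout and the function name deltas are my own choices for illustration, not part of any GCC tool.

```python
# Per-pass times in seconds, copied by hand from the report above,
# stored as (3.2 -O2 -fno-inline, 3.1 -O2 -fno-inline).
# Only a subset of passes is listed; extend the table as needed.
times = {
    "parser": (36.07, 37.67),
    "CSE": (7.79, 8.68),
    "CSE 2": (3.39, 3.81),
    "reload CSE regs": (1.97, 3.22),
    "life analysis": (6.27, 6.09),
    "scheduling": (3.45, 3.29),
    "TOTAL": (112.58, 115.11),
}


def deltas(table):
    """Return (pass, 3.2 minus 3.1) pairs, largest absolute difference first."""
    rows = [(name, round(new - old, 2)) for name, (new, old) in table.items()]
    return sorted(rows, key=lambda row: abs(row[1]), reverse=True)


if __name__ == "__main__":
    # Negative numbers mean 3.2 spent less time in that pass than 3.1.
    for name, diff in deltas(times):
        print(f"{name:<18} {diff:+.2f}")
```

With -fno-inline, most of the gap between the two totals in this subset comes from the parser and reload CSE regs rows, both of which are lower for 3.2.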