This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.



Re: [tree-ssa] Merge status


> For SPEC2000int, the branch is 2% behind mainline on x86.  For
> SPEC2000fp, the branch is 0.4% ahead.  I will post x86-64 results soon.

Are the performance problems on SPEC2000int (and the improvements in
SPEC2000fp) concentrated in a single test?

> Currently, bootstrap times on the branch are 14% slower than mainline
> [...] Work is underway to remove some RTL passes

It may be interesting to try without ADDRESSOF.  I cannot test it fully, but I
made some experiments: I simply tweaked function.c to not generate it.  Jan
has posted a polished patch which AFAIK is still waiting for review.  I
successfully bootstrapped mainline with all three of ADDRESSOF generation,
CSE1 (oh yeah) and the small purge_addressof pass disabled.  This helps
bootstrap time, but somebody should run SPEC tests to see whether it causes
problems for the other merge criterion, performance.  With Jan's patch, the
necessary modification is as easy as emptying rest_of_handle_cse and
rest_of_handle_addressof.

Another low-hanging candidate in this area is the extended basic block stuff
of CSE.  -fno-cse-follow-jumps -fno-cse-skip-blocks (or whatever they're
named) should give a first idea of the changes in SPEC numbers, but probably
more can be gained by touching cse.c itself to remove unnecessary code and
tests.

Finally, in rest_of_handle_gcse, with -fexpensive-optimizations CSE is run
repeatedly after GCSE until no jumps change.  Is this really helpful,
especially with EBB CSE disabled?

I also have a patch to remove CONSTANT_P_RTX.  Performance tests made on
mainline when purge_builtin_constant_p was introduced showed that the pass took
about 1% of bootstrap time.  However, purge_builtin_constant_p should actually
be unused on the branch since builtin_constant_p is lowered well before the
RTL expander.  There still may be a very small improvement (0.5% maybe),
because it would simplify the CONSTANT_P predicate in rtl.h: a third of the 5%
improvement gained by my RTX classes patch was due to simplifying CONSTANT_P.
If you are interested in the patch I can polish it and send it on Monday or
Tuesday.
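
To make the CONSTANT_P point concrete, here is a toy sketch of the idea.  It is
not the real rtl.h: the enum, the class table and all three function names are
hypothetical and much reduced.  It only shows why a one-off code such as
CONSTANT_P_RTX keeps an extra comparison in a predicate that could otherwise be
a single class lookup.

```c
#include <stdbool.h>

/* Hypothetical, heavily reduced set of RTL codes (not GCC's real enum). */
enum toy_code {
  TOY_REG,
  TOY_PLUS,
  TOY_CONST_INT,
  TOY_CONST_DOUBLE,
  TOY_SYMBOL_REF,
  TOY_LABEL_REF,
  TOY_CONSTANT_P_RTX,   /* the special code the patch would remove */
  TOY_NUM_CODES
};

/* Hypothetical class tags: every code belongs to exactly one class. */
enum toy_class { TOY_CLASS_OTHER, TOY_CLASS_CONST_OBJ };

static const enum toy_class toy_class_of[TOY_NUM_CODES] = {
  [TOY_REG]            = TOY_CLASS_OTHER,
  [TOY_PLUS]           = TOY_CLASS_OTHER,
  [TOY_CONST_INT]      = TOY_CLASS_CONST_OBJ,
  [TOY_CONST_DOUBLE]   = TOY_CLASS_CONST_OBJ,
  [TOY_SYMBOL_REF]     = TOY_CLASS_CONST_OBJ,
  [TOY_LABEL_REF]      = TOY_CLASS_CONST_OBJ,
  [TOY_CONSTANT_P_RTX] = TOY_CLASS_OTHER,  /* not a plain constant object */
};

/* Style 1: a chain of code comparisons, one test per constant code. */
static bool constant_p_chain(enum toy_code c)
{
  return c == TOY_CONST_INT || c == TOY_CONST_DOUBLE
         || c == TOY_SYMBOL_REF || c == TOY_LABEL_REF
         || c == TOY_CONSTANT_P_RTX;
}

/* Style 2: one class lookup, plus a leftover test for the special code. */
static bool constant_p_class_plus_special(enum toy_code c)
{
  return toy_class_of[c] == TOY_CLASS_CONST_OBJ || c == TOY_CONSTANT_P_RTX;
}

/* Style 3: what remains once the special code no longer exists. */
static bool constant_p_class_only(enum toy_code c)
{
  return toy_class_of[c] == TOY_CLASS_CONST_OBJ;
}
```

For every code except the special one the three predicates agree, so in this
toy model dropping the special code changes only how much work each test does.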

IIRC, jump bypassing takes about 2% of compile time.  Actually, when jump
bypassing was introduced, it sped up bootstrap because GCC's enormous
conditionals are well suited to jump bypassing; but now tree-ssa-dom should
have made it almost obsolete on the branch, shouldn't it?  Again, SPEC testing
is the only possible guidance.

Hope this helps,

Paolo




