This is the mail archive of the
gcc@gcc.gnu.org
mailing list for the GCC project.
Re: [tree-ssa] Merge status
- From: "Paolo Bonzini" <bonzini at gnu dot org>
- To: "GCC Development" <gcc at gcc dot gnu dot org>,<dnovillo at redhat dot com>
- Date: Sat, 13 Mar 2004 15:55:12 +0100
- Subject: Re: [tree-ssa] Merge status
> For SPEC2000int, the branch is 2% behind mainline on x86. For
> SPEC2000fp, the branch is 0.4% ahead. I will post x86-64 results soon.
Are the performance problems on SPEC2000int (and the improvements in
SPEC2000fp) concentrated in a single test?
> Currently, bootstrap times on the branch are 14% slower than mainline
> [...] Work is underway to remove some RTL passes
It may be interesting to try without ADDRESSOF. I cannot test this fully, but
I made some experiments by simply tweaking function.c to not generate it; Jan
has posted a polished patch which AFAIK is still waiting for review. I
successfully bootstrapped mainline without all three of ADDRESSOF generation,
CSE1 (oh yeah) and the small purge_addressof pass. This helps bootstrap
time, but somebody should run SPEC tests to see whether it causes performance
problems for the other merge criterion. With Jan's patch, the necessary
modification is as easy as emptying rest_of_handle_cse and
rest_of_handle_addressof.
Another low-hanging candidate in this area is the extended basic block stuff
of CSE. -fno-cse-follow-jumps -fno-cse-skip-blocks (or whatever they're
named) should give a first idea of the changes in SPEC numbers, but probably
more can be gained by touching cse.c itself to remove unnecessary code and
tests.
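A first measurement along these lines could be scripted. The following is
only a minimal sketch: the helper names, the "gcc" driver name, the -O2 level
and the output file name are assumptions made for illustration, and the two
flag spellings should be checked against the manual of the compiler under
test. It builds the baseline and the EBB-CSE-off command lines and reports the
relative change between two timings.

```python
import subprocess
import time

# Assumed spellings of the two options mentioned above; verify them locally.
EBB_CSE_OFF = ("-fno-cse-follow-jumps", "-fno-cse-skip-blocks")


def build_command(source, extra_flags=(), driver="gcc", opt_level="-O2",
                  output="out.o"):
    """Return the argv list that compiles one source file to an object file."""
    return [driver, opt_level, *extra_flags, "-c", source, "-o", output]


def time_command(argv, runs=3):
    """Run argv several times and return the best wall-clock time in seconds."""
    best = float("inf")
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run(argv, check=True)
        best = min(best, time.perf_counter() - start)
    return best


def percent_change(baseline, variant):
    """Relative change of variant versus baseline in percent (negative = lower)."""
    return (variant - baseline) / baseline * 100.0
```

Timing build_command("foo.c") against build_command("foo.c", EBB_CSE_OFF)
with time_command and feeding both results to percent_change gives the
compile-time side; the same percent_change helper applies to SPEC scores.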
Finally, in rest_of_handle_gcse, with -fexpensive-optimizations CSE is run
repeatedly after GCSE until no jumps change. Is this really helpful,
especially with EBB CSE disabled?
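The "repeat until no jumps change" behaviour can be modelled as a small
fixed-point loop. This is a toy model, not the code in rest_of_handle_gcse:
run_cse and run_jump_cleanup are hypothetical callables standing in for the
real passes, and run_cse is assumed to return True when it altered a jump.
Counting the rounds is one way to see how often the extra iterations actually
happen.

```python
def run_cse_until_stable(run_cse, run_jump_cleanup):
    """Re-run CSE until it reports that no jump changed.

    Returns the number of times run_cse was called.
    """
    runs = 0
    while True:
        runs += 1
        if not run_cse():
            # CSE left all jumps alone: the fixed point is reached.
            return runs
        # A jump changed, so clean up jumps before trying CSE again.
        run_jump_cleanup()


# Hypothetical pass behaviour: jumps change twice, then nothing changes.
outcomes = iter([True, True, False])
cleanups = []
rounds = run_cse_until_stable(lambda: next(outcomes),
                              lambda: cleanups.append(1))
```

If instrumentation of this kind showed that almost every function stabilises
after the first run, the repeated invocation would cost little; if not, the
extra rounds are a candidate for removal.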
I also have a patch to remove CONSTANT_P_RTX. Performance tests made on
mainline when purge_builtin_constant_p was introduced showed that the pass
took about 1% of bootstrap time. However, purge_builtin_constant_p should
actually be unused on the branch, since builtin_constant_p is lowered well
before the RTL expander. There may still be a very small improvement (0.5%
maybe), because removing CONSTANT_P_RTX would simplify the CONSTANT_P
predicate in rtl.h: a third of the 5% improvement gained by my RTX classes
patch was due to simplifying CONSTANT_P.
If you are interested in the patch I can polish it and send it on Monday or
Tuesday.
IIRC, jump bypassing takes about 2% of compile time. Actually when jump
bypassing was introduced, it sped up bootstrap because GCC's enormous
conditionals are well suited to jump bypassing; but now tree-ssa-dom should
have made it almost obsolete on the branch, shouldn't it? Again, SPEC testing
is the only possible guidance.
Hope this helps,
Paolo