Created attachment 27244 [details]
The attached test case is a conversion to C of a machine-generated test case in Go (the machine-generated Go source is in gcc/testsuite/go.test/test/cmpldivide1.go).
When I compile this test case without optimization on an x86_64 GNU/Linux system, it takes 5.8 seconds. When I compile it without optimization on a SPARC Solaris 2.11 system, it takes 2 minutes 32.8 seconds. The SPARC machine is slower than the x86_64 machine. But it's not that much slower.
According to -ftime-report on the SPARC system, 45% of the time is "register information" and 41% of the time is "integrated RA."
Since the file is machine generated, I would be willing to accept a 2 1/2 minute compilation time with optimization. But without optimization it is just too slow. And the discrepancy with x86_64 is extreme.
Out of curiosity I tried compiling the test case with -O2. On x86_64 it took 57.4 seconds; on SPARC it took 20 minutes 33 seconds.
I'll look at this PR in a week.
I've tried a recent trunk on gcc63 of the compiler farm with -O0. The compilation takes about 300sec. I also checked gcc-4.3 (the last version with the old RA); it also takes about 300sec. The old RA itself is slower (it takes 150sec) than IRA (it takes 55sec), but the register information pass (more exactly regstat_compute_ri, which is part of the DF infrastructure) takes more time on trunk than in gcc-4.3. So my times differ from what you reported. It probably depends on the machine (gcc63 is a relatively modern SPARC machine with NIAGARA processors).
After some investigation, I found that trunk gcc calls regstat_compute_ri more often than gcc-4.3 does. That is a result of a recent addition to IRA that moves some insns (Bernd's patch from a month ago). It is not worth doing at -O0. So I am going to switch it off there, which gives the same number of regstat_compute_ri calls (2 of them) as in gcc-4.3 and a compilation time of less than 200sec (65% of the previous time). I am going to submit a patch today.
Reducing the number of regstat_compute_ri calls further is not possible, because we need one call for IRA itself and one call after the reload transformations (for subsequent passes). Speeding up IRA itself can have only a small impact, and I don't see how to do it: it is very simple and already fast enough (3 times faster than the old RA).
One might think that not doing RA at all (setting -1 for all reg_renumber elements) could speed the case up. But this is not true. It increases reload's work enormously and generates more than 2-3 times as many insns, which slows the compiler down even more.
So, Ian, if you need more speedup at -O0, regstat_compute_ri should be improved. But that is not my area of responsibility. To me, it is strange that such a simple task (which requires 1 pass over the RTL) takes so much time in this case.
DF -> mine to investigate.
Date: Thu May 10 19:58:01 2012
New Revision: 187373
2012-05-10 Vladimir Makarov <firstname.lastname@example.org>
* ira.c (ira): Call find_moveable_pseudos or
move_unallocated_pseudos if only ira_conflicts_p is true.
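The guard described in this ChangeLog entry can be pictured with a small toy model. This is only a sketch, not the ira.c code: every toy_* name is hypothetical, the assumption that the conflicts flag is true exactly when optimizing is mine, and the count of 3 scans without the guard is illustrative (the thread only states that the target is 2).

```c
#include <stdbool.h>

/* Counts calls to the stand-in statistics scan below. */
static int toy_recomputations;

/* Hypothetical stand-in for an expensive register-statistics scan. */
static void toy_compute_reg_info(void) { toy_recomputations++; }

/* Hypothetical stand-in for the insn-moving step; it changes the insn
   stream, so in this model it has to refresh the statistics. */
static void toy_move_pseudos(void) { toy_compute_reg_info(); }

/* Returns how many statistics scans one toy allocation run performs. */
static int toy_ira(int optimize, bool guard_enabled)
{
    /* Assumption: conflicts are only built when optimizing. */
    bool conflicts_p = optimize > 0;

    toy_recomputations = 0;
    toy_compute_reg_info();          /* scan needed by the allocator */
    if (!guard_enabled || conflicts_p)
        toy_move_pseudos();          /* skipped at -O0 once guarded */
    toy_compute_reg_info();          /* scan after reload, for later passes */
    return toy_recomputations;
}
```

In this model, toy_ira(0, true) performs two scans while toy_ira(0, false) performs three; with optimization enabled the guard changes nothing.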
Created attachment 27369 [details]
Compute REG_LIVE_LENGTH smarter
The way local_live is used to compute REG_LIVE_LENGTH is just stupid.
Something like this patch would be helpful (drops from 73s after Vlad's patch to 39s).
A complete version of this should also handle MW-hardregs.
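To illustrate the idea, here is a toy model of a backward walk over one basic block. It is a sketch under simplifying assumptions, not the regstat.c code: struct toy_insn, NREGS, and both function names are hypothetical, and the counting convention (one unit per insn before which a register is live) is chosen for the example and may differ from the exact REG_LIVE_LENGTH definition. The first function bumps a counter for every live register at every insn; the second records the position at which a register became live and adds the distance once, when the register dies or the walk ends.

```c
#include <stdbool.h>
#include <string.h>

#define NREGS 8

/* Hypothetical toy insn: one optional def and up to two uses (-1 = none). */
struct toy_insn { int def; int use[2]; };

/* Naive version: after each insn, increment every live register.
   Cost is proportional to (number of insns) * NREGS. */
static void live_length_naive(const struct toy_insn *insns, int n,
                              const bool *live_out, int *len)
{
    bool live[NREGS];
    memcpy(live, live_out, sizeof live);
    memset(len, 0, NREGS * sizeof *len);
    for (int i = n - 1; i >= 0; i--) {
        if (insns[i].def >= 0)
            live[insns[i].def] = false;
        for (int u = 0; u < 2; u++)
            if (insns[i].use[u] >= 0)
                live[insns[i].use[u]] = true;
        for (int r = 0; r < NREGS; r++)
            if (live[r])
                len[r]++;
    }
}

/* Same result, but each register is touched only when it becomes live
   or dies; luid counts the insns already processed in the walk. */
static void live_length_by_luid(const struct toy_insn *insns, int n,
                                const bool *live_out, int *len)
{
    bool live[NREGS];
    int last_luid[NREGS] = { 0 };   /* live-out registers start at luid 0 */
    int luid = 0;

    memcpy(live, live_out, sizeof live);
    memset(len, 0, NREGS * sizeof *len);
    for (int i = n - 1; i >= 0; i--, luid++) {
        int d = insns[i].def;
        if (d >= 0 && live[d]) {    /* register dies going backward */
            len[d] += luid - last_luid[d];
            live[d] = false;
        }
        for (int u = 0; u < 2; u++) {
            int r = insns[i].use[u];
            if (r >= 0 && !live[r]) {   /* register becomes live */
                live[r] = true;
                last_luid[r] = luid;
            }
        }
    }
    for (int r = 0; r < NREGS; r++)     /* still live at block entry */
        if (live[r])
            len[r] += luid - last_luid[r];
}
```

For a block with many insns and many simultaneously live registers, the second form does a constant amount of work per def or use instead of a full sweep over the live set per insn.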
At -O1, with Vlad's patch and mine applied, compile time is ~230s. The "forward prop" pass takes 18% of that time, and "alias stmt walking" takes 67%.
This is with a cross from x86_64-linux-gnu to sparc-sun-solaris2.11, trunk r187371, checking enabled.
The alias stmt walking time is because of the redundant store removal code
and because of PR52054 it has to re-do the walks (there are 7200 stores in
Date: Thu May 17 17:54:52 2012
New Revision: 187633
* regstat.c (regstat_bb_compute_ri): Take new local_live_last_luid
argument. Simplify calculation of REG_LIVE_LENGTH for regnos that
die in the basic block. Correctly top off REG_FREQ and
(regstat_compute_ri): Allocate and free local_live_last_luid.
(regstat_bb_compute_calls_crossed): Correctly top off
Fixed on trunk for GCC 4.8.