Bug 53125 - Very slow compilation on SPARC
Product: gcc
Classification: Unclassified
Component: rtl-optimization
Importance: P3 normal
Target Milestone: ---
Assigned To: Steven Bosscher
Keywords: alias
Depends on:
Blocks: 52357
Reported: 2012-04-25 22:10 UTC by Ian Lance Taylor
Modified: 2012-05-17 17:57 UTC (History)

See Also:
Target: sparc-sun-solaris2.11
Known to work:
Known to fail:
Last reconfirmed: 2012-05-10 00:00:00

Test case (144.28 KB, text/plain)
2012-04-25 22:10 UTC, Ian Lance Taylor
Compute REG_LIVE_LENGTH smarter (3.03 KB, patch)
2012-05-10 22:17 UTC, Steven Bosscher

Description Ian Lance Taylor 2012-04-25 22:10:51 UTC
Created attachment 27244
Test case

The attached test case is a conversion to C of a machine generated test case in Go (the machine generated Go source is in gcc/testsuite/go.test/test/cmpldivide1.go).

When I compile this test case without optimization on an x86_64 GNU/Linux system, it takes 5.8 seconds.  When I compile it without optimization on a SPARC Solaris 2.11 system, it takes 2 minutes 32.8 seconds.  The SPARC machine is slower than the x86_64 machine.  But it's not that much slower.

According to -ftime-report on the SPARC system, 45% of the time is "register information" and 41% of the time is "integrated RA."

Since the file is machine generated, I would be willing to accept a 2 1/2 minute compilation time with optimization.  But without optimization it is just too slow.  And the discrepancy with x86_64 is extreme.
Comment 1 Ian Lance Taylor 2012-04-25 22:34:17 UTC
Out of curiosity I tried compiling the test case with -O2.  On x86_64 it took 57.4 seconds; on SPARC it took 20 minutes 33 seconds.
Comment 2 Vladimir Makarov 2012-04-29 00:08:54 UTC
I'll look at this PR in a week.
Comment 3 Vladimir Makarov 2012-05-10 18:30:19 UTC
  I tried a recent trunk on gcc63 of the compiler farm with -O0.  The compilation takes about 300sec.  I also checked gcc-4.3 (the last version with the old RA); it also takes about 300sec.  The old RA itself is slower (150sec) than IRA (55sec), but the register information pass (more exactly regstat_compute_ri, which is part of the DF infrastructure) takes more time on trunk than in gcc-4.3.  So my times differ from what you reported.  That probably depends on the machine (gcc63 is a relatively modern SPARC machine with Niagara processors).

  After some investigation, I found that trunk calls regstat_compute_ri more often than gcc-4.3 does.  That is a result of a recent addition to IRA that moves some insns (Bernd's patch from about a month ago).  It is not worth doing at -O0.  So I am going to switch it off there, to get the same number of regstat_compute_ri calls (2 of them) as in gcc-4.3.  That should bring the compilation time below 200sec (65% of the previous time).  I am going to submit a patch today.

  Reducing the number of regstat_compute_ri calls further is not possible, because we need one call for IRA and one call after the reload transformations (for subsequent passes).  Speeding up IRA itself could have only a small impact, and I don't see how to do it: IRA at -O0 is very simple and fast enough (3 times faster than the old RA).

  One might think that not doing RA at all (setting all reg_renumber elements to -1) could speed this case up.  But that is not true.  It increases reload's work enormously and generates more than 2-3 times as many insns, which would slow down the compiler even more.

  So, Ian, if you need more speedup at -O0, regstat_compute_ri should be improved.  But that is not my area of responsibility.  To me, it is strange that such a simple task (which requires 1 pass over the RTL) takes so much time for this case.
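
To make the "no RA at all" point above concrete, here is a minimal toy model in C.  It is not GCC code: `struct toy_insn`, `skip_allocation` and `insns_after_reload` are hypothetical names, `renumber` only stands in for reg_renumber, and the cost rule (one extra load or store per operand that has no hard register) is an assumption made for illustration, not a description of what reload really emits.

```c
#include <assert.h>

/* Toy insn (hypothetical): one destination pseudo and up to two source
   pseudos; -1 means the operand slot is unused.  */
struct toy_insn { int dest; int src[2]; };

/* Model of "no RA": give every pseudo the value -1, meaning it has no
   hard register and must live in memory.  */
void
skip_allocation (short *renumber, int npseudos)
{
  assert (npseudos >= 0);
  for (int i = 0; i < npseudos; i++)
    renumber[i] = -1;
}

/* Count insns after a reload-like fixup.  ASSUMPTION of this sketch:
   every operand without a hard register costs exactly one extra insn
   (a load for a source, a store for a destination).  */
int
insns_after_reload (const struct toy_insn *insns, int n,
                    const short *renumber)
{
  int total = n;
  for (int i = 0; i < n; i++)
    {
      if (insns[i].dest >= 0 && renumber[insns[i].dest] < 0)
        total++;                      /* store after the insn */
      for (int k = 0; k < 2; k++)
        if (insns[i].src[k] >= 0 && renumber[insns[i].src[k]] < 0)
          total++;                    /* load before the insn */
    }
  return total;
}
```

Calling `insns_after_reload` once with an allocated `renumber` array and once after `skip_allocation` shows the insn count growing with the number of operands, which is the kind of blow-up the comment describes.  The real ratio depends on reload and on the code being compiled.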
Comment 4 Steven Bosscher 2012-05-10 18:37:11 UTC
DF -> mine to investigate.
Comment 5 Vladimir Makarov 2012-05-10 19:58:09 UTC
Author: vmakarov
Date: Thu May 10 19:58:01 2012
New Revision: 187373

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=187373
2012-05-10  Vladimir Makarov  <vmakarov@redhat.com>

	PR rtl-optimization/53125
	* ira.c (ira): Call find_moveable_pseudos or
	move_unallocated_pseudos if only ira_conflicts_p is true.

Comment 6 Steven Bosscher 2012-05-10 22:17:36 UTC
Created attachment 27369
Compute REG_LIVE_LENGTH smarter

The way local_live is used to compute REG_LIVE_LENGTH is just stupid.

Something like this patch would be helpful (drops from 73s after Vlad's patch to 39s).

A complete version of this should also handle MW-hardregs.
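
For readers without the attachment, the general idea can be sketched as follows.  This is a simplified stand-alone model, not the patch itself: `struct insn`, `MAX_REGS`, `live_length_naive` and `live_length_by_luid` are hypothetical, each insn is assumed to have at most one def and one use, and nothing is assumed live out of the block.  The naive version bumps the length of every live register at every insn; the other version remembers the luid at which a register became live in the backward walk and adds the luid difference once, when the register dies.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_REGS 64   /* hypothetical limit for this sketch */

/* Hypothetical simplified insn: at most one def and one use; -1 = none.  */
struct insn { int def; int use; };

/* Naive: walk the block backward and, at every insn, increment the
   length of every live register.  Cost is O(insns * regs).  */
void
live_length_naive (const struct insn *insns, int n, int nregs, int *length)
{
  bool live[MAX_REGS] = { false };
  assert (nregs <= MAX_REGS);
  for (int r = 0; r < nregs; r++)
    length[r] = 0;
  for (int luid = n - 1; luid >= 0; luid--)
    {
      if (insns[luid].def >= 0)
        live[insns[luid].def] = false;
      if (insns[luid].use >= 0)
        live[insns[luid].use] = true;
      for (int r = 0; r < nregs; r++)
        if (live[r])
          length[r]++;
    }
}

/* Smarter: remember the luid where each register became live and add
   the whole range in one step when the register dies.  Cost is
   O(insns + regs).  */
void
live_length_by_luid (const struct insn *insns, int n, int nregs, int *length)
{
  bool live[MAX_REGS] = { false };
  int last_luid[MAX_REGS];
  assert (nregs <= MAX_REGS);
  for (int r = 0; r < nregs; r++)
    length[r] = 0;
  for (int luid = n - 1; luid >= 0; luid--)
    {
      int def = insns[luid].def, use = insns[luid].use;
      if (def >= 0 && live[def])
        {
          length[def] += last_luid[def] - luid;
          live[def] = false;
        }
      if (use >= 0 && !live[use])
        {
          live[use] = true;
          last_luid[use] = luid;
        }
    }
  /* Top off registers that are still live at the start of the block.  */
  for (int r = 0; r < nregs; r++)
    if (live[r])
      length[r] += last_luid[r] + 1;
}
```

Both functions fill `length` with the same values for a given block, so the naive one can serve as a reference when checking the faster one.  The real code also has to deal with registers live across block boundaries and with multiword hard registers, which this sketch ignores.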
Comment 7 Steven Bosscher 2012-05-11 09:17:44 UTC
At -O1, with Vlad's patch and mine applied, compile time is ~230s. The "forward prop" pass takes 18% of that time, and "alias stmt walking" takes 67%.

This is with a cross from x86_64-linux-gnu to sparc-sun-solaris2.11, trunk r187371, checking enabled.
Comment 8 Richard Biener 2012-05-11 09:33:19 UTC
The alias stmt walking time comes from the redundant store removal code:
because of PR52054 it has to re-do the walks (there are 7200 stores in
the function).
Comment 9 Steven Bosscher 2012-05-17 17:55:01 UTC
Author: steven
Date: Thu May 17 17:54:52 2012
New Revision: 187633

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=187633
	PR rtl-optimization/53125
	* regstat.c (regstat_bb_compute_ri): Take new local_live_last_luid
	argument.  Simplify calculation of REG_LIVE_LENGTH for regnos that
	die in the basic block.  Correctly top off REG_FREQ and
	Remove do_not_gen.
	(regstat_compute_ri): Allocate and free local_live_last_luid.
	Remove do_not_gen.
	(regstat_bb_compute_calls_crossed): Correctly top off

Comment 10 Steven Bosscher 2012-05-17 17:57:05 UTC
Fixed on trunk for GCC 4.8.