Bug 23835 - [4.6/4.7/4.8 Regression] -O3 compile takes two times longer
Summary: [4.6/4.7/4.8 Regression] -O3 compile takes two times longer
Status: RESOLVED WONTFIX
Alias: None
Product: gcc
Classification: Unclassified
Component: tree-optimization
Version: 4.1.0
Importance: P4 normal
Target Milestone: 4.6.4
Assignee: Not yet assigned to anyone
URL:
Keywords: compile-time-hog
Depends on:
Blocks:
 
Reported: 2005-09-12 16:30 UTC by David Jaffe
Modified: 2012-11-09 22:46 UTC

See Also:
Host:
Target:
Build:
Known to work:
Known to fail: 4.0.4
Last reconfirmed: 2005-09-14 02:49:56


Attachments
preprocessed source code (148.95 KB, application/x-gzip)
2005-09-12 16:33 UTC, David Jaffe
Output of gcc -O3 -ftime-report -c test.ii, using gcc 4.1.0 (1.81 KB, text/plain)
2005-09-13 22:15 UTC, David Jaffe

Description David Jaffe 2005-09-12 16:30:14 UTC
We observe that on ia64,
   gcc -On -c test.ii (n = 1,2,3)
is several times slower under gcc 4.1.0 than gcc 3.4.3:

Compile time in seconds

         -O0     -O1    -O2     -O3

3.4.3   5.659   9.515  13.811  14.779
4.1.0   8.417  44.652  56.176  60.204

This is typical of what we observe for compiles of long files.

Hardware: IA-64 (Itanium 2), 1600 MHz, running Linux 2.6.12.2n.

% gcc -v
Reading specs from /util/bin/../lib/gcc/ia64-unknown-linux-gnu/3.4.3/specs
Configured with: ../gcc-3.4.3/configure --prefix=/mnt/util/ia64
Thread model: posix
gcc version 3.4.3

% gcc -v
Using built-in specs.
Target: ia64-unknown-linux-gnu
Configured with: ../configure --prefix=/wga1/gcc
Thread model: posix
gcc version 4.1.0 20050730 (experimental)
Comment 1 Andrew Pinski 2005-09-12 16:32:02 UTC
(In reply to comment #0)
> % gcc -v
> Using built-in specs.
> Target: ia64-unknown-linux-gnu
> Configured with: ../configure --prefix=/wga1/gcc
> Thread model: posix
> gcc version 4.1.0 20050730 (experimental)

You are compiling with checking enabled.  Try configuring with --enable-checking=release and try again.
Comment 2 David Jaffe 2005-09-12 16:33:57 UTC
Created attachment 9711 [details]
preprocessed source code
Comment 3 David Jaffe 2005-09-13 22:03:29 UTC
We recompiled gcc 4.1.0 with checking disabled.  The results are now less 
dramatic but still of concern: optimized 4.1.0 compiles take about twice as 
long as 3.4.3 compiles on the test case:

Compile time in seconds

         -O0     -O1    -O2     -O3

3.4.3   5.659   9.515  13.811  14.779
4.1.0   5.863  22.025  32.208  32.611

% gcc -v
Using built-in specs.
Target: ia64-unknown-linux-gnu
Configured with: ../configure --prefix=/wga1/gcc --enable-checking=release
Thread model: posix
gcc version 4.1.0 20050730 (experimental)
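
For reference, the slowdown factors implied by the table in this comment can be computed directly. This is only a minimal sketch: the numbers are copied from the table, and the names (OPT_LEVELS, TIMES, slowdown) are made up for illustration.

```python
# Compile times in seconds, copied from the table in comment 3.
OPT_LEVELS = ["-O0", "-O1", "-O2", "-O3"]
TIMES = {
    "3.4.3": [5.659, 9.515, 13.811, 14.779],
    "4.1.0": [5.863, 22.025, 32.208, 32.611],
}


def slowdown(old, new):
    """Return new/old ratios, one per optimization level."""
    return [n / o for o, n in zip(old, new)]


ratios = slowdown(TIMES["3.4.3"], TIMES["4.1.0"])
for level, ratio in zip(OPT_LEVELS, ratios):
    print(f"{level}: {ratio:.2f}x")
```

On these numbers the optimized builds come out at roughly 2.2x to 2.3x, while -O0 is within a few percent of 3.4.3.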
Comment 4 Andrew Pinski 2005-09-13 22:09:23 UTC
Can you run with -ftime-report and attach the results?
Comment 5 David Jaffe 2005-09-13 22:15:33 UTC
Created attachment 9725 [details]
Output of  gcc -O3 -ftime-report -c test.ii, using gcc 4.1.0
Comment 6 Andrew Pinski 2005-09-13 22:18:49 UTC
 tree alias analysis   :   5.56 (17%) usr   0.07 ( 8%) sys   5.66 (17%) wall   13812 kB ( 3%) ggc
 tree SSA incremental  :   3.15 (10%) usr   0.01 ( 2%) sys   3.18 (10%) wall    8152 kB ( 2%) ggc


hmm, we most likely just have too many VOPS but I don't know for sure yet.
Comment 7 Andrew Pinski 2005-09-13 22:37:39 UTC
The main function is huge, no wonder this takes more time.
Comment 8 Andrew Pinski 2005-09-13 23:13:07 UTC
We have at least 50000 SSA_NAMEs, that is just huge.  We have only V_MAY_DEFs for .GLOBAL_VAR and 
a TMT.

This happens on all targets and not just ia64.
Comment 9 Andrew Pinski 2005-09-13 23:29:20 UTC
(In reply to comment #8)
> We have at least 50000 SSA_NAMEs, that is just huge.  We have only V_MAY_DEFs for .GLOBAL_VAR
> and  a TMT.
Over half, 40,000, are scalar registers.
Comment 10 Andrew Pinski 2005-09-14 00:32:13 UTC
[20:26] < pinskia> but 38% are in compute_may_aliases
[20:27] < pinskia> and 80% of that is in the loop which is going through all SSA_NAMES
[20:27] < pinskia> this is in create_name_tags
[20:28] < pinskia> it is O(n^2) 
Comment 11 Andrew Pinski 2005-09-14 00:35:15 UTC
Hmm, in 4.0.0, we take a reasonable time at -O3.
may_alias took only:
 tree alias analysis   :   0.28 ( 3%) usr   0.01 ( 1%) sys   0.21 ( 2%) wall
Total time:
 TOTAL                 :   8.55             1.04             9.75
Comment 12 Andrew Pinski 2005-09-14 00:42:19 UTC
Hmm, in 4.0.0, we have about 6,800 SSA_NAMEs.
Comment 13 Daniel Berlin 2005-09-14 02:49:56 UTC
I have a patch for the alias portion of this
Comment 14 GCC Commits 2005-09-15 01:28:34 UTC
Subject: Bug 23835

CVSROOT:	/cvs/gcc
Module name:	gcc
Changes by:	dberlin@gcc.gnu.org	2005-09-15 01:28:14

Modified files:
	gcc            : ChangeLog tree-ssa-alias.c 

Log message:
	2005-09-14  Daniel Berlin  <dberlin@dberlin.org>
	
	PR tree-optimization/23835
	* tree-ssa-alias.c (sort_pointers_by_pt_vars): New function.
	(create_name_tags): Rewrite to be not O(num_ssa_names^2).

Patches:
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/ChangeLog.diff?cvsroot=gcc&r1=2.9957&r2=2.9958
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/tree-ssa-alias.c.diff?cvsroot=gcc&r1=2.109&r2=2.110
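
The log message only says that create_name_tags was rewritten to avoid O(num_ssa_names^2) behavior, with the help of a new sort_pointers_by_pt_vars function. The sketch below is not the GCC code. It illustrates the general technique of replacing pairwise comparison with sort-and-group, under the assumption that the goal is to give pointers with identical points-to sets the same tag. All function names and the tag numbering are hypothetical.

```python
def assign_tags_quadratic(pt_sets):
    """Give pointers with equal points-to sets the same tag by comparing
    each pointer with all earlier ones: O(n^2) set comparisons."""
    tags = []
    next_tag = 0
    for i, pts in enumerate(pt_sets):
        for j in range(i):
            if pt_sets[j] == pts:
                tags.append(tags[j])  # reuse the tag of the earlier match
                break
        else:
            tags.append(next_tag)     # first pointer with this set
            next_tag += 1
    return tags


def assign_tags_sorted(pt_sets):
    """Same grouping, but sort on a canonical key first so that equal sets
    become adjacent: O(n log n) comparisons plus one linear pass."""
    keyed = sorted((tuple(sorted(pts)), i) for i, pts in enumerate(pt_sets))
    tags = [None] * len(pt_sets)
    next_tag = -1
    prev_key = None
    for key, i in keyed:
        if key != prev_key:           # a new group starts here
            next_tag += 1
            prev_key = key
        tags[i] = next_tag
    return tags


def same_partition(a, b):
    """True if two tag lists group the indices identically."""
    return len(set(zip(a, b))) == len(set(a)) == len(set(b))
```

The two functions may number the tags differently, so same_partition is the fair way to compare them: it checks that the same pointers end up grouped together.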

Comment 15 Andrew Pinski 2005-09-15 16:40:35 UTC
This is much better now, but more improvements are still possible.  I am going to unassign Daniel for 
now as the aliasing pass takes much less time now.
Comment 16 Andrew Pinski 2005-09-15 16:49:22 UTC
On x86_64, there is only about a 2x compile time increase at -O3, and it is much faster now than it 
was a couple of days ago.
Comment 17 Andrew Pinski 2005-09-15 16:50:02 UTC
Looking at the numbers for -fno-inline, I noticed it drops back down to the 4.0.0 numbers.
Comment 18 David Jaffe 2005-09-28 09:45:53 UTC
That's an improvement!

Still, here are the stats again, updated for 4.1.0:

Compile time in seconds for test.ii:

         -O0     -O1    -O2     -O3

3.4.3   5.659   9.515  13.811  14.779
4.1.0   5.829  16.398  24.618  27.066

% gcc -v
Using built-in specs.
Target: ia64-unknown-linux-gnu
Configured with: ../configure --prefix=/wga1/gcc --enable-checking=release
Thread model: posix
gcc version 4.1.0 20050926 (experimental)

Comment 19 David Jaffe 2005-10-16 12:20:09 UTC
Is further work planned on this?  Thanks.
Comment 20 Steven Bosscher 2005-10-17 09:41:03 UTC
Yes, further work is planned on this.  Someone just needs to figure out
what is still eating so much time.  If you can compile with -ftime-report
and report back the top 10 compile time consumers, that'd be helpful ;-)
Comment 21 David Jaffe 2005-10-17 10:01:11 UTC
% gcc -O3 -ftime-report -c test.ii |& grep usr | sort -t : +1 -n -r | head -10

 tree SSA incremental  :   3.19 (12%) usr   0.02 ( 2%) sys   3.18 (11%) wall    8995 kB ( 2%) ggc
 scheduling 2          :   2.03 ( 8%) usr   0.01 ( 1%) sys   2.05 ( 7%) wall   15312 kB ( 3%) ggc
 scheduling            :   1.52 ( 6%) usr   0.00 ( 0%) sys   1.53 ( 5%) wall    7838 kB ( 2%) ggc
 expand                :   1.32 ( 5%) usr   0.01 ( 1%) sys   1.34 ( 5%) wall   26717 kB ( 6%) ggc
 life analysis         :   1.30 ( 5%) usr   0.00 ( 0%) sys   1.27 ( 5%) wall    3836 kB ( 1%) ggc
 tree alias analysis   :   1.18 ( 4%) usr   0.08 ( 9%) sys   1.23 ( 4%) wall   13610 kB ( 3%) ggc
 parser                :   1.08 ( 4%) usr   0.18 (20%) sys   1.26 ( 4%) wall   99497 kB (22%) ggc
 tree PTA              :   1.00 ( 4%) usr   0.00 ( 0%) sys   1.01 ( 4%) wall    4565 kB ( 1%) ggc
 tree operand scan     :   0.82 ( 3%) usr   0.14 (16%) sys   1.03 ( 4%) wall   10663 kB ( 2%) ggc
 integration           :   0.82 ( 3%) usr   0.02 ( 2%) sys   0.84 ( 3%) wall   88155 kB (19%) ggc
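
The pipeline above relies on the historical `sort +1` field syntax, which newer sort implementations may not accept. A small script can do the same ranking. This is a sketch that assumes the report lines have the `name : <usr> (NN%) usr ...` shape shown in this comment; the function names are made up for illustration.

```python
import re

# Matches lines such as
#  " tree alias analysis   :   1.18 ( 4%) usr   0.08 ( 9%) sys ..."
USR_LINE = re.compile(
    r"^\s*(?P<name>.+?)\s*:\s*(?P<usr>\d+\.\d+)\s*\(\s*\d+%\)\s*usr"
)


def parse_usr_times(report_text):
    """Return (pass_name, usr_seconds) pairs found in a -ftime-report dump."""
    rows = []
    for line in report_text.splitlines():
        m = USR_LINE.match(line)
        if m:
            rows.append((m.group("name"), float(m.group("usr"))))
    return rows


def top_passes(report_text, n=10):
    """Rank passes by user time, largest first."""
    rows = parse_usr_times(report_text)
    return sorted(rows, key=lambda r: r[1], reverse=True)[:n]


# Three lines copied from the report in this comment, deliberately unsorted.
SAMPLE = """\
 tree alias analysis   :   1.18 ( 4%) usr   0.08 ( 9%) sys   1.23 ( 4%) wall   13610 kB ( 3%) ggc
 tree SSA incremental  :   3.19 (12%) usr   0.02 ( 2%) sys   3.18 (11%) wall    8995 kB ( 2%) ggc
 scheduling 2          :   2.03 ( 8%) usr   0.01 ( 1%) sys   2.05 ( 7%) wall   15312 kB ( 3%) ggc
"""

for name, usr in top_passes(SAMPLE, n=3):
    print(f"{usr:6.2f}s  {name}")
```

Since -ftime-report writes to stderr, the text passed to top_passes would have to be captured from there, for example by redirecting stderr to a file first.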
Comment 22 Mark Mitchell 2005-10-31 05:46:02 UTC
Downgrading to P4.  I'd like to see more progress for 4.1, but it's not going to be release-critical.
Comment 23 David Jaffe 2005-10-31 10:47:16 UTC
When is this problem likely to be resolved?  I understand that you have to 
prioritize.  I just want to understand what the prospects are.  Thanks.
Comment 24 Steven Bosscher 2006-01-10 22:39:21 UTC
Realistically, the prospects are that this problem won't be fixed until compile time gets on the GCC developers' radar for real.  The next release always promises to be faster, but usually turns out to be disappointingly slow in the end.

Note that in your case, there really is only one real bug.  The other slow passes are inherently slow for very large functions.  The top 3 slow passes of your latest time report were:

 tree SSA incremental  :   3.19 (12%) usr
 scheduling 2          :   2.03 ( 8%) usr
 scheduling            :   1.52 ( 6%) usr

The tree SSA incremental bit is non-linear behavior when computing dominance frontiers (inevitable) plus non-linear behavior due to massive bitmap abuse.  This incremental update tries to work on portions of the CFG, but in practice it almost always works on the whole function, so you end up doing far more work than strictly necessary.  What's required to fix this is a region based compilation model such that the SSA updater can work on dirty regions (i.e. regions where an update is needed) without worrying about the other regions.  I don't think this will be fixed any time soon :-(
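
To make the dominance frontier remark concrete, here is a common textbook formulation of the computation. It is not GCC's implementation. It assumes the immediate dominators are already known and that the entry node has no predecessors; the graph encoding and the function name are illustrative only. For some CFG shapes the total size of the frontier sets grows faster than linearly with the number of blocks, which fits the non-linear behavior described above.

```python
def dominance_frontiers(preds, idom):
    """Compute the dominance frontier of every node.

    preds: dict mapping node -> list of CFG predecessors
    idom:  dict mapping node -> immediate dominator (entry maps to itself)
    Assumes the entry node has no predecessors.
    """
    df = {node: set() for node in preds}
    for node, node_preds in preds.items():
        if len(node_preds) < 2:
            continue  # under the assumption above, only join points matter
        for pred in node_preds:
            runner = pred
            # Walk up the dominator tree until reaching node's immediate
            # dominator; every node passed on the way has node in its frontier.
            while runner != idom[node]:
                df[runner].add(node)
                runner = idom[runner]
    return df


# Diamond CFG: entry -> left, entry -> right, left -> join, right -> join.
PREDS = {
    "entry": [],
    "left": ["entry"],
    "right": ["entry"],
    "join": ["left", "right"],
}
IDOM = {"entry": "entry", "left": "entry", "right": "entry", "join": "entry"}
FRONTIERS = dominance_frontiers(PREDS, IDOM)
```

In the diamond, "join" ends up in the frontier of both "left" and "right", which is where an SSA construction pass would place a PHI node for a variable defined on either side.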

As for the scheduler, well, list scheduling is just a quadratic algorithm: O(n_insns_to_schedule^2) where n_insns_to_schedule is the number of insns in the region or trace that you're scheduling.  This is one of the reasons why no compiler schedules whole functions at once: You have to split them up to keep compile times reasonable.  The fact that scheduling got slower for IA-64 has two reasons.  First, basic blocks are typically larger for IA-64 in GCC 4.x than in GCC 3.x, and typically GCC 4.x can find longer traces than GCC 3.x, so your n_insns_to_schedule is larger (I measured this on ia64-linux).  Second, the scheduler model for IA-64 is just incredibly complicated, and a significant amount of time is simply lost there because the IA-64 automaton is _huge_.
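
The quadratic cost mentioned above can be seen in a toy list scheduler. The sketch below is a deliberately naive single-issue scheduler, not the GCC scheduler and not a model of the IA-64 automaton; the instruction names, the dependency encoding, and the priority function are assumptions for illustration. Each step rescans every unscheduled instruction to find the ready ones, so n instructions cost on the order of n^2 readiness checks.

```python
def list_schedule(deps, priority):
    """Naive list scheduling for a single-issue machine.

    deps:     dict mapping insn -> set of insns that must be scheduled first
    priority: dict mapping insn -> number (higher goes first among ready insns)
    Returns (order, checks), where checks counts the readiness tests performed.
    """
    unscheduled = set(deps)
    done = set()
    order = []
    checks = 0
    while unscheduled:
        ready = []
        for insn in unscheduled:      # full rescan on every step
            checks += 1
            if deps[insn] <= done:    # all predecessors already scheduled
                ready.append(insn)
        if not ready:
            raise ValueError("dependency cycle")
        # Break priority ties by name so the result is deterministic.
        best = max(ready, key=lambda i: (priority[i], i))
        order.append(best)
        done.add(best)
        unscheduled.remove(best)
    return order, checks
```

With n instructions this performs n + (n - 1) + ... + 1 readiness checks, so doubling the size of the region roughly quadruples the work. That is why splitting a function into smaller regions or traces keeps scheduling time manageable.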

Comment 25 Mark Mitchell 2006-05-25 02:35:19 UTC
Will not be fixed in 4.1.1; adjust target milestone to 4.1.2.
Comment 26 Steven Bosscher 2007-12-18 20:09:25 UTC
Bring back on the radar for the release manager.
New timings would be much appreciated.  Anyone?
Comment 27 Uroš Bizjak 2007-12-19 22:02:04 UTC
(In reply to comment #26)
> Bring back on the radar for the release manager.
> New timings would be much appreciated.  Anyone?

Attached preprocessed source doesn't compile out-of-the-box with gcc-4.3.
Comment 28 Mark Mitchell 2007-12-31 07:00:57 UTC
Steven, thanks for pointing me at this.  However, I'm going to downgrade this to P4 again; I don't think there's anything obvious to be done.
Comment 29 Joseph S. Myers 2008-07-04 20:03:19 UTC
Closing 4.1 branch.
Comment 30 Joseph S. Myers 2009-03-31 18:56:05 UTC
Closing 4.2 branch.
Comment 31 Richard Biener 2009-08-04 12:26:52 UTC
GCC 4.3.4 is being released, adjusting target milestone.
Comment 32 Richard Biener 2010-05-22 18:10:38 UTC
GCC 4.3.5 is being released, adjusting target milestone.
Comment 33 Richard Biener 2011-06-27 12:12:42 UTC
4.3 branch is being closed, moving to 4.4.7 target.
Comment 34 Jakub Jelinek 2012-03-13 12:45:23 UTC
4.4 branch is being closed, moving to 4.5.4 target.
Comment 35 Steven Bosscher 2012-11-09 22:46:30 UTC
This bug is no longer relevant: the IA64 scheduler at -O3 has been replaced
completely (by the selective scheduler), and the remaining non-scheduler
slowdowns are all due to inherently non-linear algorithms, downsides that
come along with the benefits of SSA.  So WONTFIX.