Bug 8361 - [4.1/4.2 regression] C++ compile-time performance regression
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: tree-optimization
Version: 3.3
Importance: P2 normal
Target Milestone: 4.1.2
Assignee: Not yet assigned to anyone
URL:
Keywords: compile-time-hog
Depends on: 10944 11545 13479 16131 17483 17497 17790 18507 22635
Blocks:
Reported: 2002-10-25 13:36 UTC by Gerald Pfeifer
Modified: 2010-10-27 16:58 UTC

See Also:
Host:
Target:
Build:
Known to work: 2.95.3, 3.0.4, 4.0.0
Known to fail: 4.1.0
Last reconfirmed: 2005-09-18 19:27:11


Attachments
generate.ii.bz2 (137.06 KB, application/octet-stream)
2003-05-21 15:17 UTC, Gerald Pfeifer
generate-3.4.ii.bz2 (143.15 KB, application/octet-stream)
2003-05-21 15:17 UTC, Gerald Pfeifer
Version for 3.4 and later compilers (148.07 KB, application/octet-stream)
2003-07-16 17:25 UTC, Gerald Pfeifer
Version for pre 3.4 era compilers (140.62 KB, application/octet-stream)
2003-07-16 17:27 UTC, Gerald Pfeifer

Description Gerald Pfeifer 2002-10-25 13:36:01 UTC
We have a significant performance problem with optimizing
C++ compilations.  I did some new tests using the example
from PR 3083, and here are the timings with --disable-checking (last updated 2003-04-14):

               -O0         -O1          -O2        -O3
GCC 3.0.4     27.95       44.52        56.57      56.48
3.2-branch    29.87 +7%   54.28 +22%   70.95 +25% 75.29 +33%
3.3-branch    29.09 +4%   57.11 +30%   78.99 +40% 81.61 +44%
mainline      27.06       56.09        77.77      82.02

That is, basically PR 3083 still applies, even if not as
drastically as originally.

Dan Nicolaescu and Kaveh Ghazi did some further analyses:
  http://gcc.gnu.org/ml/gcc/2002-12/msg00516.html
  http://gcc.gnu.org/ml/gcc/2003-04/msg00251.html
  http://gcc.gnu.org/ml/gcc/2003-04/msg00252.html

Release:
gcc version 3.3 20021025 (experimental)

Environment:
i386-unknown-freebsd4.6, Pentium III/1.0

How-To-Repeat:
for GCC 3.0-3.3:  \time gcc -c -O3 generate.ii
for GCC 3.4    :  \time gcc -c -O3 generate-3.4.ii

(sources last updated 2003-01-30)
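
The How-To-Repeat recipe above can be scripted. The following is a minimal sketch, assuming the attachments have been unpacked into the current directory and that the compiler under test is invoked as `gcc`. The helper names `source_for`, `compile_command` and `time_compile` are made up for illustration, and wall-clock time stands in for the `\time` output.

```python
import subprocess
import time


def source_for(version):
    """Pick the testcase for a compiler version given as (major, minor)."""
    # Per the recipe above, GCC 3.4 needs the generate-3.4.ii variant;
    # applying the same choice to later versions is an assumption.
    return "generate-3.4.ii" if tuple(version) >= (3, 4) else "generate.ii"


def compile_command(version, opt="-O3", compiler="gcc"):
    """Build the argv equivalent of `gcc -c -O3 generate.ii`."""
    return [compiler, "-c", opt, source_for(version)]


def time_compile(argv):
    """Run one compile and return the elapsed wall-clock seconds."""
    start = time.perf_counter()
    subprocess.run(argv, check=True)
    return time.perf_counter() - start
```

For example, `time_compile(compile_command((3, 3)))` would time `gcc -c -O3 generate.ii`, provided that compiler and file actually exist on the machine.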
Comment 1 Eric Botcazou 2003-02-18 14:58:42 UTC
State-Changed-From-To: open->analyzed
State-Changed-Why: Confirmed.
Comment 2 s.bosscher 2003-03-15 15:29:28 UTC
From: Steven Bosscher <s.bosscher@student.tudelft.nl>
To: pfeifer@dbai.tuwien.ac.at, gcc-gnats@gcc.gnu.org,
	gcc-bugs@gcc.gnu.org, nobody@gcc.gnu.org, gcc-prs@gcc.gnu.org
Cc:  
Subject: Re: optimization/8361: [3.3/3.4 regression] C++ compile-time performance
 regression
Date: Sat, 15 Mar 2003 15:29:28 +0100

 http://gcc.gnu.org/cgi-bin/gnatsweb.pl?cmd=view%20audit-trail&database=gcc&pr=8361
 
 Hi Gerald,
 
 Could you produce compile times for this PR and see how we're doing on 3.3
 and mainline?  Quite a few speedup patches have gone in lately (most notably
 the GC limits patches), so the numbers really are way outdated now...
 
 Greetz
 Steven
 
 
 P.S.  Welcome back ;-)
 
 

Comment 3 s.bosscher 2003-03-23 10:10:03 UTC
From: Steven Bosscher <s.bosscher@student.tudelft.nl>
To: pfeifer@dbai.tuwien.ac.at, gcc-gnats@gcc.gnu.org,
	gcc-bugs@gcc.gnu.org, nobody@gcc.gnu.org, gcc-prs@gcc.gnu.org
Cc:  
Subject: Re: optimization/8361: [3.3/3.4 regression] C++ compile-time performance
 regression
Date: Sun, 23 Mar 2003 10:10:03 +0100

 http://gcc.gnu.org/cgi-bin/gnatsweb.pl?cmd=view%20audit-trail&database=gcc&pr=8361
 
 Janis did some more timing:
 http://gcc.gnu.org/ml/gcc/2003-03/msg01425.html
 
 

Comment 4 Gerald Pfeifer 2003-03-24 15:18:53 UTC
From: Gerald Pfeifer <pfeifer@dbai.tuwien.ac.at>
To: gcc-gnats@gcc.gnu.org, gcc-prs@gcc.gnu.org, gcc-bugs@gcc.gnu.org
Cc: Steven Bosscher <s.bosscher@student.tudelft.nl>
Subject: Re: optimization/8361: [3.3/3.4 regression] C++ compile-time
 performance regression
Date: Mon, 24 Mar 2003 15:18:53 +0100 (CET)

 I have now tested Mark's fix for PR 8361 and while that does not affect
 non-optimizing performance, the difference for -O3 is noticeable.
 
 Below are the results of the 3.3-branch from 03/15 versus 03/24 (without
 explicit --disable-checking):
 
   % \time /files/pfeifer/gcc-3.3-0315/bin/g++ -c generate.ii
        28.50 real        27.57 user         0.75 sys
   % \time /files/pfeifer/gcc-3.3-0324/bin/g++ -c generate.ii
        28.70 real        27.42 user         0.69 sys
 
 No difference for non-optimizing compilation...
 
   % \time /files/pfeifer/gcc-3.3-0315/bin/g++ -O3 -c generate.ii
       109.55 real       105.85 user         3.06 sys
   % \time /files/pfeifer/gcc-3.3-0324/bin/g++ -O3 -c generate.ii
        81.28 real        77.41 user         3.55 sys
 
 ...but a nice speedup for -O3!
 
 Gerald

Comment 5 Andrew Pinski 2003-05-01 07:26:47 UTC
From: Andrew Pinski <pinskia@physics.uc.edu>
To: pfeifer@dbai.tuwien.ac.at, gcc-gnats@gcc.gnu.org, gcc-bugs@gcc.gnu.org,
   nobody@gcc.gnu.org, gcc-prs@gcc.gnu.org
Cc: Andrew Pinski <pinskia@physics.uc.edu>
Subject: Re: optimization/8361: [3.3/3.4 regression] C++ compile-time performance regression
Date: Thu, 1 May 2003 07:26:47 -0400

 Is there any way I can get the non-preprocessed code?  The preprocessed
 code produces errors in the system headers.
 
 http://gcc.gnu.org/cgi-bin/gnatsweb.pl?cmd=view%20audit-trail&database=gcc&pr=8361
 
 Thanks,
 Andrew Pinski
Comment 6 Andrew Pinski 2003-05-23 15:25:58 UTC
For powerpc-apple-darwin and 3.4, I needed to edit the generate-3.4.ii file and change size_t
from unsigned int to unsigned long.
Comment 7 Andrew Pinski 2003-05-23 15:31:27 UTC
This bug can be helped by fixing bug 10944 <http://gcc.gnu.org/PR10944>.
Comment 8 Andrew Pinski 2003-06-13 19:00:27 UTC
It looks like gcc is walking the tree too much, which slows down the compilation. I will look into
when it is doing it, i.e. during inlining or at another time.
Comment 9 Andrew Pinski 2003-06-13 19:29:06 UTC
I found one of the problems with walking the trees too much: finish_function calls
calls_setjmp_p when inlining is turned off, which is not needed.  This was added with this
patch:
1999-12-05  Mark Mitchell  <mark@codesourcery.com>

        * decl.c (init_decl_processing): Set flag_inline_trees if
        !flag_no_inline.

        * cp-tree.h (calls_setjmp_p): Declare.
        * decl.c (finish_function): Mark functions that call setjmp as
        uninlinable.
        * optimize.c (calls_setjmp_r): New function.
        (calls_setjmp_p): Likewise.

An easy fix for this one is not to call calls_setjmp_p if flag_inline_trees is non-zero.
Comment 10 s.bosscher 2003-06-13 19:47:37 UTC
Subject: Re:  [3.3/3.4 regression] C++ compile-time
 performance regression

Andrew,

If you are right about all those tree walks, check out my fix for 1687 (3.3 branch only, 3.4 is in the works).
http://gcc.gnu.org/PR1687.  The idea is to simply use walk_tree_without_duplicates.  Our front ends tend to produce horribly convoluted trees that make walk_tree walks really slow sometimes.
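
To see why skipping duplicates matters, here is a toy illustration in Python. It is not GCC's code: the `Node` class and both walkers are hypothetical stand-ins, and the "tree" is deliberately built so that every node shares its single child through two edges. A plain recursive walk then revisits shared subtrees exponentially often, while a walk that remembers visited nodes touches each node once.

```python
class Node:
    """A toy tree node; children may be shared between parents."""

    def __init__(self, *children):
        self.children = children


def build_shared_chain(depth):
    """Build a DAG in which each node points twice at the same child."""
    node = Node()
    for _ in range(depth):
        node = Node(node, node)
    return node


def walk(node, visit):
    """Plain recursive walk: shared subtrees are walked once per edge."""
    visit(node)
    for child in node.children:
        walk(child, visit)


def walk_without_duplicates(node, visit, seen=None):
    """Walk that records visited nodes and skips them on later encounters."""
    if seen is None:
        seen = set()
    if id(node) in seen:
        return
    seen.add(id(node))
    visit(node)
    for child in node.children:
        walk_without_duplicates(child, visit, seen)


def count_visits(walker, root):
    """Count how many times the walker invokes its callback."""
    count = 0

    def visit(_node):
        nonlocal count
        count += 1

    walker(root, visit)
    return count
```

With `root = build_shared_chain(10)`, the plain walk makes 2047 visits while the duplicate-free walk makes 11, one per distinct node.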

Gr.
Steven



Comment 11 Andrew Pinski 2003-06-13 19:52:13 UTC
I can do better than using walk_tree_without_duplicates: with no optimizations, I do not have to look
at all if there is no need, i.e. no inlining is requested (this is just for the -O0 case), which means
3.4 might be faster than 3.0.4, which is tested. A patch is in the works; I will test it tonight.
Comment 12 s.bosscher 2003-06-19 21:58:52 UTC
Subject: Re:  [3.3/3.4 regression] C++ compile-time
 performance regression

pinskia at physics dot uc dot edu wrote:

>PLEASE REPLY TO gcc-bugzilla@gcc.gnu.org ONLY, *NOT* gcc-bugs@gcc.gnu.org.
>
>http://gcc.gnu.org/bugzilla/show_bug.cgi?id=8361
>
>
>pinskia at physics dot uc dot edu changed:
>
>           What    |Removed                     |Added
>----------------------------------------------------------------------------
>  BugsThisDependsOn|                            |1687
>
No it does not.


Comment 13 Dan Nicolaescu 2003-06-20 21:17:49 UTC
As shown in:
http://gcc.gnu.org/ml/gcc/2003-06/msg01596.html
a significant number of insns generated (about 1/3 of the total)  for
generate-3.4 are NOTE_INSN_DELETED. 
I am not sure if this is a regression or not, but generating fewer useless insns 
should help somewhat, so this might be an interesting data point.
Comment 14 Andrew Pinski 2003-06-24 22:37:26 UTC
With the new version of the CHUD tools (beta version) which give backtraces from the 
top-down, I see that most of the time is spent in for_each_template_parm and the related 
functions for -O0 (not taking GC into account).
Comment 15 Steven Bosscher 2003-07-11 23:42:05 UTC
Yeah yeah yeah, so g++ has been slowing down since, well, forever.  But no way
this will be fixed for 3.3.1, or 3.3.2 for that matter.  For 3.4 we're doing
much better already and we still have a few months to find some speed-ups.

[ Damn I wish Apple had a reputation for delivering what they promise.  Then
  we would have a 6x faster GCC soon :-)  ]
Comment 16 Steven Bosscher 2003-07-12 20:06:37 UTC
Target milestone moved back again at the request of Gerald.
Comment 17 Gerald Pfeifer 2003-07-16 17:25:31 UTC
Created attachment 4415
Version for 3.4 and later compilers
Comment 18 Gerald Pfeifer 2003-07-16 17:27:37 UTC
Created attachment 4416
Version for pre 3.4 era compilers
Comment 19 Mark Mitchell 2003-07-23 22:50:35 UTC
Postponed, yet again -- until GCC 3.3.2 at least.

Nathan is working on a major improvement to type-comparison and
template-matching performance, but it requires the elimination of a GNU
extension.  We've now agreed to eliminate that extension (default arguments on
function types), but that means we have to deprecate it in GCC 3.4 and remove it
in GCC 3.5, unless people are willing to move up the removal to GCC 3.4.
Comment 20 Gabriel Dos Reis 2003-07-23 23:11:39 UTC
Subject: Re:  [3.3/3.4 regression] C++ compile-time performance regression

"mmitchel at gcc dot gnu dot org" <gcc-bugzilla@gcc.gnu.org> writes:

| Nathan is working on a major improvement to type-comparison and
| template-matching performance, but it requires the elimination of a GNU
| extension.  We've now agreed to eliminate that extension (default arguments on
| function types), but that means we have to deprecate it in GCC 3.4 and remove it
| in GCC 3.5, unless people are willing to move up the removal to GCC 3.4.

That deprecation was raised ages ago.  I vote for its removal in GCC-3.4.

-- Gaby
Comment 21 Andrew Pinski 2003-07-23 23:24:10 UTC
I vote for its removal in 3.4, since it fixes PR 4205 and PR 4908 and nobody knew of the
extension.
Comment 22 Gerald Pfeifer 2003-07-23 23:38:56 UTC
Given that the new parser in GCC 3.4 will "break" (note the quotes!) many/most C++
applications one way or the other anyway, removing such a language extension from
G++ seems okay in the 3.4 timeframe (and even more so if it really blocks important
improvements).
Comment 23 Wolfgang Bangerth 2003-07-24 14:29:35 UTC
I'm not a maintainer, but if asked I'd vote for abandoning the extension as
well. I'm pretty sure more people would think of a bug in the compiler than
an intentional feature if they encountered it in real life.

And, yes, just as Gerald said: 3.4 is _the_ time to get rid of cruft in the
C++ compiler.

W.
Comment 24 Mark Mitchell 2003-10-16 02:39:40 UTC
Well, here we go postponing this PR yet again...  This time until GCC 3.4.
Comment 25 Steven Bosscher 2003-12-29 15:40:46 UTC
Zdenek's new dominator interface helps, see: 
http://gcc.gnu.org/ml/gcc-patches/2003-12/msg02164.html 
Comment 26 Andrew Pinski 2004-01-14 03:22:17 UTC
Some improvements lately made by Jan.
Comment 27 Gerald Pfeifer 2004-01-21 18:28:30 UTC
Some additional benchmark data (which will soon be outdated, and for the better
it seems) by work Jan is doing.

http://gcc.gnu.org/ml/gcc/2004-01/msg00657.html
Comment 28 Mark Mitchell 2004-03-13 16:38:45 UTC
This PR just keeps hanging around.  How sad.  But no more work will be done
on this before 3.4.0, so I've postponed until 3.4.1.
Comment 29 Mark Mitchell 2004-06-18 23:39:44 UTC
Postponed until GCC 3.4.2.
Comment 30 Andrew Pinski 2004-06-25 07:43:00 UTC
For powerpc-apple-darwin I posted two patches which help at -O0, where compile time goes from
18.0 seconds to 15.3 seconds:
<http://gcc.gnu.org/ml/gcc-patches/2004-06/msg02029.html>
<http://gcc.gnu.org/ml/gcc-patches/2004-06/msg02031.html>
Note these patches solve problems specific to darwin and only help there.
Comment 31 Mark Mitchell 2004-08-29 18:47:34 UTC
Postponed until GCC 3.4.3.
Comment 32 Paolo Bonzini 2004-09-24 15:01:10 UTC
This PR is unlikely to be closed ever, but some fresh numbers ought to be taken
for mainline.  Unfortunately I don't have even a fraction of the compilers in
the PR description here (only 3.3.4-debian and mainline), so no, I'm not
volunteering to do it. :-)

Paolo
Comment 33 Nathan Sidwell 2004-10-26 12:34:27 UTC
The updated testcase doesn't compile on i686-pc-linux-gnu, with what looks to be
target independent errors. Here are the first few,

/sw/gcc-3.0.4/include/g++-v3/bits/stl_iterator.h:452: error: type/value mismatch at argument 1 in template parameter list for 'template<class _Category, class _Tp, class _Distance, class _Pointer, class _Reference> struct std::iterator'
/sw/gcc-3.0.4/include/g++-v3/bits/stl_iterator.h:452: error:   expected a type, got 'std::iterator_traits<_Iterator>::iterator_category'
/sw/gcc-3.0.4/include/g++-v3/bits/stl_iterator.h:452: error: type/value mismatch at argument 2 in template parameter list for 'template<class _Category, class _Tp, class _Distance, class _Pointer, class _Reference> struct std::iterator'
/sw/gcc-3.0.4/include/g++-v3/bits/stl_iterator.h:452: error:   expected a type, got 'std::iterator_traits<_Iterator>::value_type'
/sw/gcc-3.0.4/include/g++-v3/bits/stl_iterator.h:452: error: type/value mismatch at argument 3 in template parameter list for 'template<class _Category, class _Tp, class _Distance, class _Pointer, class _Reference> struct std::iterator'
/sw/gcc-3.0.4/include/g++-v3/bits/stl_iterator.h:452: error:   expected a type, got 'std::iterator_traits<_Iterator>::difference_type'
/sw/gcc-3.0.4/include/g++-v3/bits/stl_iterator.h:452: error: type/value mismatch at argument 4 in template parameter list for 'template<class _Category, class _Tp, class _Distance, class _Pointer, class _Reference> struct std::iterator'
/sw/gcc-3.0.4/include/g++-v3/bits/stl_iterator.h:452: error:   expected a type, got 'std::iterator_traits<_Iterator>::pointer'
/sw/gcc-3.0.4/include/g++-v3/bits/stl_iterator.h:452: error: type/value mismatch at argument 5 in template parameter list for 'template<class _Category, class _Tp, class _Distance, class _Pointer, class _Reference> struct std::iterator'
/sw/gcc-3.0.4/include/g++-v3/bits/stl_iterator.h:452: error:   expected a type, got 'std::iterator_traits<_Iterator>::reference'

what's up?
Comment 34 Gerald Pfeifer 2004-10-26 13:34:30 UTC
Is there anything left to do wrt. the testcases?  I saw that Nathan made
some (description-only?) changes.
Comment 35 Andrew Pinski 2004-10-26 13:37:24 UTC
No, Nathan just got confused on which attachment to take.
Comment 36 Mark Mitchell 2004-10-30 19:30:17 UTC
I'm not sure how interesting it is to keep this PR open.  

I'll be postponing it every time we get to a release for the foreseeable future.
Comment 37 Steven Bosscher 2004-11-12 12:52:14 UTC
GCC 3.4 (CVS today) takes 35s usr on my machine.
GCC 4.0 (CVS today) takes 46s usr on the same machine.

The difference is entirely in DOM, into-SSA and SSA-other
which is really also into-SSA:

			usr	sys	wall
dominator optimization	3.16	0.02	3.26
tree SSA rewrite	3.24	0.01	3.27
tree SSA other		3.47	0.09	3.40


Per-pass and cumulative time spent (top 10 only):
integration             1.09    2.30%   48.88%
tree PHI insertion      1.21    2.56%   51.44%
loop invariant motion   1.30    2.75%   54.18%
global alloc            1.30    2.75%   56.93%
CSE                     1.72    3.63%   60.56%
parser                  3.05    6.44%   67.00%
dominator optimization  3.16    6.68%   73.68%
tree SSA rewrite        3.24    6.84%   80.52%
tree SSA other          3.47    7.33%   87.85%
expand                  5.75    12.15%  100.00%
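
A table like the one above (seconds, share of total, running share) can be derived from raw per-pass timings. The sketch below is only an illustration: the function name `cumulative_shares` is made up, and it assumes a non-empty mapping of pass name to seconds whose sum is positive and covers every pass, not just the top ten.

```python
def cumulative_shares(pass_seconds):
    """Return (name, seconds, percent, cumulative percent), cheapest pass first."""
    total = sum(pass_seconds.values())
    rows = []
    running = 0.0
    for name, seconds in sorted(pass_seconds.items(), key=lambda kv: kv[1]):
        running += seconds
        rows.append((name, seconds,
                     100.0 * seconds / total,
                     100.0 * running / total))
    return rows
```

Because the rows are sorted from cheapest to most expensive, the last row always reaches 100%, and the cumulative column shows how much of the compile time the cheaper passes account for together.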


Flat profile:

Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total
 time   seconds   seconds    calls   s/call   s/call  name
  1.82      8.19     8.19 13878865     0.00     0.00  is_gimple_reg
  1.50     14.95     6.76     6594     0.00     0.00  synth_mult
  1.28     20.69     5.74 12785589     0.00     0.00  ggc_alloc_stat
  1.27     26.38     5.69  3433257     0.00     0.00  free_df_for_stmt
  1.25     32.01     5.63 16868123     0.00     0.00  bitmap_set_bit
  1.19     37.35     5.34  4846931     0.00     0.00  get_stmt_operands
  1.17     42.59     5.24    62034     0.00     0.00  alloc_page
  1.15     47.75     5.16     3559     0.00     0.01  compute_immediate_uses
  0.99     52.18     4.43  6408238     0.00     0.00  htab_find_slot_with_hash
  0.98     56.60     4.42  2104725     0.00     0.00  compute_immediate_uses_for_phi
  0.93     60.76     4.16   821051     0.00     0.00  gt_ggc_mx_lang_tree_node
  0.91     64.83     4.07  7802758     0.00     0.00  register_new_def
  0.90     68.87     4.04   951728     0.00     0.00  rewrite_stmt
  0.88     72.82     3.95 30035694     0.00     0.00  bitmap_bit_p
  0.84     76.61     3.79   574332     0.00     0.00  cse_insn
  0.81     80.26     3.65   196671     0.00     0.00  compute_global_livein
  0.81     83.91     3.65   177070     0.00     0.00  insert_phi_nodes_for
  0.81     87.54     3.63  2697441     0.00     0.00  for_each_rtx
  0.81     91.16     3.62  1079773     0.00     0.00  check_phi_redundancy

which is a different way of saying "all over the map" :-(

Comment 38 Andrew Pinski 2004-11-18 18:56:28 UTC
I noticed today that my patch for PR 18507 also helps this testcase.
Comment 39 Andrew Pinski 2004-12-04 16:31:05 UTC
Here are the current results for 3.3.2 vs the mainline:
              -O0      -O1      -O2      -O3
3.3.2        28.93    42.81    61.13    58.140
mainline     11.06    43.18    54.86    58.35

So we are faster at -O0 but slightly slower at the optimization levels, and if we trust the numbers for
3.0.4 compared to 3.3, we are still 30% slower than 3.0.4 except at -O0.
Comment 40 Andrew Pinski 2004-12-18 04:13:33 UTC
(In reply to comment #39)
> Here is the current results for 3.3.2 vs the mainline:
Now I am getting results showing that -O3 is faster than -O2; that is not right.
Comment 41 Steven Bosscher 2004-12-23 11:16:17 UTC
Gerald, you think you can find some cycles to see where we
stand?  I'm very curious how we do for this file, and for
the rest of your test suite.

(It'd be nice if you can compare mainline with some other
official FSF build (3.3, 3.4), because our system compilers
are profiledbootstrapped so that gives a skewed picture...)
Comment 42 Andrew Pinski 2005-01-10 01:35:34 UTC
I am now getting results which say that at -O1 we are now faster than 3.3.2. Could someone test to make
sure that they get results close to mine?
Comment 43 Andrew Pinski 2005-01-16 04:50:06 UTC
(In reply to comment #39)
> Here is the current results for 3.3.2 vs the mainline:
>                     -O0      -O1       -O2      -O3
> 3.3.2          28.93      42.81   61.13    58.140
> mainline     11.06      43.18   54.86    58.35

And more current results for the mainline on powerpc-darwin:
                      11.09      30.55    39.09   38.74

So it looks like this really is fixed: we are 40% faster than 3.3.2 at -O1 on this testcase,
56% faster at -O2 and 50% at -O3 (which means we have caught back up to and passed 3.0.4's numbers, if
the numbers in comment #0 scale the same on powerpc).

Someone should really do timings on x86 to make sure that they give about the same as powerpc.
Comment 44 Serge Belyshev 2005-01-16 14:16:42 UTC
Here are the timings for i686-pc-linux-gnu:

        3.0.5   3.2.3   3.3.6   3.4.4   4.0.0   4.0.0/3.0.5

-O0      24.5    26.0    22.4    20.5    16.9    -31%
-O1      41.8    48.3    42.8    37.3    44.8     +7%
-O2      53.4    64.9    59.0    61.6    55.9     +5%
-O3      54.5    68.8    62.8    64.8    57.2     +5%

compilers are:

3.0.5 20030502 (prerelease)
3.2.3
3.3.6 20050116 (prerelease)
3.4.4 20050116 (prerelease)
4.0.0 20050116 (experimental)

all compilers compiled by GNU C version 3.3.6 20050116 (prerelease).
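
For reference, the last column of the table above is the relative change in compile time from 3.0.5 to 4.0.0. A small sketch of that arithmetic follows; the helper name `percent_change` is made up for illustration, and the timings are copied from the table.

```python
def percent_change(old, new):
    """Relative change from old to new, as a rounded percentage."""
    return round((new / old - 1.0) * 100.0)


# (3.0.5 seconds, 4.0.0 seconds) per optimization level, from the table above.
timings = {
    "-O0": (24.5, 16.9),
    "-O1": (41.8, 44.8),
    "-O2": (53.4, 55.9),
    "-O3": (54.5, 57.2),
}

changes = {level: percent_change(old, new)
           for level, (old, new) in timings.items()}
# changes == {"-O0": -31, "-O1": 7, "-O2": 5, "-O3": 5}
```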
Comment 45 Steven Bosscher 2005-01-16 14:21:43 UTC
Please don't close this bug, ever!  It's GCC nostalgia.  ;-) 
Comment 46 Andrew Pinski 2005-01-28 14:48:12 UTC
Can someone do the timings again on x86?  I think we are faster at -O1 now than previous versions, and
faster at all other optimization levels.
On ppc-darwin we speed up about 3% (-O2/-O3) to 16% (-O1) between the 15th and now.
Comment 47 Steven Bosscher 2005-01-28 15:15:11 UTC
I will do timings with a bunch of gcc3.x compilers and gcc4.0.
Comment 48 Steven Bosscher 2005-02-06 16:04:27 UTC
All compilers were bootstrapped, with the following flags: 
 
"--disable-{nls,checking} --enable-languages=c,c++" 
 
Below, gcc40 is CVS HEAD.  This was on a 1.6GHz Opteron, with -m32. 
The machine has 4GB of memory so garbage collection times are zero, 
which may account for some of the rather unexpected results. 
For gcc34 and gcc40 I used generate-3.4.ii.bz2 (attachment 3) and 
for the other two I used the latest generate.ii.bz2 (attachment 4). 
 
        gcc32   gcc33   gcc34   gcc40 
-O0     16.439s 16.172s 15.223s 6.674s 
-O1     30.265s 25.115s 20.678s 20.305s 
-O2     42.678s 34.908s 34.526s 27.418s 
-O3     47.469s 47.538s 35.706s 27.896s 
 
I'll try to get numbers on a 32bits machine (i686) as well. 
 
Comment 49 Steven Bosscher 2005-02-06 16:49:49 UTC
Similar numbers on a 1.4GHz Xeon (i686): 
 
        gcc32   gcc33   gcc34   gcc40 
-O0     18.865s 15.107s 13.286s 10.193s 
-O1     33.511s 30.096s 24.693s 23.543s 
-O2     46.527s 42.657s 42.618s 33.549s 
-O3     49.537s 43.887s 44.056s 33.917s 
 
Comment 50 Steven Bosscher 2005-02-06 16:54:17 UTC
Considering the numbers from #44, #48, and #49, I think we can conclude 
that we are back to the compile times GCC 3.0 used to have.  It should 
be noted that we have a significantly larger memory footprint though, 
and some of the speedups (especially from GCC 3.2 to GCC 3.3) came from 
smaller hacks to the GC system (collect less often, etc.).  But then, 
most people just use the compiler with -O[0123] and no fancy --params 
and similar hacks, so from a user POV this bug really is fixed, mostly. 
I'm not sure if it is useful to keep this bug open any longer. 
Comment 51 Kaveh Ghazi 2005-02-06 18:08:51 UTC
If you want to compare how the memory footprint has affected performance, use 
these flags in 3.3 and later:

--param ggc-min-expand=30 --param ggc-min-heapsize=4096

Those are the hardcoded values that 3.2 uses to tune how often the collector 
runs.  I would be interested to see how later versions behave when supplied 
these flags, this will simulate how fast we compile on memory constrained boxes 
relative to 3.2.

Another perhaps more interesting test (but one which will take slightly more 
effort for you) would be to see how raising these values in 3.2 will affect 
performance.  Some distros (RH?) did in fact raise them in their releases so 
users may be comparing their cranked distro gcc-3.2 to our FSF releases.

Of course since these values are hardcoded in 3.2, you'd have to rebuild that 
compiler, however I think an apples-to-apples comparison is in order before 
closing this PR.
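
As a sketch of how the suggested comparison could be scripted, the helper below appends the two `--param` settings quoted above to an existing compile command. The helper name and the example command line are assumptions made for illustration; only the two parameter values come from this comment.

```python
# Collector settings quoted above as the values hardcoded in 3.2.
GCC32_GC_PARAMS = [
    "--param", "ggc-min-expand=30",
    "--param", "ggc-min-heapsize=4096",
]


def with_gcc32_gc_settings(argv):
    """Return a copy of argv with the 3.2-era collector settings appended."""
    return list(argv) + GCC32_GC_PARAMS
```

For instance, `with_gcc32_gc_settings(["g++", "-c", "-O2", "generate-3.4.ii"])` yields a command that can be timed next to the unmodified one.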
Comment 52 Gerald Pfeifer 2005-02-07 23:09:09 UTC
I had done extensive benchmarks around New Year, based on Steven's request in
comment #41.  Unfortunately I lost most of that data directly before posting
it here and couldn't repeat everything, but coincidentally I could save exactly
those parts that Steven did not check now. ;-)  CVS refers to the state in
early January.

The following are for the full application which generate.ii is only one part
of, albeit a representative one.

First the time to build with -O3 and the resulting binary size:

  version | stripped | build time
  --------+----------+------------
  2.95    |  4577588 | 170.78 real
  3.2.3   |  4106176 | 219.70 real
  3.3 CVS |  1073280 | 209.02 real
  3.4 CVS |  1079120 | 189.82 real
  4.0 CVS |  1081776 | 164.86 real

Then some benchmarks results for the binaries; times in seconds, smaller is
better:
      
                        |   2.95 |  3.2.3 | 3.3 CVS | 3.4 CVS | 4.0 CVS |
          --------------+--------+--------+---------+---------+---------+
          STRATCOMP2-ALL|  17.96 | 127.44 |   89.51 |   21.02 |   20.47 |
         STRATCOMP-BRAVE|  77.09 |  78.33 |   77.70 |   83.33 |   82.83 |
                   2QBF1|  11.68 |  13.72 |   13.45 |   13.75 |   12.31 |
              PRIMEIMPL2|   7.52 |   8.05 |    7.21 |    7.00 |    7.42 |
                ANCESTOR|  70.44 |  69.91 |   71.22 |   67.36 |   61.36 |
           3COL-SIMPLEX1|   3.67 |   3.81 |    3.86 |    3.77 |    3.52 |
             3COL-LADDER|  77.99 |  81.11 |   81.72 |   73.23 |   71.58 |
           3COL-N-LADDER|   1.68 |   2.82 |    2.76 |    1.81 |    1.81 |
            3COL-RANDOM1|   8.38 |   8.33 |    7.84 |    8.13 |    8.61 |
              HP-RANDOM1|   6.52 |   7.29 |    7.19 |    7.90 |    7.65 |
           HAMCYCLE-FREE|  68.46 |  88.72 |   82.77 |   64.63 |   66.40 |
                 DECOMP2|   7.75 |   8.48 |    8.98 |    9.87 |    8.80 |
            BW-P5-Esra-a|  34.76 |  36.23 |   35.20 |   31.39 |   31.41 |
            BW-P8-nopush|  90.17 |  89.79 |   88.17 |   81.97 |   83.51 |
           BW-P6-pushbin|  60.23 |  62.86 |   61.34 |   59.09 |   59.94 |
         BW-P7-nopushbin|  84.94 |  87.46 |   83.80 |   79.93 |   81.23 |
                  3SAT-1|  23.91 |  24.91 |   22.55 |   22.23 |   23.19 |
       3SAT-1-CONSTRAINT|  13.97 |  14.76 |   13.51 |   13.37 |   14.15 |
            HANOI-Towers| 737.91 | 632.95 |  636.27 |  680.56 |  661.77 |
         RAMSEY(3,7)!=21|  68.93 |  73.92 |   71.77 |   74.71 |   73.59 |
 RAMSEY(3,7)!=21, normal|  83.92 |  84.02 |   83.32 |   81.23 |   79.21 |
         RAMSEY(4,6)!=25|  92.53 |  99.69 |   95.06 |   96.33 |   90.40 |
         RAMSEY(4,6)!=26| 130.68 | 142.55 |  134.61 |  134.75 |  124.73 |
                 CRISTAL|   5.75 |   5.98 |    5.67 |    5.56 |    5.29 |
                 HANOI-K|1176.06 |1289.65 | 1252.41 | 1154.43 | 1082.85 |
               21-QUEENS|   7.09 |   7.12 |    6.30 |    6.30 |    6.31 |
       MSTDir[V=13,A=40]|  14.34 |  13.02 |   12.34 |   11.50 |   11.69 |
       MSTDir[V=15,A=40]|  14.20 |  12.98 |   12.43 |   11.47 |   11.65 |
     MSTUndir[V=13,A=40]|   7.18 |   7.07 |    6.53 |    6.14 |    6.34 |
     MSTUndir[V=15,A=40]| 116.86 | 113.12 |  104.71 |   99.37 |  103.56 |
          TIMETABLING_4C| 137.64 | 140.79 |  138.66 |  173.87 |  165.50 |
      SCHOOL_TIMETABLING| 328.57 |    -   |     -   |  329.02 |  310.30 | 

So, in terms of build time and binary size we are fine, and also benchmark
performance is nicely improved on average (with some regressions, though).

For whether we can close this now, I'll just refer to comment #32 and
comment #45 (and Kaveh's note on memory usage).
Comment 53 Andrew Pinski 2005-07-23 21:58:54 UTC
We have regressed since the last time someone reported on this one:
-O0   -O1   -O2   -O3
11.1  41.7   55.6  65.9

For -O3, the following passes stand out for compile time:
 tree PTA              :   4.04 ( 6%) usr   0.11 ( 1%) sys   4.45 ( 5%) wall    9319 kB ( 2%) ggc
 tree alias analysis   :   5.34 ( 7%) usr   1.42 ( 9%) sys   7.07 ( 8%) wall   11463 kB ( 2%) ggc
 parser                :   4.48 ( 6%) usr   2.16 (14%) sys   7.11 ( 8%) wall   95214 kB (18%) ggc
 tree operand scan     :   4.28 ( 6%) usr   2.86 (19%) sys   7.41 ( 8%) wall   22145 kB ( 4%) ggc
 dominator optimization:   3.60 ( 5%) usr   0.21 ( 1%) sys   4.02 ( 4%) wall   16448 kB ( 3%) ggc
 expand                :   3.13 ( 4%) usr   0.27 ( 2%) sys   3.53 ( 4%) wall   34210 kB ( 6%) ggc

For memory usage:
 integration           :   2.70 ( 4%) usr   0.30 ( 2%) sys   3.24 ( 4%) wall  124856 kB (24%) ggc
 parser                :   4.48 ( 6%) usr   2.16 (14%) sys   7.11 ( 8%) wall   95214 kB (18%) ggc


At -O0 compile time:
 parser                :   4.55 (33%) usr   2.00 (29%) sys   6.75 (31%) wall   94454 kB (50%) ggc
 name lookup           :   1.82 (13%) usr   2.98 (43%) sys   5.02 (23%) wall   17923 kB ( 9%) ggc
 expand                :   1.57 (11%) usr   0.40 ( 6%) sys   2.04 ( 9%) wall   33674 kB (18%) ggc
 global alloc          :   1.22 ( 9%) usr   0.06 ( 1%) sys   1.36 ( 6%) wall    8858 kB ( 5%) ggc

for memory usage, just the parser.

at -O1:
 parser                :   4.23 ( 9%) usr   2.23 (17%) sys   6.94 (11%) wall   94371 kB (22%) ggc
 integration           :   2.46 ( 5%) usr   0.29 ( 2%) sys   2.70 ( 4%) wall  104683 kB (25%) ggc
 tree PTA              :   3.48 ( 7%) usr   0.09 ( 1%) sys   3.76 ( 6%) wall    8378 kB ( 2%) ggc
 tree alias analysis   :   3.22 ( 7%) usr   1.23 ( 9%) sys   4.69 ( 7%) wall    6203 kB ( 1%) ggc
 tree SSA incremental  :   2.52 ( 5%) usr   0.30 ( 2%) sys   3.06 ( 5%) wall    3278 kB ( 1%) ggc
 tree operand scan     :   3.56 ( 7%) usr   2.32 (18%) sys   6.40 (10%) wall   18232 kB ( 4%) ggc

memory usage:
 integration           :   2.46 ( 5%) usr   0.29 ( 2%) sys   2.70 ( 4%) wall  104683 kB (25%) ggc
 parser                :   4.23 ( 9%) usr   2.23 (17%) sys   6.94 (11%) wall   94371 kB (22%) ggc


-O2:
 expand                :   2.90 ( 5%) usr   0.24 ( 2%) sys   3.02 ( 4%) wall   31476 kB ( 7%) ggc
 tree SSA incremental  :   2.67 ( 4%) usr   0.38 ( 3%) sys   3.30 ( 4%) wall    6252 kB ( 1%) ggc
 tree operand scan     :   3.76 ( 6%) usr   2.49 (18%) sys   6.05 ( 8%) wall   19509 kB ( 4%) ggc
 dominator optimization:   2.91 ( 5%) usr   0.13 ( 1%) sys   3.14 ( 4%) wall   14117 kB ( 3%) ggc
 tree PTA              :   3.46 ( 6%) usr   0.15 ( 1%) sys   3.79 ( 5%) wall    8394 kB ( 2%) ggc
 tree alias analysis   :   3.97 ( 6%) usr   1.40 (10%) sys   5.65 ( 7%) wall   10165 kB ( 2%) ggc
 parser                :   4.41 ( 7%) usr   2.34 (17%) sys   7.21 ( 9%) wall   94371 kB (20%) ggc
 integration           :   2.48 ( 4%) usr   0.23 ( 2%) sys   2.70 ( 3%) wall  104710 kB (22%) ggc

memory usage:
 parser                :   4.41 ( 7%) usr   2.34 (17%) sys   7.21 ( 9%) wall   94371 kB (20%) ggc
 integration           :   2.48 ( 4%) usr   0.23 ( 2%) sys   2.70 ( 3%) wall  104710 kB (22%) ggc
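
Picking out the passes that stand out, as done by hand above, can be automated. The following is a minimal sketch that parses lines in the format shown in this comment; the regular expression is derived only from these sample lines, so other GCC versions may format the report differently, and the names `parse_time_report` and `top_passes` are made up for illustration.

```python
import re

# Matches e.g. " parser : 4.48 ( 6%) usr 2.16 (14%) sys 7.11 ( 8%) wall 95214 kB (18%) ggc"
REPORT_LINE = re.compile(
    r"^\s*(?P<name>.+?)\s*:"
    r"\s*(?P<usr>[\d.]+)\s*\(\s*\d+%\)\s*usr"
    r"\s*(?P<sys>[\d.]+)\s*\(\s*\d+%\)\s*sys"
    r"\s*(?P<wall>[\d.]+)\s*\(\s*\d+%\)\s*wall"
    r"\s*(?P<kb>\d+)\s*kB\s*\(\s*\d+%\)\s*ggc"
)


def parse_time_report(text):
    """Turn report lines into dicts; lines that do not match are skipped."""
    rows = []
    for line in text.splitlines():
        match = REPORT_LINE.match(line)
        if match:
            rows.append({
                "name": match.group("name"),
                "usr": float(match.group("usr")),
                "sys": float(match.group("sys")),
                "wall": float(match.group("wall")),
                "kb": int(match.group("kb")),
            })
    return rows


def top_passes(rows, key, n=3):
    """Names of the n passes with the largest value for the given column."""
    ranked = sorted(rows, key=lambda row: row[key], reverse=True)
    return [row["name"] for row in ranked[:n]]
```

Sorting by `"usr"` gives the compile-time view and sorting by `"kb"` gives the memory view used in this comment.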
Comment 54 Andrew Pinski 2005-09-18 19:17:48 UTC
Current numbers for 4.0.0 vs 4.1.0:
pc64:~/src/pr8361> time ~/onetest.release/bin/gcc pr8361.ii -S -m32 -O1
21.137u 0.399s 0:21.89 98.3%    0+0k 0+0io 3pf+0w
pc64:~/src/pr8361> time gcc-4.0 pr8361.ii -S -m32 -O1
14.059u 0.269s 0:14.46 98.9%    0+0k 0+0io 2pf+0w

This is on x86_64-pc-linux-gnu.

-ftime-report for 4.1.0:

Execution times (seconds)
 garbage collection    :   0.35 ( 2%) usr   0.01 ( 1%) sys   0.37 ( 2%) wall       0 kB ( 0%) ggc
 callgraph construction:   0.13 ( 1%) usr   0.01 ( 1%) sys   0.18 ( 1%) wall    4538 kB ( 1%) ggc
 callgraph optimization:   0.01 ( 0%) usr   0.00 ( 0%) sys   0.03 ( 0%) wall    1193 kB ( 0%) ggc
 ipa reference         :   0.07 ( 0%) usr   0.00 ( 0%) sys   0.08 ( 0%) wall     273 kB ( 0%) ggc
 ipa pure const        :   0.04 ( 0%) usr   0.01 ( 1%) sys   0.04 ( 0%) wall       0 kB ( 0%) ggc
 cfg construction      :   0.02 ( 0%) usr   0.00 ( 0%) sys   0.04 ( 0%) wall    1607 kB ( 0%) ggc
 cfg cleanup           :   0.14 ( 1%) usr   0.01 ( 1%) sys   0.13 ( 1%) wall     103 kB ( 0%) ggc
 trivially dead code   :   0.10 ( 0%) usr   0.00 ( 0%) sys   0.17 ( 1%) wall       0 kB ( 0%) ggc
 life analysis         :   0.52 ( 2%) usr   0.00 ( 0%) sys   0.52 ( 2%) wall    3245 kB ( 0%) ggc
 life info update      :   0.04 ( 0%) usr   0.00 ( 0%) sys   0.04 ( 0%) wall     292 kB ( 0%) ggc
 alias analysis        :   0.12 ( 1%) usr   0.00 ( 0%) sys   0.17 ( 1%) wall    2150 kB ( 0%) ggc
 register scan         :   0.16 ( 1%) usr   0.00 ( 0%) sys   0.08 ( 0%) wall     211 kB ( 0%) ggc
 rebuild jump labels   :   0.03 ( 0%) usr   0.00 ( 0%) sys   0.03 ( 0%) wall       1 kB ( 0%) ggc
 preprocessing         :   0.13 ( 1%) usr   0.15 ( 8%) sys   0.28 ( 1%) wall     591 kB ( 0%) ggc
 parser                :   1.80 ( 8%) usr   0.42 (23%) sys   2.35 (10%) wall  154459 kB (23%) ggc
 name lookup           :   0.57 ( 3%) usr   0.46 (25%) sys   0.97 ( 4%) wall   31048 kB ( 5%) ggc
 inline heuristics     :   0.07 ( 0%) usr   0.00 ( 0%) sys   0.06 ( 0%) wall    7605 kB ( 1%) ggc
 integration           :   1.14 ( 5%) usr   0.01 ( 1%) sys   1.14 ( 5%) wall  162853 kB (24%) ggc
 tree gimplify         :   0.30 ( 1%) usr   0.02 ( 1%) sys   0.28 ( 1%) wall   14133 kB ( 2%) ggc
 tree eh               :   0.03 ( 0%) usr   0.00 ( 0%) sys   0.00 ( 0%) wall    1795 kB ( 0%) ggc
 tree CFG construction :   0.02 ( 0%) usr   0.01 ( 1%) sys   0.04 ( 0%) wall   11718 kB ( 2%) ggc
 tree CFG cleanup      :   0.49 ( 2%) usr   0.00 ( 0%) sys   0.68 ( 3%) wall    3669 kB ( 1%) ggc
 tree copy propagation :   0.60 ( 3%) usr   0.00 ( 0%) sys   0.63 ( 3%) wall    1441 kB ( 0%) ggc
 tree store copy prop  :   0.12 ( 1%) usr   0.00 ( 0%) sys   0.10 ( 0%) wall     181 kB ( 0%) ggc
 tree find ref. vars   :   0.25 ( 1%) usr   0.01 ( 1%) sys   0.29 ( 1%) wall   22675 kB ( 3%) ggc
 tree PTA              :   1.61 ( 7%) usr   0.02 ( 1%) sys   1.72 ( 7%) wall   10266 kB ( 2%) ggc
 tree alias analysis   :   1.05 ( 5%) usr   0.15 ( 8%) sys   1.23 ( 5%) wall   11045 kB ( 2%) ggc
 tree PHI insertion    :   0.29 ( 1%) usr   0.00 ( 0%) sys   0.29 ( 1%) wall   16546 kB ( 2%) ggc
 tree SSA rewrite      :   0.65 ( 3%) usr   0.01 ( 1%) sys   0.76 ( 3%) wall   30896 kB ( 5%) ggc
 tree SSA other        :   0.15 ( 1%) usr   0.06 ( 3%) sys   0.20 ( 1%) wall     580 kB ( 0%) ggc
 tree SSA incremental  :   1.58 ( 7%) usr   0.01 ( 1%) sys   1.34 ( 6%) wall    6475 kB ( 1%) ggc
 tree operand scan     :   1.15 ( 5%) usr   0.25 (14%) sys   1.47 ( 6%) wall   15753 kB ( 2%) ggc
 dominator optimization:   0.80 ( 4%) usr   0.04 ( 2%) sys   0.84 ( 4%) wall   14884 kB ( 2%) ggc
 tree SRA              :   0.22 ( 1%) usr   0.02 ( 1%) sys   0.20 ( 1%) wall   11416 kB ( 2%) ggc
 tree STORE-CCP        :   0.05 ( 0%) usr   0.00 ( 0%) sys   0.04 ( 0%) wall     165 kB ( 0%) ggc
 tree CCP              :   0.21 ( 1%) usr   0.00 ( 0%) sys   0.19 ( 1%) wall     601 kB ( 0%) ggc
 tree split crit edges :   0.03 ( 0%) usr   0.00 ( 0%) sys   0.03 ( 0%) wall    6441 kB ( 1%) ggc
 tree reassociation    :   0.05 ( 0%) usr   0.00 ( 0%) sys   0.02 ( 0%) wall       1 kB ( 0%) ggc
 tree FRE              :   0.51 ( 2%) usr   0.02 ( 1%) sys   0.53 ( 2%) wall   16049 kB ( 2%) ggc
 tree code sinking     :   0.06 ( 0%) usr   0.00 ( 0%) sys   0.08 ( 0%) wall      54 kB ( 0%) ggc
 tree linearize phis   :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall      16 kB ( 0%) ggc
 tree forward propagate:   0.04 ( 0%) usr   0.00 ( 0%) sys   0.14 ( 1%) wall    3515 kB ( 1%) ggc
 tree conservative DCE :   0.39 ( 2%) usr   0.00 ( 0%) sys   0.49 ( 2%) wall       0 kB ( 0%) ggc
 tree aggressive DCE   :   0.06 ( 0%) usr   0.00 ( 0%) sys   0.08 ( 0%) wall       0 kB ( 0%) ggc
 tree DSE              :   0.11 ( 1%) usr   0.00 ( 0%) sys   0.07 ( 0%) wall      85 kB ( 0%) ggc
 PHI merge             :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall     966 kB ( 0%) ggc
 tree loop bounds      :   0.14 ( 1%) usr   0.00 ( 0%) sys   0.05 ( 0%) wall    1796 kB ( 0%) ggc
 loop invariant motion :   0.04 ( 0%) usr   0.00 ( 0%) sys   0.08 ( 0%) wall     158 kB ( 0%) ggc
 tree canonical iv     :   0.03 ( 0%) usr   0.00 ( 0%) sys   0.03 ( 0%) wall     955 kB ( 0%) ggc
 scev constant prop    :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.05 ( 0%) wall     721 kB ( 0%) ggc
 complete unrolling    :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.02 ( 0%) wall    1340 kB ( 0%) ggc
 tree iv optimization  :   0.15 ( 1%) usr   0.01 ( 1%) sys   0.18 ( 1%) wall    7715 kB ( 1%) ggc
 tree loop init        :   0.11 ( 1%) usr   0.00 ( 0%) sys   0.08 ( 0%) wall       6 kB ( 0%) ggc
 tree copy headers     :   0.06 ( 0%) usr   0.00 ( 0%) sys   0.05 ( 0%) wall    6478 kB ( 1%) ggc
 tree SSA uncprop      :   0.02 ( 0%) usr   0.00 ( 0%) sys   0.00 ( 0%) wall       0 kB ( 0%) ggc
 tree SSA to normal    :   0.31 ( 1%) usr   0.00 ( 0%) sys   0.41 ( 2%) wall    3411 kB ( 1%) ggc
 tree rename SSA copies:   0.10 ( 0%) usr   0.01 ( 1%) sys   0.16 ( 1%) wall       0 kB ( 0%) ggc
 dominance frontiers   :   0.07 ( 0%) usr   0.00 ( 0%) sys   0.09 ( 0%) wall       0 kB ( 0%) ggc
 expand                :   1.28 ( 6%) usr   0.02 ( 1%) sys   1.12 ( 5%) wall   43499 kB ( 7%) ggc
 varconst              :   0.08 ( 0%) usr   0.02 ( 1%) sys   0.05 ( 0%) wall     403 kB ( 0%) ggc
 jump                  :   0.05 ( 0%) usr   0.01 ( 1%) sys   0.06 ( 0%) wall    1203 kB ( 0%) ggc
 CSE                   :   0.22 ( 1%) usr   0.00 ( 0%) sys   0.20 ( 1%) wall     647 kB ( 0%) ggc
 loop analysis         :   0.10 ( 0%) usr   0.00 ( 0%) sys   0.12 ( 1%) wall    1936 kB ( 0%) ggc
 branch prediction     :   0.20 ( 1%) usr   0.01 ( 1%) sys   0.18 ( 1%) wall    1979 kB ( 0%) ggc
 flow analysis         :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall       2 kB ( 0%) ggc
 combiner              :   0.46 ( 2%) usr   0.00 ( 0%) sys   0.50 ( 2%) wall    5390 kB ( 1%) ggc
 if-conversion         :   0.08 ( 0%) usr   0.01 ( 1%) sys   0.06 ( 0%) wall     308 kB ( 0%) ggc
 local alloc           :   0.25 ( 1%) usr   0.01 ( 1%) sys   0.26 ( 1%) wall    1622 kB ( 0%) ggc
 global alloc          :   0.85 ( 4%) usr   0.01 ( 1%) sys   0.78 ( 3%) wall    9331 kB ( 1%) ggc
 reload CSE regs       :   0.17 ( 1%) usr   0.00 ( 0%) sys   0.17 ( 1%) wall    2917 kB ( 0%) ggc
 flow 2                :   0.05 ( 0%) usr   0.00 ( 0%) sys   0.03 ( 0%) wall    1058 kB ( 0%) ggc
 if-conversion 2       :   0.07 ( 0%) usr   0.00 ( 0%) sys   0.03 ( 0%) wall      37 kB ( 0%) ggc
 rename registers      :   0.14 ( 1%) usr   0.00 ( 0%) sys   0.15 ( 1%) wall      21 kB ( 0%) ggc
 machine dep reorg     :   0.12 ( 1%) usr   0.00 ( 0%) sys   0.19 ( 1%) wall      86 kB ( 0%) ggc
 shorten branches      :   0.02 ( 0%) usr   0.00 ( 0%) sys   0.03 ( 0%) wall       0 kB ( 0%) ggc
 final                 :   0.24 ( 1%) usr   0.00 ( 0%) sys   0.17 ( 1%) wall    1199 kB ( 0%) ggc
 symout                :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall     180 kB ( 0%) ggc
 TOTAL                 :  21.70             1.81            23.79             667733 kB

for 4.0.0:

Execution times (seconds)
 garbage collection    :   0.32 ( 2%) usr   0.00 ( 0%) sys   0.32 ( 2%) wall
 callgraph construction:   0.08 ( 1%) usr   0.00 ( 0%) sys   0.08 ( 0%) wall
 callgraph optimization:   0.04 ( 0%) usr   0.01 ( 1%) sys   0.04 ( 0%) wall
 cfg construction      :   0.03 ( 0%) usr   0.00 ( 0%) sys   0.02 ( 0%) wall
 cfg cleanup           :   0.06 ( 0%) usr   0.00 ( 0%) sys   0.08 ( 0%) wall
 trivially dead code   :   0.08 ( 1%) usr   0.00 ( 0%) sys   0.09 ( 1%) wall
 life analysis         :   0.43 ( 3%) usr   0.00 ( 0%) sys   0.44 ( 3%) wall
 life info update      :   0.05 ( 0%) usr   0.00 ( 0%) sys   0.13 ( 1%) wall
 alias analysis        :   0.08 ( 1%) usr   0.00 ( 0%) sys   0.13 ( 1%) wall
 register scan         :   0.06 ( 0%) usr   0.00 ( 0%) sys   0.14 ( 1%) wall
 rebuild jump labels   :   0.03 ( 0%) usr   0.00 ( 0%) sys   0.02 ( 0%) wall
 preprocessing         :   0.11 ( 1%) usr   0.15 ( 9%) sys   0.22 ( 1%) wall
 parser                :   2.00 (14%) usr   0.46 (29%) sys   2.20 (13%) wall
 name lookup           :   0.49 ( 3%) usr   0.44 (28%) sys   1.12 ( 7%) wall
 integration           :   0.67 ( 5%) usr   0.03 ( 2%) sys   0.77 ( 5%) wall
 tree gimplify         :   0.23 ( 2%) usr   0.01 ( 1%) sys   0.35 ( 2%) wall
 tree eh               :   0.05 ( 0%) usr   0.00 ( 0%) sys   0.06 ( 0%) wall
 tree CFG construction :   0.12 ( 1%) usr   0.00 ( 0%) sys   0.11 ( 1%) wall
 tree CFG cleanup      :   0.22 ( 2%) usr   0.00 ( 0%) sys   0.15 ( 1%) wall
 tree find referenced vars:   0.23 ( 2%) usr   0.00 ( 0%) sys   0.20 ( 1%) wall
 tree PTA              :   0.37 ( 3%) usr   0.01 ( 1%) sys   0.45 ( 3%) wall
 tree alias analysis   :   0.51 ( 3%) usr   0.00 ( 0%) sys   0.56 ( 3%) wall
 tree PHI insertion    :   0.17 ( 1%) usr   0.00 ( 0%) sys   0.24 ( 1%) wall
 tree SSA rewrite      :   0.54 ( 4%) usr   0.01 ( 1%) sys   0.47 ( 3%) wall
 tree SSA other        :   0.58 ( 4%) usr   0.16 (10%) sys   0.85 ( 5%) wall
 tree operand scan     :   0.51 ( 3%) usr   0.21 (13%) sys   0.68 ( 4%) wall
 dominator optimization:   0.76 ( 5%) usr   0.05 ( 3%) sys   0.66 ( 4%) wall
 tree SRA              :   0.12 ( 1%) usr   0.00 ( 0%) sys   0.13 ( 1%) wall
 tree CCP              :   0.07 ( 0%) usr   0.00 ( 0%) sys   0.12 ( 1%) wall
 tree split crit edges :   0.04 ( 0%) usr   0.00 ( 0%) sys   0.04 ( 0%) wall
 tree remove redundant PHIs:   0.26 ( 2%) usr   0.00 ( 0%) sys   0.33 ( 2%) wall
 tree linearize phis   :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall
 tree forward propagate:   0.15 ( 1%) usr   0.00 ( 0%) sys   0.12 ( 1%) wall
 tree conservative DCE :   0.26 ( 2%) usr   0.00 ( 0%) sys   0.26 ( 2%) wall
 tree aggressive DCE   :   0.04 ( 0%) usr   0.00 ( 0%) sys   0.05 ( 0%) wall
 tree DSE              :   0.06 ( 0%) usr   0.00 ( 0%) sys   0.07 ( 0%) wall
 PHI merge             :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.00 ( 0%) wall
 tree record loop bounds:   0.05 ( 0%) usr   0.00 ( 0%) sys   0.04 ( 0%) wall
 loop invariant motion :   0.05 ( 0%) usr   0.00 ( 0%) sys   0.08 ( 0%) wall
 tree canonical iv creation:   0.03 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall
 tree iv optimization  :   0.22 ( 2%) usr   0.01 ( 1%) sys   0.13 ( 1%) wall
 tree loop init        :   0.12 ( 1%) usr   0.00 ( 0%) sys   0.05 ( 0%) wall
 tree loop fini        :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall
 tree copy headers     :   0.12 ( 1%) usr   0.00 ( 0%) sys   0.17 ( 1%) wall
 tree SSA to normal    :   0.33 ( 2%) usr   0.00 ( 0%) sys   0.30 ( 2%) wall
 tree NRV optimization :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall
 tree rename SSA copies:   0.11 ( 1%) usr   0.00 ( 0%) sys   0.10 ( 1%) wall
 dominance frontiers   :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall
 expand                :   0.97 ( 7%) usr   0.01 ( 1%) sys   1.06 ( 6%) wall
 varconst              :   0.04 ( 0%) usr   0.00 ( 0%) sys   0.11 ( 1%) wall
 jump                  :   0.04 ( 0%) usr   0.00 ( 0%) sys   0.03 ( 0%) wall
 CSE                   :   0.27 ( 2%) usr   0.00 ( 0%) sys   0.22 ( 1%) wall
 loop analysis         :   0.08 ( 1%) usr   0.00 ( 0%) sys   0.09 ( 1%) wall
 branch prediction     :   0.18 ( 1%) usr   0.00 ( 0%) sys   0.19 ( 1%) wall
 flow analysis         :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.02 ( 0%) wall
 combiner              :   0.39 ( 3%) usr   0.00 ( 0%) sys   0.35 ( 2%) wall
 if-conversion         :   0.02 ( 0%) usr   0.00 ( 0%) sys   0.08 ( 0%) wall
 local alloc           :   0.22 ( 2%) usr   0.00 ( 0%) sys   0.20 ( 1%) wall
 global alloc          :   0.64 ( 4%) usr   0.01 ( 1%) sys   0.66 ( 4%) wall
 reload CSE regs       :   0.21 ( 1%) usr   0.00 ( 0%) sys   0.13 ( 1%) wall
 flow 2                :   0.05 ( 0%) usr   0.00 ( 0%) sys   0.05 ( 0%) wall
 if-conversion 2       :   0.07 ( 0%) usr   0.01 ( 1%) sys   0.02 ( 0%) wall
 rename registers      :   0.12 ( 1%) usr   0.00 ( 0%) sys   0.12 ( 1%) wall
 machine dep reorg     :   0.05 ( 0%) usr   0.00 ( 0%) sys   0.11 ( 1%) wall
 shorten branches      :   0.10 ( 1%) usr   0.00 ( 0%) sys   0.07 ( 0%) wall
 final                 :   0.18 ( 1%) usr   0.02 ( 1%) sys   0.24 ( 1%) wall
 symout                :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall
 rest of compilation   :   0.12 ( 1%) usr   0.00 ( 0%) sys   0.03 ( 0%) wall
 TOTAL                 :  14.62             1.60            16.43
14.630u 1.630s 0:16.46 98.7%    0+0k 0+0io 0pf+0w
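
Comparing two listings like these by eye is tedious, so the following Python sketch ranks passes by how much their usr time grew. It is only an illustration: the regular expression assumes the "name : N.NN ( P%) usr" layout seen above, the function names are hypothetical, and only a handful of lines are copied in as sample data. A pass missing from the older report is counted as 0 seconds there, which may just mean it had no separate timevar yet.

```python
import re

# A few lines copied from the 4.1.0 and 4.0.0 reports above.
REPORT_41 = """\
 tree PTA              :   1.61 ( 7%) usr   0.02 ( 1%) sys   1.72 ( 7%) wall   10266 kB ( 2%) ggc
 tree SSA incremental  :   1.58 ( 7%) usr   0.01 ( 1%) sys   1.34 ( 6%) wall    6475 kB ( 1%) ggc
 tree operand scan     :   1.15 ( 5%) usr   0.25 (14%) sys   1.47 ( 6%) wall   15753 kB ( 2%) ggc
 parser                :   1.80 ( 8%) usr   0.42 (23%) sys   2.35 (10%) wall  154459 kB (23%) ggc
 TOTAL                 :  21.70             1.81            23.79             667733 kB
"""
REPORT_40 = """\
 tree PTA              :   0.37 ( 3%) usr   0.01 ( 1%) sys   0.45 ( 3%) wall
 tree operand scan     :   0.51 ( 3%) usr   0.21 (13%) sys   0.68 ( 4%) wall
 parser                :   2.00 (14%) usr   0.46 (29%) sys   2.20 (13%) wall
 TOTAL                 :  14.62             1.60            16.43
"""

# Pass name, colon, usr seconds, "(NN%) usr". The TOTAL line has no
# percentage column and is therefore skipped.
LINE_RE = re.compile(r"^\s*(?P<name>.+?)\s*:\s*(?P<usr>\d+\.\d+)\s*\(\s*\d+%\)\s*usr")

def parse_usr_times(report):
    """Map each pass name to its usr seconds."""
    times = {}
    for line in report.splitlines():
        match = LINE_RE.match(line)
        if match:
            times[match.group("name")] = float(match.group("usr"))
    return times

def usr_deltas(old_report, new_report):
    """Return (pass, usr-time increase) pairs, largest increase first."""
    old = parse_usr_times(old_report)
    new = parse_usr_times(new_report)
    deltas = {name: new[name] - old.get(name, 0.0) for name in new}
    return sorted(deltas.items(), key=lambda item: item[1], reverse=True)

for name, delta in usr_deltas(REPORT_40, REPORT_41):
    print(f"{name:24s} {delta:+.2f}s")
```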
Comment 55 Andrew Pinski 2005-09-18 19:27:11 UTC
Even the -fno-inline case slowed down too:
pc64:~/src/pr8361> time ~/onetest.release/bin/gcc pr8361.ii -S -m32 -O1 -fno-inline
11.171u 0.359s 0:11.66 98.7%    0+0k 0+0io 0pf+0w
pc64:~/src/pr8361> time gcc-4.0 pr8361.ii -S -m32 -O1 -fno-inline
9.578u 0.295s 0:10.02 98.4%     0+0k 0+0io 0pf+0w


interesting part of 4.1 time report:
 combiner              :   0.36 ( 3%) usr   0.00 ( 0%) sys   0.31 ( 2%) wall    1933 kB ( 0%) ggc
 local alloc           :   0.33 ( 3%) usr   0.00 ( 0%) sys   0.31 ( 2%) wall    3654 kB ( 1%) ggc
 global alloc          :   0.85 ( 8%) usr   0.00 ( 0%) sys   0.75 ( 6%) wall   12187 kB ( 3%) ggc
 tree operand scan     :   0.25 ( 2%) usr   0.10 ( 7%) sys   0.29 ( 2%) wall    9315 kB ( 2%) ggc
 dominator optimization:   0.35 ( 3%) usr   0.01 ( 1%) sys   0.34 ( 3%) wall    3938 kB ( 1%) ggc
 tree PTA              :   0.46 ( 4%) usr   0.02 ( 1%) sys   0.53 ( 4%) wall   19358 kB ( 5%) ggc
 tree alias analysis   :   0.25 ( 2%) usr   0.04 ( 3%) sys   0.27 ( 2%) wall    9734 kB ( 2%) ggc
 tree SSA rewrite      :   0.19 ( 2%) usr   0.00 ( 0%) sys   0.24 ( 2%) wall   10314 kB ( 3%) ggc
 tree SSA incremental  :   0.32 ( 3%) usr   0.00 ( 0%) sys   0.26 ( 2%) wall    2956 kB ( 1%) ggc
 parser                :   1.71 (15%) usr   0.42 (28%) sys   2.07 (16%) wall  154459 kB (39%) ggc
 life analysis         :   0.57 ( 5%) usr   0.01 ( 1%) sys   0.51 ( 4%) wall    3127 kB ( 1%) ggc


corresponding 4.0 time report:
 combiner              :   0.31 ( 3%) usr   0.01 ( 1%) sys   0.28 ( 2%) wall
 local alloc           :   0.35 ( 4%) usr   0.00 ( 0%) sys   0.47 ( 4%) wall
 global alloc          :   0.74 ( 7%) usr   0.01 ( 1%) sys   0.96 ( 8%) wall
 tree operand scan     :   0.18 ( 2%) usr   0.07 ( 5%) sys   0.23 ( 2%) wall
 dominator optimization:   0.42 ( 4%) usr   0.01 ( 1%) sys   0.31 ( 3%) wall
 tree PTA              :   0.14 ( 1%) usr   0.01 ( 1%) sys   0.11 ( 1%) wall
 tree alias analysis   :   0.05 ( 1%) usr   0.00 ( 0%) sys   0.07 ( 1%) wall
 tree SSA rewrite      :   0.17 ( 2%) usr   0.00 ( 0%) sys   0.18 ( 2%) wall
 parser                :   1.71 (17%) usr   0.41 (32%) sys   2.22 (20%) wall
 life analysis         :   0.43 ( 4%) usr   0.00 ( 0%) sys   0.61 ( 5%) wall


Comment 56 Ian Lance Taylor 2005-10-12 05:26:19 UTC
Is this PR really a 4.0 regression?  The timings which I see in the comments suggest that 4.0 is just as fast as earlier releases.

That is, the PR may have become a 4.1 regression, but I don't see that it is a 4.0 regression.
Comment 57 Andrew Pinski 2005-10-13 03:34:48 UTC
A semi recent 4.1 (the 10th) gives:
 tree PTA              :   1.60 ( 6%) usr   0.02 ( 1%) sys   1.73 ( 6%) wall   10338 kB ( 1%) ggc
 tree alias analysis   :   1.32 ( 5%) usr   0.19 (10%) sys   1.48 ( 5%) wall   18910 kB ( 3%) ggc

while 4.0 gave:
 tree PTA              :   0.50 ( 2%) usr   0.00 ( 0%) sys   0.48 ( 2%) wall
 tree alias analysis   :   0.73 ( 3%) usr   0.00 ( 0%) sys   0.76 ( 3%) wall

So this is definitely a 4.1 regression.
Comment 58 Daniel Berlin 2005-10-13 04:07:33 UTC
Subject: Re:  [3.4/4.0/4.1 regression] C++ compile-time performance regression

On Thu, 2005-10-13 at 03:34 +0000, pinskia at gcc dot gnu dot org wrote:
> 
> ------- Comment #57 from pinskia at gcc dot gnu dot org  2005-10-13 03:34 -------
> A semi recent 4.1 (the 10th) gives:
>  tree PTA              :   1.60 ( 6%) usr   0.02 ( 1%) sys   1.73 ( 6%) wall   10338 kB ( 1%) ggc
>  tree alias analysis   :   1.32 ( 5%) usr   0.19 (10%) sys   1.48 ( 5%) wall   18910 kB ( 3%) ggc
> 
> while 4.0 gave:
>  tree PTA              :   0.50 ( 2%) usr   0.00 ( 0%) sys   0.48 ( 2%) wall
>  tree alias analysis   :   0.73 ( 3%) usr   0.00 ( 0%) sys   0.76 ( 3%) wall
> 
> So this is definitely a 4.1 regression.
> 
> 

I'm pretty sure we run PTA more times in 4.1 than in 4.0.
Maybe I'm wrong.
Can you oprofile this and give me some kind of hotspot to look into in
PTA?


Comment 59 Ian Lance Taylor 2005-10-13 04:13:45 UTC
I'm going to mark this as just a 4.1 regression.  As far as I can see, 4.0 was OK.  And there is zero chance that we are going to address any of these issues in 3.4, except perhaps coincidentally.
Comment 60 Mark Mitchell 2005-10-30 21:45:05 UTC
I'd like to fix this for 4.1, but not at the expense of destabilizing things, or losing performance.
Comment 61 Andrew Pinski 2005-10-30 23:38:05 UTC
(In reply to comment #60)
> I'd like to fix this for 4.1, but not at the expense of destabilizing things,
> or losing performance.

Does this contradict what you wrote in PR 15678:
> However, in the meanwhile, I've downgraded this to P4.  A small compile-time
> increase isn't going to block the upcoming releases.

This really is a small increase, about 2-3 seconds.
Comment 62 Steven Bosscher 2006-02-11 15:46:49 UTC
Compile times for generate-3.4.ii
All compilers bootstrapped, with checking disabled.

Flags: -O2

GCC 4.0 (release branch today):
real    0m22.795s       0m22.727s       0m22.760s
user    0m22.481s       0m22.297s       0m22.357s
sys     0m0.316s        0m0.412s        0m0.404s

GCC 4.1 (release branch today):
real    0m29.888s       0m28.450s       0m28.420s
user    0m28.154s       0m27.906s       0m27.894s
sys     0m0.496s        0m0.544s        0m0.524s

GCC 4.2 (trunk today):
real    0m33.715s       0m31.524s       0m31.483s
user    0m31.466s       0m31.034s       0m31.022s
sys     0m0.424s        0m0.492s        0m0.460s



Flags: -O3

GCC 4.0 (release branch today):
real    0m24.412s       0m25.000s       0m24.771s
user    0m23.921s       0m24.430s       0m24.210s
sys     0m0.368s        0m0.408s        0m0.420s

GCC 4.1 (release branch today):
real    0m33.260s       0m33.140s       0m33.188s
user    0m32.602s       0m32.522s       0m32.554s
sys     0m0.556s        0m0.544s        0m0.600s

GCC 4.2 (trunk today):
real    0m36.544s       0m36.614s       0m36.492s
user    0m35.950s       0m35.942s       0m35.994s
sys     0m0.544s        0m0.600s        0m0.464s


Significant compile time sinks in GCC 4.1 that don't appear in GCC 4.0:
 tree PTA              :   2.31 ( 7%) usr
 tree SSA incremental  :   2.14 ( 6%) usr
 expand                :   1.71 ( 5%) usr

The same passes cost the most time in GCC 4.2.  The expand cost has increased.  The other two are not new; they just run very often or didn't have their own time vars before.  The overall problem seems to be that we just run too many passes too often; nothing really stands out.
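
With three runs per compiler, the median is a convenient way to summarize the timings above. The Python sketch below uses the user times copied from this comment; the `NmS.SSSs` parsing and the helper names are assumptions made for illustration.

```python
import re
from statistics import median

# "user" rows copied from the -O2 and -O3 timings above.
USER_TIMES = {
    "-O2": {
        "4.0": "0m22.481s 0m22.297s 0m22.357s",
        "4.1": "0m28.154s 0m27.906s 0m27.894s",
        "4.2": "0m31.466s 0m31.034s 0m31.022s",
    },
    "-O3": {
        "4.0": "0m23.921s 0m24.430s 0m24.210s",
        "4.1": "0m32.602s 0m32.522s 0m32.554s",
        "4.2": "0m35.950s 0m35.942s 0m35.994s",
    },
}

def to_seconds(stamp):
    """Convert a bash-style '0m22.481s' stamp to seconds."""
    match = re.fullmatch(r"(\d+)m([\d.]+)s", stamp)
    if match is None:
        raise ValueError(f"unrecognized time stamp: {stamp!r}")
    return int(match.group(1)) * 60 + float(match.group(2))

def median_seconds(stamps):
    """Median of a whitespace-separated list of time stamps, in seconds."""
    return median(to_seconds(stamp) for stamp in stamps.split())

for flag, by_version in USER_TIMES.items():
    base = median_seconds(by_version["4.0"])
    for version, stamps in by_version.items():
        med = median_seconds(stamps)
        print(f"{flag} GCC {version}: {med:.3f}s ({(med / base - 1) * 100:+.1f}% vs 4.0)")
```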

Comment 63 Daniel Berlin 2006-02-11 16:02:22 UTC
Subject: Re:  [4.1/4.2 regression] C++ compile-time performance regression


> Flags: -O3
> 
> GCC 4.0 (release branch today):
> real    0m24.412s       0m25.000s       0m24.771s
> user    0m23.921s       0m24.430s       0m24.210s
> sys     0m0.368s        0m0.408s        0m0.420s
> 
> GCC 4.1 (release branch today):
> real    0m33.260s       0m33.140s       0m33.188s
> user    0m32.602s       0m32.522s       0m32.554s
> sys     0m0.556s        0m0.544s        0m0.600s
> 
> GCC 4.2 (trunk today):
> real    0m36.544s       0m36.614s       0m36.492s
> user    0m35.950s       0m35.942s       0m35.994s
> sys     0m0.544s        0m0.600s        0m0.464s
> 
> 
> Significant compile time sinks in GCC 4.1 that don't appear in GCC 4.0:
>  tree PTA              :   2.31 ( 7%) usr
>  tree SSA incremental  :   2.14 ( 6%) usr
>  expand                :   1.71 ( 5%) usr
> 

So, could you do me a favor if you get a chance, and change the macro
DONT_PROPAGATE_WITH_ANYTHING to 1 in tree-ssa-structalias.c, and see if
it speeds it up at all?


Comment 64 Steven Bosscher 2006-02-12 01:17:06 UTC
DONT_PROPAGATE_WITH_ANYTHING only exists on the trunk.  With that flag, the timings are:

Flags: -O2

GCC 4.2 (trunk today):
real    0m31.704s
user    0m31.094s
sys     0m0.584s


Flags: -O3

GCC 4.2 (trunk today):
real    0m36.206s
user    0m35.718s
sys     0m0.484s

So no, it doesn't help.

Again, the problem seems to be more that we just run so many passes, not that one or two specific passes are to blame for most of the compile time.
Comment 65 Mark Mitchell 2006-02-24 00:25:09 UTC
This issue will not be resolved in GCC 4.1.0; retargeted at GCC 4.1.1.
Comment 66 Mark Mitchell 2006-05-25 02:31:46 UTC
Will not be fixed in 4.1.1; adjust target milestone to 4.1.2.
Comment 67 Andrew Pinski 2006-07-05 09:06:59 UTC
Does anyone have new numbers for this?  Richard G.'s recent memory patches also have an effect on compile time; I noticed a change of between 7% and 10% on at least CSiBE.
Comment 68 Steven Bosscher 2006-07-25 22:31:42 UTC
New timings.  These were taken on the same box as those of comment #62 and comment #64 (Intel x86_64 3.20GHz, 1GB ram).  Times are usr times.

Invocation: time g++ -S -fpermissive -Ox -m64 generate-3.4.ii
GC params for cc1plus: --param ggc-min-expand=98 --param ggc-min-heapsize=127550

version         ID              -O2             -O3
GCC 3.4         3.4.6           0m23.673s       0m24.362s
GCC 4.0         4.0.4 20060725  0m23.009s       0m23.849s
GCC 4.1         4.1.2 20060725  0m24.018s       0m25.294s
GCC 4.2         4.2.0 20060724  0m25.214s       0m26.242s
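
For a quick read of the table above relative to GCC 4.0, a small Python sketch follows. It assumes the column layout shown (version in the second field, -O2 and -O3 times in the last two fields); the function names are hypothetical.

```python
# Rows copied from the table above.
TABLE = """\
GCC 3.4         3.4.6           0m23.673s       0m24.362s
GCC 4.0         4.0.4 20060725  0m23.009s       0m23.849s
GCC 4.1         4.1.2 20060725  0m24.018s       0m25.294s
GCC 4.2         4.2.0 20060724  0m25.214s       0m26.242s
"""

def to_seconds(stamp):
    """Convert a '0m23.673s' stamp to seconds."""
    minutes, seconds = stamp.rstrip("s").split("m")
    return int(minutes) * 60 + float(seconds)

def parse_table(text):
    """Map version -> (-O2 seconds, -O3 seconds)."""
    rows = {}
    for line in text.splitlines():
        fields = line.split()
        # fields: "GCC", version, ID (one or two tokens), -O2 time, -O3 time
        rows[fields[1]] = (to_seconds(fields[-2]), to_seconds(fields[-1]))
    return rows

rows = parse_table(TABLE)
base_o2, base_o3 = rows["4.0"]
for version, (o2, o3) in rows.items():
    print(f"GCC {version}: -O2 {(o2 / base_o2 - 1) * 100:+.1f}%  "
          f"-O3 {(o3 / base_o3 - 1) * 100:+.1f}% vs 4.0")
```

On these numbers both 4.1 and 4.2 stay within about 10% of 4.0 at -O2 and -O3.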

Comment 69 Steven Bosscher 2006-07-25 22:35:12 UTC
Re. comment #68, I should have added that all compilers were built with "gcc (GCC) 4.0.2 20050901 (prerelease) (SUSE Linux)" with CFLAGS="-O2 -g".
Comment 70 Steven Bosscher 2006-09-03 10:39:09 UTC
Based on my numbers of comment #69, I'm declaring this fixed once more.