Bug 10160 - [3.3/3.4 regression] compile time regression; inordinate time spent in "scheduling"
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: rtl-optimization
Version: 3.3
Importance: P3 normal
Target Milestone: 3.4.0
Assignee: Eric Botcazou
URL:
Keywords: compile-time-hog, memory-hog
Duplicates: 13027
Depends on:
Blocks:
 
Reported: 2003-03-20 05:16 UTC by china
Modified: 2004-01-17 04:22 UTC

See Also:
Host: sparc-sun-solaris2.9
Target: sparc-sun-solaris2.9
Build: sparc-sun-solaris2.9
Known to work:
Known to fail:
Last reconfirmed: 2004-01-10 21:35:50


Attachments
Dialogs.ii.bz2 (203.16 KB, application/octet-stream)
2003-05-21 15:17 UTC, china
Preprocessed source for G++ 3.4.0 (196.38 KB, application/octet-stream)
2004-01-10 20:49 UTC, Giovanni Bajo

Description china 2003-03-20 05:16:00 UTC
The file src/frontends/qt2/Dialogs.C takes 4:28:52.59 to build with -O2 -fno-default-inline. Without -fno-default-inline, the machine runs out of memory. I created a profiling cc1plus and ran it against Dialogs.ii with -O2 -fno-default-inline. Results:
Execution times (seconds)
 garbage collection    :  51.76 ( 1%) usr   0.23 ( 1%) sys 110.25 ( 1%) wall
 cfg construction      :  20.78 ( 0%) usr   0.81 ( 4%) sys  41.25 ( 0%) wall
 cfg cleanup           :  45.46 ( 1%) usr   0.05 ( 0%) sys  90.75 ( 1%) wall
 trivially dead code   :  16.16 ( 0%) usr   0.05 ( 0%) sys  29.00 ( 0%) wall
 life analysis         : 110.47 ( 1%) usr   0.03 ( 0%) sys 194.00 ( 1%) wall
 life info update      :  34.18 ( 0%) usr   0.02 ( 0%) sys  55.75 ( 0%) wall
 preprocessing         :   2.80 ( 0%) usr   0.29 ( 2%) sys   6.75 ( 0%) wall
 lexical analysis      :   5.37 ( 0%) usr   0.55 ( 3%) sys  11.75 ( 0%) wall
 parser                : 146.68 ( 2%) usr   3.42 (18%) sys 323.25 ( 2%) wall
 name lookup           :  74.06 ( 1%) usr   5.63 (30%) sys 169.75 ( 1%) wall
 expand                : 192.98 ( 2%) usr   3.65 (19%) sys 425.00 ( 3%) wall
 varconst              :   5.88 ( 0%) usr   0.32 ( 2%) sys  21.00 ( 0%) wall
 integration           :   4.00 ( 0%) usr   0.05 ( 0%) sys   7.75 ( 0%) wall
 jump                  : 129.99 ( 2%) usr   0.39 ( 2%) sys 273.00 ( 2%) wall
 CSE                   :  41.12 ( 0%) usr   0.16 ( 1%) sys  86.75 ( 1%) wall
 global CSE            :   9.06 ( 0%) usr   0.05 ( 0%) sys  18.25 ( 0%) wall
 loop analysis         :   1.67 ( 0%) usr   0.02 ( 0%) sys   2.50 ( 0%) wall
 CSE 2                 :  14.07 ( 0%) usr   0.03 ( 0%) sys  27.25 ( 0%) wall
 branch prediction     :  26.31 ( 0%) usr   0.34 ( 2%) sys  53.75 ( 0%) wall
 flow analysis         :   3.55 ( 0%) usr   0.01 ( 0%) sys   7.50 ( 0%) wall
 combiner              :  14.36 ( 0%) usr   0.09 ( 0%) sys  29.50 ( 0%) wall
 if-conversion         :   2.02 ( 0%) usr   0.00 ( 0%) sys   3.75 ( 0%) wall
 regmove               :   4.02 ( 0%) usr   0.00 ( 0%) sys   7.50 ( 0%) wall
 scheduling            :7510.98 (87%) usr   0.74 ( 4%) sys13791.00 (86%) wall
 local alloc           :  35.48 ( 0%) usr   0.08 ( 0%) sys  51.00 ( 0%) wall
 global alloc          :  22.34 ( 0%) usr   0.67 ( 4%) sys  48.00 ( 0%) wall
 reload CSE regs       :  57.24 ( 1%) usr   0.09 ( 0%) sys  94.50 ( 1%) wall
 flow 2                :   2.78 ( 0%) usr   0.02 ( 0%) sys   6.50 ( 0%) wall
 if-conversion 2       :   1.15 ( 0%) usr   0.00 ( 0%) sys   2.25 ( 0%) wall
 peephole 2            :   4.30 ( 0%) usr   0.00 ( 0%) sys   8.50 ( 0%) wall
 rename registers      :   5.21 ( 0%) usr   0.10 ( 1%) sys   6.75 ( 0%) wall
 scheduling 2          :  18.74 ( 0%) usr   0.01 ( 0%) sys  31.50 ( 0%) wall
 delay branch sched    :  12.40 ( 0%) usr   0.05 ( 0%) sys  19.50 ( 0%) wall
 reorder blocks        :   2.51 ( 0%) usr   0.00 ( 0%) sys   3.00 ( 0%) wall
 shorten branches      :   1.47 ( 0%) usr   0.00 ( 0%) sys   3.00 ( 0%) wall
 final                 :   5.54 ( 0%) usr   0.28 ( 1%) sys  16.25 ( 0%) wall
 symout                :   0.24 ( 0%) usr   0.00 ( 0%) sys   0.50 ( 0%) wall
 rest of compilation   :  21.00 ( 0%) usr   0.46 ( 2%) sys  50.50 ( 0%) wall
 TOTAL                 :8658.27            18.72          16129.25
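
To make the proportions in the report easier to check, here is a minimal Python sketch that parses per-pass rows and recomputes each pass's share of the total. The names `ROW`, `parse_row` and `share` are made up for this illustration, and rounding to the nearest integer is an assumption that happens to agree with the percentages printed above. The pattern tolerates the fused `sys13791.00` column in the scheduling row.

```python
import re

# One per-pass row: "name : usr (p%) usr  sys (p%) sys  wall (p%) wall".
# No whitespace is required between "sys" and the wall-clock number,
# because wide values run into the label ("sys13791.00").
ROW = re.compile(
    r"^\s*(?P<name>[^:]+?)\s*:\s*"
    r"(?P<usr>\d+\.\d+)\s*\(\s*\d+%\)\s*usr\s*"
    r"(?P<sys>\d+\.\d+)\s*\(\s*\d+%\)\s*sys\s*"
    r"(?P<wall>\d+\.\d+)\s*\(\s*\d+%\)\s*wall\s*$"
)


def parse_row(line):
    """Return (name, usr, sys, wall) for a per-pass row, or None otherwise."""
    m = ROW.match(line)
    if m is None:
        return None
    return (m["name"], float(m["usr"]), float(m["sys"]), float(m["wall"]))


def share(part, total):
    """Percentage of the total, rounded to the nearest integer (assumption)."""
    return round(100.0 * part / total)


# Two rows copied from the report, plus the totals from its TOTAL row.
rows = [
    " parser                : 146.68 ( 2%) usr   3.42 (18%) sys 323.25 ( 2%) wall",
    " scheduling            :7510.98 (87%) usr   0.74 ( 4%) sys13791.00 (86%) wall",
]
TOTAL_USR, TOTAL_WALL = 8658.27, 16129.25

for line in rows:
    name, usr, _sys, wall = parse_row(line)
    print(f"{name:12s} usr {share(usr, TOTAL_USR):3d}%  wall {share(wall, TOTAL_WALL):3d}%")
```

Run on these two rows, the sketch reproduces the 87% user and 86% wall share for scheduling, against 2% for the parser.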

Release:
GNU C++ version 3.3 20030319

Environment:
SunOS poog 5.9 Generic_112233-02 sun4u sparc SUNW,Ultra-60

How-To-Repeat:
g++ -O2 -fno-default-inline Dialogs.ii
Comment 1 china 2003-04-12 13:19:00 UTC
From: Albert Chin-A-Young <china@thewrittenword.com>
To: steven@gcc.gnu.org, gcc-bugs@gcc.gnu.org, gcc-prs@gcc.gnu.org,
   gcc-gnats@gcc.gnu.org
Cc:  
Subject: Re: optimization/10160: [SPARC] inordinate time spent in "scheduling"
Date: Sat, 12 Apr 2003 13:19:00 -0500

 On Sat, Apr 12, 2003 at 01:22:08PM -0000, steven@gcc.gnu.org wrote:
 > Synopsis: [SPARC] inordinate time spent in "scheduling"
 > 
 > State-Changed-From-To: open->feedback
 > State-Changed-By: steven
 > State-Changed-When: Sat Apr 12 13:22:08 2003
 > State-Changed-Why:
 >     Did this work with more reasonable compile times in previous GCC releases?
 > 
 > http://gcc.gnu.org/cgi-bin/gnatsweb.pl?cmd=view%20audit-trail&database=gcc&pr=10160
 
 GCC 3.2.2 doesn't show the problem.
 
 -- 
 albert chin (china@thewrittenword.com)
Comment 2 Steven Bosscher 2003-04-12 13:22:08 UTC
State-Changed-From-To: open->feedback
State-Changed-Why: Did this work with more reasonable compile times in previous GCC releases?
Comment 3 Steven Bosscher 2003-04-14 07:18:19 UTC
Responsible-Changed-From-To: unassigned->ebotcazou
Responsible-Changed-Why: Regression from 3.2.2
    
    Looks like food for you, Eric.
Comment 4 Steven Bosscher 2003-04-14 07:18:19 UTC
State-Changed-From-To: feedback->analyzed
State-Changed-Why: Confirmed regression
Comment 5 Eric Botcazou 2003-04-14 08:26:33 UTC
Responsible-Changed-From-To: ebotcazou->vmakarov
Responsible-Changed-Why: No, I don't know the first thing about the DFA scheduler.
    David S. Miller wrote the Sparc descriptions but he is busy
    elsewhere so I'm redirecting the PR to Vladimir directly.
Comment 6 Eric Botcazou 2003-04-16 13:14:39 UTC
Responsible-Changed-From-To: vmakarov->ebotcazou
Responsible-Changed-Why: Hum... I don't think that the scheduler is to be blamed here,
    rather the tree inliner: cutting the inlining limit by a factor of 10
    (-finline-limit=60) brings the compile time on par with that
    of the 3.2.x branch.
    
    The new logic of the tree inliner is not exactly adapted to this testcase.
Comment 7 Vladimir Makarov 2003-04-16 13:39:59 UTC
From: Vladimir Makarov <vmakarov@redhat.com>
To: ebotcazou@gcc.gnu.org, china@thewrittenword.com,
	gcc-bugs@gcc.gnu.org, gcc-prs@gcc.gnu.org, vmakarov@gcc.gnu.org,
	gcc-gnats@gcc.gnu.org
Cc:  
Subject: Re: optimization/10160: [3.3/3.4 regression][SPARC] compile time 
 regression; inordinate time spent in "scheduling"
Date: Wed, 16 Apr 2003 13:39:59 -0400

 ebotcazou@gcc.gnu.org wrote:
 > 
 > Synopsis: [3.3/3.4 regression][SPARC] compile time regression; inordinate time spent in "scheduling"
 > 
 > Responsible-Changed-From-To: vmakarov->ebotcazou
 > Responsible-Changed-By: ebotcazou
 > Responsible-Changed-When: Wed Apr 16 13:14:39 2003
 > Responsible-Changed-Why:
 >     Hum... I don't think that the scheduler is to be blamed here,
 >     rather the tree inliner: cutting the inlining limit by 10
 >     (-finline-limit=60) brings the compile time on par with that
 >     of the 3.2.x branch.
 > 
 >     The new logic of the tree inliner is not exactly adapted to this testcase.
 > 
 > http://gcc.gnu.org/cgi-bin/gnatsweb.pl?cmd=view%20audit-trail&database=gcc&pr=10160
 
 You are absolutely right.  I was afraid that it was because of the first
 cycle multipass insn scheduling, so I switched it off and got the same
 result.  So this is not because of the recent insn scheduling changes.
 Insn scheduling (even the simplest heuristic list scheduler) is simply an
 O(n^2) algorithm, so it may behave very badly when the input is big.  There
 are some heuristics constraining the number of dependencies, but they do not
 help in this case.  Even without insn scheduling, compilation of this
 file takes 12 minutes on my sparc computer.
 
 Vlad
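
The quadratic behaviour Vlad describes can be illustrated with a toy dependence builder. This is only a sketch under simplifying assumptions, not GCC's scheduler code: instructions are modelled as hypothetical `(defs, uses)` pairs of register-name sets, and every instruction is compared against every earlier one in the same block, so a block of n instructions costs n*(n-1)/2 checks. Aggressive inlining produces very large blocks, which is where that growth starts to hurt.

```python
def build_dependences(insns):
    """Naive dependence builder for one block.

    insns is a list of (defs, uses) pairs, each a set of register names.
    Returns (edges, checks): the dependence edges found and the number of
    pairwise comparisons that were needed to find them.
    """
    edges = []
    checks = 0
    for j, (defs_j, uses_j) in enumerate(insns):
        for i in range(j):
            defs_i, uses_i = insns[i]
            checks += 1
            # True (RAW), anti (WAR) or output (WAW) dependence.
            if defs_i & uses_j or uses_i & defs_j or defs_i & defs_j:
                edges.append((i, j))
    return edges, checks


# Three-instruction example: r1 = ...; r2 = f(r1); r1 = g(r2)
block = [({"r1"}, set()), ({"r2"}, {"r1"}), ({"r1"}, {"r2"})]
print(build_dependences(block))

# The number of checks grows quadratically with the block size.
for n in (10, 100, 1000):
    independent = [({f"r{k}"}, set()) for k in range(n)]
    print(n, build_dependences(independent)[1])
```

Going from 100 to 1000 instructions multiplies the number of checks by roughly 100, even though this artificial block contains no dependences at all.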
Comment 8 Eric Botcazou 2003-06-18 08:04:39 UTC
I'm changing the target milestone because the problem is a fundamental flaw in
the new heuristics of the tree inliner, which I think cannot be fixed on a
release branch. I'll try to come up with something sensible for the 3.4 release.

Meanwhile, a workaround is to compile with

   -O2 --param max-inline-insns-single=180

which will bring the compile time on par with GCC 3.2.3.
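
For anyone who wants to script the comparison, a minimal Python sketch follows. It only assembles the command line from the flags given in this comment and the file name from the original report. The helper names `workaround_command` and `time_compile` are invented here, and the timing helper assumes a `g++` on the PATH and the preprocessed file in the current directory.

```python
import shlex
import subprocess
import time


def workaround_command(source="Dialogs.ii", compiler="g++", limit=180):
    """Build the argument list for the workaround compile."""
    return [compiler, "-O2", "--param", f"max-inline-insns-single={limit}", source]


def time_compile(argv):
    """Run the compile and return (exit status, elapsed wall-clock seconds).

    Not called below, because it needs the compiler and the test case.
    """
    start = time.perf_counter()
    status = subprocess.run(argv).returncode
    return status, time.perf_counter() - start


# Show the command that would be run.
print(shlex.join(workaround_command()))
```

Calling `time_compile(workaround_command())` and then repeating it with a different `limit` gives a quick way to see how sensitive the compile time is to the parameter.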
Comment 9 Steven Bosscher 2003-07-25 11:11:06 UTC
Eric, how about trying this one with current mainline?  I would like to see how
Jan Hubicka's new function body size estimates do in this case, but I don't have
access to a SPARC machine.  My experience with the new code has been
very positive in all cases, so maybe it helps in this case, too.
Comment 10 Steven Bosscher 2003-09-04 10:17:25 UTC
With all the changes in the tree-inliner (and in particular with the call graph
code) since March, the information in this bug report is obsolete. This PR
really needs testing and reconfirmation if the problem still exists.  Can
someone test this please?

I've marked this PR as WAITING for feedback, so that we can close it if no one
tests this in the next three months or so.
Comment 11 Eric Botcazou 2003-09-04 21:08:41 UTC
The information is still valid on the 3.3 branch as of GCC 3.3.2, but I think
this is not fixable on that branch. And the testcase doesn't compile on mainline
anymore.

Albert, do you still have the source code from which the testcase was extracted?
Comment 12 The Written Word 2003-09-04 21:17:33 UTC
The test case is from LyX. I'll try to upload a new version in a few days.
Comment 13 Eric Botcazou 2003-09-04 22:26:36 UTC
You need to do it against a recent CVS snapshot of gcc-3.4 because I think we
won't fix the inliner of the 3.3 branch and it seems that the new parser can't
grok the preprocessed file generated by 3.3.x in this case.
Comment 14 Andrew Pinski 2003-12-01 00:24:41 UTC
This bug most likely still exists, but we really need a new preprocessed source for 3.4.
Comment 15 Andrew Pinski 2003-12-24 20:10:54 UTC
*** Bug 13027 has been marked as a duplicate of this bug. ***
Comment 16 Andrew Pinski 2003-12-24 20:12:58 UTC
This is also a memory hog:
 scheduling            :7510.98 (87%) usr   0.74 ( 4%) sys13791.00 (86%) wall
See how wall time is about twice as big as user.
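
The observation can be turned into a small calculation. The sketch below uses a hypothetical helper, `wall_to_cpu_ratio`, to divide wall-clock time by the CPU time (user plus system) of the scheduling pass. A ratio well above 1 means the process spent a large part of its time off the CPU, which is consistent with paging, although the numbers alone cannot rule out other causes such as load from other processes.

```python
def wall_to_cpu_ratio(usr, sys, wall):
    """Wall-clock time divided by CPU time (user + system)."""
    return wall / (usr + sys)


# Figures for the "scheduling" pass from the report in the description.
ratio = wall_to_cpu_ratio(usr=7510.98, sys=0.74, wall=13791.00)
print(f"wall/cpu = {ratio:.2f}")
```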
Comment 17 Giovanni Bajo 2004-01-10 20:49:32 UTC
Created attachment 5449 [details]
Preprocessed source for G++ 3.4.0

This should be the new preprocessed source, let me know if it is incorrect, I
can try regenerating it.
Comment 18 Eric Botcazou 2004-01-10 21:35:50 UTC
Thanks Giovanni.  I'll try tomorrow.
Comment 19 Andrew Pinski 2004-01-10 21:39:00 UTC
I tried on powerpc-apple-darwin7.2.0 and it does not have the problem in the scheduler.
Also the problem I see is in the C++ front-end (but this is with checking enabled):
 parser                :  19.86 (14%) usr  14.48 (27%) sys  94.09 (26%) wall
 name lookup           :  20.51 (14%) usr  27.71 (51%) sys  97.30 (27%) wall
Comment 20 Andrew Pinski 2004-01-11 03:05:04 UTC
To compile the 3.4 source on 3.3, delete the first couple of lines dealing with the debugging
part of libstdc++, delete some __attribute__((unused)), and change the remaining __gnu_norm
to std.  With 3.3, I can reproduce it but not with 3.4, so it looks like it has been fixed.
Comment 21 Eric Botcazou 2004-01-12 08:22:59 UTC
The results at -O2 are much better on mainline: no memory explosion (peak around
120 MB) and decent time (less than 2 minutes).

I guess we can all give a big Thanks to Jan!