The file src/frontends/qt2/Dialogs.C takes 4:28:52.59 to build with -O2 -fno-default-inline. Without -fno-default-inline, the machine runs out of memory. I created a profiling cc1plus and ran it against Dialogs.ii with -O2 -fno-default-inline. Results:

Execution times (seconds)
 garbage collection  :   51.76 ( 1%) usr  0.23 ( 1%) sys   110.25 ( 1%) wall
 cfg construction    :   20.78 ( 0%) usr  0.81 ( 4%) sys    41.25 ( 0%) wall
 cfg cleanup         :   45.46 ( 1%) usr  0.05 ( 0%) sys    90.75 ( 1%) wall
 trivially dead code :   16.16 ( 0%) usr  0.05 ( 0%) sys    29.00 ( 0%) wall
 life analysis       :  110.47 ( 1%) usr  0.03 ( 0%) sys   194.00 ( 1%) wall
 life info update    :   34.18 ( 0%) usr  0.02 ( 0%) sys    55.75 ( 0%) wall
 preprocessing       :    2.80 ( 0%) usr  0.29 ( 2%) sys     6.75 ( 0%) wall
 lexical analysis    :    5.37 ( 0%) usr  0.55 ( 3%) sys    11.75 ( 0%) wall
 parser              :  146.68 ( 2%) usr  3.42 (18%) sys   323.25 ( 2%) wall
 name lookup         :   74.06 ( 1%) usr  5.63 (30%) sys   169.75 ( 1%) wall
 expand              :  192.98 ( 2%) usr  3.65 (19%) sys   425.00 ( 3%) wall
 varconst            :    5.88 ( 0%) usr  0.32 ( 2%) sys    21.00 ( 0%) wall
 integration         :    4.00 ( 0%) usr  0.05 ( 0%) sys     7.75 ( 0%) wall
 jump                :  129.99 ( 2%) usr  0.39 ( 2%) sys   273.00 ( 2%) wall
 CSE                 :   41.12 ( 0%) usr  0.16 ( 1%) sys    86.75 ( 1%) wall
 global CSE          :    9.06 ( 0%) usr  0.05 ( 0%) sys    18.25 ( 0%) wall
 loop analysis       :    1.67 ( 0%) usr  0.02 ( 0%) sys     2.50 ( 0%) wall
 CSE 2               :   14.07 ( 0%) usr  0.03 ( 0%) sys    27.25 ( 0%) wall
 branch prediction   :   26.31 ( 0%) usr  0.34 ( 2%) sys    53.75 ( 0%) wall
 flow analysis       :    3.55 ( 0%) usr  0.01 ( 0%) sys     7.50 ( 0%) wall
 combiner            :   14.36 ( 0%) usr  0.09 ( 0%) sys    29.50 ( 0%) wall
 if-conversion       :    2.02 ( 0%) usr  0.00 ( 0%) sys     3.75 ( 0%) wall
 regmove             :    4.02 ( 0%) usr  0.00 ( 0%) sys     7.50 ( 0%) wall
 scheduling          : 7510.98 (87%) usr  0.74 ( 4%) sys 13791.00 (86%) wall
 local alloc         :   35.48 ( 0%) usr  0.08 ( 0%) sys    51.00 ( 0%) wall
 global alloc        :   22.34 ( 0%) usr  0.67 ( 4%) sys    48.00 ( 0%) wall
 reload CSE regs     :   57.24 ( 1%) usr  0.09 ( 0%) sys    94.50 ( 1%) wall
 flow 2              :    2.78 ( 0%) usr  0.02 ( 0%) sys     6.50 ( 0%) wall
 if-conversion 2     :    1.15 ( 0%) usr  0.00 ( 0%) sys     2.25 ( 0%) wall
 peephole 2          :    4.30 ( 0%) usr  0.00 ( 0%) sys     8.50 ( 0%) wall
 rename registers    :    5.21 ( 0%) usr  0.10 ( 1%) sys     6.75 ( 0%) wall
 scheduling 2        :   18.74 ( 0%) usr  0.01 ( 0%) sys    31.50 ( 0%) wall
 delay branch sched  :   12.40 ( 0%) usr  0.05 ( 0%) sys    19.50 ( 0%) wall
 reorder blocks      :    2.51 ( 0%) usr  0.00 ( 0%) sys     3.00 ( 0%) wall
 shorten branches    :    1.47 ( 0%) usr  0.00 ( 0%) sys     3.00 ( 0%) wall
 final               :    5.54 ( 0%) usr  0.28 ( 1%) sys    16.25 ( 0%) wall
 symout              :    0.24 ( 0%) usr  0.00 ( 0%) sys     0.50 ( 0%) wall
 rest of compilation :   21.00 ( 0%) usr  0.46 ( 2%) sys    50.50 ( 0%) wall
 TOTAL               : 8658.27           18.72           16129.25

Release: GNU C++ version 3.3 20030319
Environment: SunOS poog 5.9 Generic_112233-02 sun4u sparc SUNW,Ultra-60
How-To-Repeat: g++ -O2 -fno-default-inline Dialogs.ii
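For a report of this size it can help to rank the phases mechanically. The Python sketch below parses rows in the layout shown above and picks the phase with the largest user time. The names `parse_time_report` and `dominant_phase` and the short `SAMPLE` excerpt are illustrative choices, not part of GCC. The regular expression treats whitespace as optional because wide numbers run into the neighbouring column in the `scheduling` row.

```python
import re

# One report row: "<phase> : <usr> (p%) usr <sys> (p%) sys <wall> (p%) wall".
# Whitespace is optional throughout, since wide values can touch the
# preceding label (e.g. "sys13791.00").
ROW_RE = re.compile(
    r"^\s*(?P<phase>.+?)\s*:\s*"
    r"(?P<usr>\d+\.\d+)\s*\(\s*\d+%\)\s*usr\s*"
    r"(?P<sys>\d+\.\d+)\s*\(\s*\d+%\)\s*sys\s*"
    r"(?P<wall>\d+\.\d+)\s*\(\s*\d+%\)\s*wall\s*$"
)


def parse_time_report(text):
    """Return a list of (phase, usr, sys, wall) tuples, times in seconds."""
    rows = []
    for line in text.splitlines():
        m = ROW_RE.match(line)
        if m:  # the TOTAL row has no percentages and is skipped
            rows.append(
                (m["phase"], float(m["usr"]), float(m["sys"]), float(m["wall"]))
            )
    return rows


def dominant_phase(rows):
    """Return the row with the largest user time."""
    return max(rows, key=lambda row: row[1])


# Excerpt copied from the report above, including the run-together columns.
SAMPLE = """\
 parser                : 146.68 ( 2%) usr   3.42 (18%) sys 323.25 ( 2%) wall
 expand                : 192.98 ( 2%) usr   3.65 (19%) sys 425.00 ( 3%) wall
 scheduling            :7510.98 (87%) usr   0.74 ( 4%) sys13791.00 (86%) wall
 TOTAL                 :8658.27            18.72          16129.25
"""

if __name__ == "__main__":
    phase, usr, _sys, wall = dominant_phase(parse_time_report(SAMPLE))
    print(f"{phase}: {usr:.2f}s usr, {wall:.2f}s wall")
```

Sorting by user rather than wall time is a design choice: user time is less sensitive to whatever else the machine was doing during the run.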
From: Albert Chin-A-Young <china@thewrittenword.com>
To: steven@gcc.gnu.org, gcc-bugs@gcc.gnu.org, gcc-prs@gcc.gnu.org, gcc-gnats@gcc.gnu.org
Cc:
Subject: Re: optimization/10160: [SPARC] inordinate time spent in "scheduling"
Date: Sat, 12 Apr 2003 13:19:00 -0500

On Sat, Apr 12, 2003 at 01:22:08PM -0000, steven@gcc.gnu.org wrote:
> Synopsis: [SPARC] inordinate time spent in "scheduling"
>
> State-Changed-From-To: open->feedback
> State-Changed-By: steven
> State-Changed-When: Sat Apr 12 13:22:08 2003
> State-Changed-Why:
> Did this work with more reasonable compile times in previous GCC releases?
>
> http://gcc.gnu.org/cgi-bin/gnatsweb.pl?cmd=view%20audit-trail&database=gcc&pr=10160

GCC 3.2.2 doesn't show the problem.

--
albert chin (china@thewrittenword.com)
State-Changed-From-To: open->feedback State-Changed-Why: Did this work with more reasonable compile times in previous GCC releases?
Responsible-Changed-From-To: unassigned->ebotcazou Responsible-Changed-Why: Regression from 3.2.2. Looks like food for you, Eric.
State-Changed-From-To: feedback->analyzed State-Changed-Why: Confirmed regression
Responsible-Changed-From-To: ebotcazou->vmakarov Responsible-Changed-Why: No, I don't know the first thing about the DFA scheduler. David S. Miller wrote the Sparc descriptions but he is busy elsewhere so I'm redirecting the PR to Vladimir directly.
Responsible-Changed-From-To: vmakarov->ebotcazou Responsible-Changed-Why: Hum... I don't think that the scheduler is to be blamed here, rather the tree inliner: cutting the inlining limit by 10 (-finline-limit=60) brings the compile time on par with that of the 3.2.x branch. The new logic of the tree inliner is not exactly adapted to this testcase.
From: Vladimir Makarov <vmakarov@redhat.com>
To: ebotcazou@gcc.gnu.org, china@thewrittenword.com, gcc-bugs@gcc.gnu.org, gcc-prs@gcc.gnu.org, vmakarov@gcc.gnu.org, gcc-gnats@gcc.gnu.org
Cc:
Subject: Re: optimization/10160: [3.3/3.4 regression][SPARC] compile time regression; inordinate time spent in "scheduling"
Date: Wed, 16 Apr 2003 13:39:59 -0400

ebotcazou@gcc.gnu.org wrote:
>
> Synopsis: [3.3/3.4 regression][SPARC] compile time regression; inordinate time spent in "scheduling"
>
> Responsible-Changed-From-To: vmakarov->ebotcazou
> Responsible-Changed-By: ebotcazou
> Responsible-Changed-When: Wed Apr 16 13:14:39 2003
> Responsible-Changed-Why:
> Hum... I don't think that the scheduler is to be blamed here,
> rather the tree inliner: cutting the inlining limit by 10
> (-finline-limit=60) brings the compile time on par with that
> of the 3.2.x branch.
>
> The new logic of the tree inliner is not exactly adapted to this testcase.
>
> http://gcc.gnu.org/cgi-bin/gnatsweb.pl?cmd=view%20audit-trail&database=gcc&pr=10160

You are absolutely right. I was afraid that it was because of the first cycle multipass insn scheduling, so I switched it off and got the same result. So this is not because of the recent insn scheduling changes. Insn scheduling (even the simplest heuristic list scheduler) is simply an O(n^2) algorithm, so it can behave very badly when the input is big. There are some heuristics constraining the number of dependencies, but they do not help in this case.

Even without insn scheduling, compilation of this file takes 12 minutes on my SPARC computer.

Vlad
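The point about input size can be illustrated with a toy model, reading the stated complexity as quadratic in the number of instructions. The sketch below is not GCC's scheduler. It is a deliberately naive dependency builder, with hypothetical names (`build_dependencies`, `make_block`) and a made-up instruction format of `(dest, sources)`, that checks every instruction against every earlier one. Doubling the block size roughly quadruples the work, which is why a very large function produced by aggressive inlining hurts so much.

```python
def build_dependencies(insns):
    """Naive dependency construction over one straight-line block.

    insns is a list of (dest, sources) pairs, where dest is a register
    name and sources is a tuple of register names. Returns the list of
    dependency edges (i, j) with i < j, plus the number of pairwise
    checks performed.
    """
    edges = []
    comparisons = 0
    for j, (dest_j, srcs_j) in enumerate(insns):
        for i in range(j):
            comparisons += 1
            dest_i, srcs_i = insns[i]
            true_dep = dest_i in srcs_j      # j reads what i wrote
            anti_dep = dest_j in srcs_i      # j overwrites what i read
            output_dep = dest_i == dest_j    # both write the same register
            if true_dep or anti_dep or output_dep:
                edges.append((i, j))
    return edges, comparisons


def make_block(n):
    """Hypothetical block: instruction k writes r<k> and reads r<k-1>."""
    block = [("r0", ())]
    for k in range(1, n):
        block.append((f"r{k}", (f"r{k - 1}",)))
    return block


if __name__ == "__main__":
    for n in (100, 200, 400):
        edges, comparisons = build_dependencies(make_block(n))
        print(f"n={n}: {comparisons} pairwise checks, {len(edges)} edges")
```

Note that the number of real dependencies in this toy block grows only linearly, while the number of checks grows as n(n-1)/2, so limiting the number of recorded dependencies alone does not bound the work.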
I'm changing the target milestone because the problem is a fundamental flaw in the new heuristics of the tree inliner, which I think cannot be fixed on a release branch. I'll try to come up with something sensible for the 3.4 release. Meanwhile, a workaround is to compile with -O2 --param max-inline-insns-single=180 which will bring the compile time on par with GCC 3.2.3.
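To compare the flag sets mentioned in this report on the same input, a small timing harness is enough. The Python sketch below is one way to do it, under assumptions: `g++` is on the PATH, `Dialogs.ii` is in the current directory, and the helper names (`build_command`, `time_compile`) and the `CONFIGS` labels are made up for illustration. Whether the `-finline-limit=60` experiment was run with or without `-fno-default-inline` is not stated here, so that entry is a guess.

```python
import os
import subprocess
import time


def build_command(source, extra_flags=(), compiler="g++"):
    """Build the argument list for one -O2 compile, discarding the object file."""
    return [compiler, "-O2", *extra_flags, "-c", source, "-o", os.devnull]


def time_compile(source, extra_flags=(), compiler="g++"):
    """Run one compile and return the elapsed wall-clock time in seconds."""
    start = time.perf_counter()
    subprocess.run(build_command(source, extra_flags, compiler), check=True)
    return time.perf_counter() - start


# Flag sets taken from this report; the labels are illustrative.
CONFIGS = {
    "as reported": ["-fno-default-inline"],
    "reduced inline limit": ["-finline-limit=60"],  # assumed to be used alone
    "suggested workaround": ["--param", "max-inline-insns-single=180"],
}

if __name__ == "__main__":
    for label, flags in CONFIGS.items():
        elapsed = time_compile("Dialogs.ii", flags)
        print(f"{label}: {elapsed:.1f}s")
```

Expect the first configuration to run for hours on an affected compiler, so it may be worth timing the workaround entries first.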
Eric, how about trying this one with current mainline? I would like to see how Jan Hubicka's new function body size estimates do in this case, but I don't have access to a SPARC machine. My experience with the new code has been very positive in all cases, so maybe it helps here, too.
With all the changes in the tree-inliner (and in particular with the call graph code) since March, the information in this bug report is obsolete. This PR really needs testing and reconfirmation that the problem still exists. Can someone test this please? I've marked this PR as WAITING for feedback, so that we can close it if no one tests this in the next three months or so.
The information is still valid on the 3.3 branch as of GCC 3.3.2, but I think this is not fixable on that branch. And the testcase doesn't compile on mainline anymore. Albert, do you still have the source code from which the testcase was extracted?
The test case is from LyX. I'll try to upload a new version in a few days.
You need to do it against a recent CVS snapshot of gcc-3.4 because I think we won't fix the inliner of the 3.3 branch and it seems that the new parser can't grok the preprocessed file generated by 3.3.x in this case.
This bug most likely still exists, but we really need a new preprocessed source for 3.4.
*** Bug 13027 has been marked as a duplicate of this bug. ***
This is also a memory hog: < scheduling : 7510.98 (87%) usr 0.74 ( 4%) sys 13791.00 (86%) wall > See how wall time is about twice as big as user.
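The "about twice" figure can be checked directly from the numbers in the report. The short Python sketch below computes the wall-to-user ratio for the scheduling phase and for the whole compile; the helper name `wall_to_user_ratio` is made up for illustration. A large gap between wall and user time is consistent with the process waiting on something such as paging, but the report alone does not show what it was waiting on.

```python
def wall_to_user_ratio(usr_seconds, wall_seconds):
    """Return how many wall-clock seconds elapsed per second of user CPU time."""
    return wall_seconds / usr_seconds


# Figures copied from the timing report in this PR.
scheduling = wall_to_user_ratio(7510.98, 13791.00)
total = wall_to_user_ratio(8658.27, 16129.25)

if __name__ == "__main__":
    print(f"scheduling: wall/usr = {scheduling:.2f}")
    print(f"total:      wall/usr = {total:.2f}")
```

Both ratios come out a little under 2, so the slowdown relative to pure CPU time is not specific to the scheduling phase.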
Created attachment 5449: Preprocessed source for G++ 3.4.0. This should be the new preprocessed source; let me know if it is incorrect and I can try regenerating it.
Thanks Giovanni. I'll try tomorrow.
I tried on powerpc-apple-darwin7.2.0 and it does not have the problem in the scheduler. Also the problem I see is in the C++ front-end (but this is with checking enabled):
 parser      : 19.86 (14%) usr 14.48 (27%) sys 94.09 (26%) wall
 name lookup : 20.51 (14%) usr 27.71 (51%) sys 97.30 (27%) wall
To compile the 3.4 source on 3.3, delete the first couple of lines dealing with the debugging part of libstdc++, delete some __attribute__((unused)), and change the remaining __gnu_norm to std. With 3.3, I can reproduce it but not with 3.4, so it looks like it has been fixed.
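The two textual substitutions can be scripted. The Python sketch below is a rough approximation under stated assumptions: it strips every `__attribute__((unused))` rather than only "some" of them, it leaves the libstdc++ debug lines at the top of the file to be removed by hand because the comment does not say which lines they are, and the function name `adapt_for_gcc33` is hypothetical.

```python
import re

# Matches the attribute together with any whitespace in front of it, so
# "int x __attribute__((unused));" becomes "int x;".
UNUSED_ATTR_RE = re.compile(r"\s*__attribute__\s*\(\(\s*unused\s*\)\)")

# Matches __gnu_norm only as a whole identifier.
GNU_NORM_RE = re.compile(r"\b__gnu_norm\b")


def adapt_for_gcc33(source_text):
    """Apply the two substitutions to preprocessed source text.

    Assumption: removing all unused attributes is acceptable. The leading
    libstdc++ debug lines are not touched and must be deleted manually.
    """
    text = UNUSED_ATTR_RE.sub("", source_text)
    return GNU_NORM_RE.sub("std", text)


if __name__ == "__main__":
    sample = "__gnu_norm::vector<int> v __attribute__((unused));"
    print(adapt_for_gcc33(sample))
```

Working on the whole file as one string keeps the sketch short; for a very large preprocessed file, processing line by line would use less memory.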
The results at -O2 are much better on mainline: no memory explosion (peak around 120 MB) and decent time (less than 2 minutes). I guess we can all give a big Thanks to Jan!