Summary: | [3.4/4.0/4.1 Regression] g++ crash with -O2 and -O3 on input file | ||
---|---|---|---|
Product: | gcc | Reporter: | Andrew Begel <abegel> |
Component: | middle-end | Assignee: | Jeffrey A. Law <jeffreyalaw> |
Status: | RESOLVED FIXED | ||
Severity: | normal | CC: | dnovillo, gcc-bugs, giovannibajo, law, mmitchel, pinskia, rguenth |
Priority: | P2 | Keywords: | compile-time-hog, memory-hog |
Version: | 3.4.0 | ||
Target Milestone: | 4.0.3 | ||
Host: | i686-pc-linux-gnu | Target: | i686-pc-linux-gnu |
Build: | i686-pc-linux-gnu | Known to work: | |
Known to fail: | | Last reconfirmed: | 2005-10-14 04:49:38 |
Bug Depends on: | 18587 | ||
Bug Blocks: | |||
Attachments: |
Preprocessed source code that crashes g++ with -O2 and -O3
Cleaned up testcase PPP PPP PPP PPP PPP expat.h expat_external.h |
Description
Andrew Begel
2004-06-06 19:10:38 UTC
Created attachment 6484 [details]
Preprocessed source code that crashes g++ with -O2 and -O3
*** Bug 15854 has been marked as a duplicate of this bug. ***

How much memory do you have in that machine?
With 3.5.0 I get a peak about 500M then going down to 128M.

(In reply to comment #3)
> How much memory do you have in that machine?
>
> With 3.5.0 I get a peak about 500M then going down to 128M.
512 MB of RAM. When compiling with gcc 3.4.0, it stays around 200M for a while, then starts rising to a peak of 438M, at which point it dies from the g++ internal failure.

Hmm, on the mainline I get this error:
../../lk/version/global.h:241: error: explicit specialization of `VDifferential<char>::Summary VDifferential<char>::summarize(const VG*, GVID, GVID)' must be introduced by `template <>'
../../lk/version/global.h:241: error: template-id `summarize<>' for `VDifferential<char>::Summary VDifferential<char>::summarize(const VG*, GVID, GVID)' does not match any template declaration

This looks like a case where unit-at-a-time is inlining more than at -O1.

Postponed until GCC 3.4.3.

Now on the mainline (with checking still enabled) we get a MAX of 328M of GC memory allocated and then go down to 189M. This is better but it can be improved still. Also this memory usage was in the front-end before optimizations.

Never mind, I was looking at the wrong thing, it was after the parser was done.

Postponed until GCC 3.4.4.

These parts are 4.0 regressions:
 tree alias analysis : 24.22 ( 6%) usr 0.75 ( 2%) sys 26.57 ( 5%) wall
 tree PHI insertion : 7.71 ( 2%) usr 1.26 ( 3%) sys 9.24 ( 2%) wall
 tree SSA rewrite : 11.52 ( 3%) usr 4.32 (11%) sys 20.63 ( 4%) wall
 tree SSA other : 21.33 ( 5%) usr 3.77 (10%) sys 27.79 ( 5%) wall
 tree operand scan : 26.65 ( 7%) usr 3.78 (10%) sys 31.64 ( 6%) wall
These are most likely not:
 combiner : 25.57 ( 6%) usr 0.20 ( 1%) sys 26.78 ( 5%) wall
 scheduling : 211.44 (52%) usr 4.35 (11%) sys 225.27 (43%) wall

The combiner problem comes from the distribute_notes loop.

One more thing: the scheduler compile time problem goes away with -fno-PIC (this is on ppc-darwin).

Jeff, this is another baby you might enjoy playing with when you are done with the other PR :)

Created attachment 7585 [details]
Cleaned up testcase
I have cleaned up the testcase so that it does not emit tons of errors anymore.
Also, I have unincluded it to make it compiler/platform independent. Unluckily,
it won't compile on 2.95 because of <ext/hash_map> and <ext/hash_set>.
(In reply to comment #12)
> These parts are 4.0 regressions:
Improvements already (note the scheduling goes away with -fno-PIC at least on ppc-darwin):
 tree alias analysis : 16.84 (12%) usr 0.40 ( 1%) sys 17.74 ( 8%) wall
 tree PHI insertion : 7.94 ( 5%) usr 1.00 ( 3%) sys 9.31 ( 4%) wall
 tree SSA rewrite : 10.79 ( 7%) usr 4.11 (12%) sys 17.07 ( 8%) wall
 tree SSA other : 20.02 (14%) usr 2.51 ( 7%) sys 24.59 (11%) wall
 tree operand scan : 8.15 ( 6%) usr 3.38 (10%) sys 11.66 ( 5%) wall
 scheduling : 13.15 ( 9%) usr 5.78 (17%) sys 19.44 ( 9%) wall

Subject: Re: [3.4/4.0 Regression] g++ crash with -O2 and -O3 on input file

Here's a nice little 15% compile time improvement when checking is enabled for this testcase.

I'm always amazed at how expensive certain primitive operations get in our checking code. And this is no exception.

The testcode for pr15855 is spending insane amounts of time doing alias verification. I suspect a large part of the underlying problem is that we're creating way too many SSA_NAMEs for a couple routines (we're creating roughly 2-3 million then deleting about half of them -- which is somehow related to the may-alias code).

Regardless, creating all those SSA_NAMEs is useful in that it's exposing some lameness elsewhere. For example in verify_ssa_flow_sensitive_aliases we burn an _amazing_ amount of time getting variable annotations. That's right, getting (*&@#$ variable annotations. Worse yet, the vast majority of the time we never even look at the annotations we retrieved because the variables aren't pointers or for some other reason.

How much time? Try nearly 20 out of 140 seconds of compilation time. (yes, that's with an optimized, non-profiled compiler). Yes, nearly 15% of our compilation time spent getting variable annotations for variables we don't care about.

Delaying retrieval of the pointer information and swapping two of the tests for continuing the loop early shave another second or so off the compilation time.

Bootstrapped and regression tested on i686-pc-linux-gnu.

Created attachment 7614 [details]
PPP
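The idea in the comment above (run the cheap rejection tests first, and fetch the expensive per-variable data only for variables that survive them) can be sketched in a few lines of C. This is only an illustration: `var_info`, `annotation`, `get_annotation` and `count_aliased_pointers` are hypothetical stand-ins, not GCC's actual data structures or the code in verify_ssa_flow_sensitive_aliases.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for a variable and its annotation. */
struct var_info { bool is_pointer; bool in_use; int id; };
struct annotation { int alias_set_size; };

/* Counts how often the (assumed expensive) lookup actually runs. */
static size_t lookups;

/* Pretend this is costly, e.g. a hash-table lookup. */
static const struct annotation *
get_annotation(const struct var_info *v, const struct annotation *table)
{
    lookups++;
    return &table[v->id];
}

/* Count in-use pointer variables with a non-empty alias set.
   The cheap field tests run first; the annotation is fetched
   only for variables that pass them. */
size_t
count_aliased_pointers(const struct var_info *vars, size_t n,
                       const struct annotation *table)
{
    size_t count = 0;
    for (size_t i = 0; i < n; i++) {
        if (!vars[i].in_use)
            continue;               /* cheap test, no lookup */
        if (!vars[i].is_pointer)
            continue;               /* cheap test, no lookup */
        const struct annotation *ann = get_annotation(&vars[i], table);
        if (ann->alias_set_size > 0)
            count++;
    }
    return count;
}
```

With most variables rejected by the two cheap tests, the number of lookups drops from one per variable to one per surviving variable, which is the effect the comment describes.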
Subject: Re: [3.4/4.0 Regression] g++ crash with -O2 and -O3 on input file

It's always amazing to see how such simple oversights can result in such a dramatic difference in the code we generate and to a smaller extent our compile-time performance.

For this PR we actually spend considerable time compiling the static initialization and destruction routine. Yes, that's right... The C++ front-end presents us with code like this:

  <<< Unknown tree: if_stmt
    __priority == 65535 && __initialize_p == 1
    <<cleanup_point <<< Unknown tree: expr_stmt
    __comp_ctor (&__ioinit) >>>
  >>
  >>>
  ;
  <<< Unknown tree: if_stmt
    __priority == 65535 && __initialize_p == 1
    <<cleanup_point <<< Unknown tree: expr_stmt
    __comp_ctor (&phylum_info, 131, (const char *) "trans_unit", 3, 0, 1, 0, 0, 0, 0, 1, 1) >>>
  >>
  >>>
  ;

Which repeats over and over and over (around a thousand times). The if conditions remain the same, but the actions within the IF statement change.

We gimplify that into:

  if (__priority == 65535)
    {
      if (__initialize_p == 1)
        {
          __comp_ctor (&__ioinit);
        }
      else
        {
        }
    }
  else
    {
    }
  if (__priority == 65535)
    {
      if (__initialize_p == 1)
        {
          __comp_ctor (&phylum_info, 131, &"trans_unit"[0], 3, 0, 1, 0, 0, 0, 0, 1, 1);
        }
      else
        {
        }
    }
  [ ... ]

It doesn't take a rocket scientist to realize that we've got a lot of redundant tests in this code and it really should look something like

  if (__priority == 65535)
    if (__initialize == 1)
      {
        action1;
        action2;
        ...
        actionN;
      }

When I looked at the DOM1 dump file I was rather annoyed to find that while it successfully threaded away all the __priority tests, it left in all the __initialize tests. Ugh. That can't be good.

I was pleasantly surprised to see that one iteration of DOM was sufficient to do all the threading of the __priority tests; that's good from a compile-time performance standpoint. What I was surprised to find was that DOM1 did not iterate! Thus it didn't thread all the __initialize tests until DOM2.

cleanup_tree_cfg didn't find any control statements to remove, unreachable blocks or jumps to thread. So it returned false. It did however merge roughly a thousand blocks. But we do not propagate that to the callers of cleanup_tree_cfg. Which is the root of the problem.

DOM's jump threader doesn't look through multiple blocks. So while block merging won't expose new control flow cleanups, unreachable blocks or jump threads for cleanup_tree_cfg, it may expose new jump threading opportunities for DOM's jump threader.

Fixing this little oversight resulted in DOM1 threading all the conditionals in the target function, leaving us with optimal code in a total of 4 basic blocks.

And the best news of all, this _improves_ compile time performance for this testcase (by about a percent).

Bootstrapped and regression tested on i686-pc-linux-gnu.

Did you try if iterating DOM1 has compile time impact on more normal testcases, like cc1-i?

Subject: Re: [3.4/4.0 Regression] g++ crash with -O2
and -O3 on input file
On Sat, 2004-11-27 at 08:17 +0000, giovannibajo at libero dot it wrote:
> ------- Additional Comments From giovannibajo at libero dot it 2004-11-27 08:17 -------
> Did you try if iterating DOM1 has compile time impact on more normal testcases,
> like cc1-i?
The effect on more normal C code is in the noise. Though my gut tells
me it's a teeny tiny slower.
jeff
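The cleanup_tree_cfg oversight discussed in this exchange comes down to a cleanup routine that does work (merging blocks) without telling its caller. A minimal sketch of the corrected shape follows, assuming a toy `cfg_state` that just counts pending work. Every name here is hypothetical and none of it is GCC's actual code; it only shows why the "changed" flag has to include block merging if the caller uses it to decide whether to iterate.

```c
#include <stdbool.h>

/* Toy model of a CFG: just counts of pending cleanup work. */
struct cfg_state {
    int dead_branches;
    int unreachable_blocks;
    int mergeable_blocks;
};

static bool remove_dead_branches(struct cfg_state *c)
{
    bool changed = c->dead_branches > 0;
    c->dead_branches = 0;
    return changed;
}

static bool remove_unreachable_blocks(struct cfg_state *c)
{
    bool changed = c->unreachable_blocks > 0;
    c->unreachable_blocks = 0;
    return changed;
}

static bool merge_blocks(struct cfg_state *c)
{
    bool changed = c->mergeable_blocks > 0;
    c->mergeable_blocks = 0;
    return changed;
}

/* Report a change if ANY step did something, including merging. */
bool cleanup_cfg(struct cfg_state *c)
{
    bool changed = false;
    changed |= remove_dead_branches(c);
    changed |= remove_unreachable_blocks(c);
    changed |= merge_blocks(c);   /* the result that must not be dropped */
    return changed;
}

/* Hypothetical pass driver: run again while cleanup reports a change.
   Returns the number of iterations performed. */
int run_threading_pass(struct cfg_state *c, int max_iterations)
{
    int iterations = 0;
    do {
        iterations++;             /* one round of jump threading goes here */
    } while (iterations < max_iterations && cleanup_cfg(c));
    return iterations;
}
```

In this model, a CFG where only block merging happens still triggers a second iteration, which is the behaviour the fix is described as enabling for DOM1.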
Subject: Re: [3.4/4.0 Regression] g++ crash with -O2 and -O3 on input file

This patch conditionally creates GLOBAL_VAR before the initial translation into SSA form (and thus before the initial call to compute_may_aliases).

This is vital to avoid compile time and memory explosions in code with a large number of calls and a large number of call clobbered variables -- rather than creating lots of virtual operands at each call site, we create a single V_MAY_DEF for GLOBAL_VAR.

The first may-alias pass would take care of this stuff for us -- but waiting until then means that the into-ssa translation has to deal with all these extra operands.

To give you some idea of what "extra" means -- the function in question has ~1800 call sites. Each of the call sites has approximately 800 V_MAY_DEFs and 1000 VUSEs without this patch. That's err, a lot of operands to deal with.

Here's some of the key timevars which show the effect of this change:

 garbage collection : 2.82 ( 4%) usr 0.01 ( 0%) sys 2.82 ( 4%)
 tree PTA : 0.96 ( 1%) usr 0.00 ( 0%) sys 0.97 ( 1%)
 tree alias analysis : 5.61 ( 9%) usr 0.05 ( 0%) sys 5.62 ( 7%)
 tree PHI insertion : 4.85 ( 7%) usr 0.18 ( 2%) sys 5.04 ( 7%)
 tree SSA rewrite : 8.08 (12%) usr 0.21 ( 2%) sys 8.35 (11%)
 tree SSA other : 10.63 (16%) usr 1.04 (10%) sys 10.91 (14%)
 tree operand scan : 6.20 ( 9%) usr 0.97 ( 9%) sys 7.89 (10%)
 dominator optimization: 1.69 ( 3%) usr 0.10 ( 1%) sys 1.76 ( 2%)
 tree CCP : 0.42 ( 1%) usr 0.09 ( 1%) sys 0.44 ( 1%)
 tree SSA to normal : 0.50 ( 1%) usr 0.52 ( 5%) sys 1.24 ( 2%)
 tree rename SSA copies: 0.22 ( 0%) usr 0.32 ( 3%) sys 0.55 ( 1%)
 dominance frontiers : 0.02 ( 0%) usr 0.01 ( 0%) sys 0.06 ( 0%)
 expand : 2.20 ( 3%) usr 0.19 ( 2%) sys 2.27 ( 3%)
 TOTAL : 65.47 10.83 76.53

And after this patch:

 garbage collection (none :-)
 tree PTA : 0.15 ( 0%) usr 0.00 ( 0%) sys 0.15 ( 0%)
 tree alias analysis : 4.17 (14%) usr 0.03 ( 0%) sys 4.19 (11%)
 tree PHI insertion : 0.28 ( 1%) usr 0.01 ( 0%) sys 0.28 ( 1%)
 tree SSA rewrite : 0.42 ( 1%) usr 0.02 ( 0%) sys 0.46 ( 1%)
 tree SSA other : 1.26 ( 4%) usr 0.82 ( 9%) sys 1.99 ( 5%)
 tree operand scan : 1.16 ( 4%) usr 0.73 ( 8%) sys 1.95 ( 5%)
 dominator optimization: 1.32 ( 4%) usr 0.11 ( 1%) sys 1.58 ( 4%)
 tree CCP : 0.08 ( 0%) usr 0.00 ( 0%) sys 0.12 ( 0%)
 tree SSA to normal : 0.22 ( 1%) usr 0.00 ( 0%) sys 0.18 ( 0%)
 tree rename SSA copies: 0.03 ( 0%) usr 0.00 ( 0%) sys 0.06 ( 0%)
 expand : 1.31 ( 4%) usr 0.15 ( 2%) sys 1.50 ( 4%)
 TOTAL : 30.26 9.04 39.37

Those were the big improvements -- there were several other passes which appear to show small improvements.

Overall we're looking at more than a 50% reduction in compile time and since GC fell off the timevar charts, I can make an educated guess that we're allocating a lot less memory too.

The may-aliasing code is still a major time-sink and I expect to find further small gains in that code.

Bootstrapped and regression tested on i686-pc-linux-gnu.

Created attachment 7628 [details]
PPP
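To put the "a lot of operands" remark from the comment above into numbers, here is a small worked calculation. The helper name `total_virtual_operands` is made up for this sketch, and the "after" figure takes the description at face value (one V_MAY_DEF for GLOBAL_VAR per call site and nothing else), which is an assumption rather than a measured result.

```c
/* Total virtual operands across all call sites of a function,
   given per-call-site counts of V_MAY_DEFs and VUSEs. */
unsigned long
total_virtual_operands(unsigned long call_sites,
                       unsigned long may_defs_per_call,
                       unsigned long vuses_per_call)
{
    return call_sites * (may_defs_per_call + vuses_per_call);
}

/* Without the patch: 1800 * (800 + 1000) = 3,240,000 operands.
   With a single V_MAY_DEF per call site (assumed): 1800 * 1 = 1,800. */
```

Under those assumptions the into-SSA translation has roughly 3.24 million fewer operands to process for this one function, which is in line with the drop seen in the PHI insertion and SSA rewrite timevars.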
Jeff, thanks for working on these problems! BTW, I'm assigning this bug to you since you are the one working on it right now.

Subject: Re: [3.4/4.0 Regression] g++ crash with -O2 and -O3 on input file

This gives us a small (.5%), but measurable speedup for 15855.

Given two sbitmaps, there are a few places where we just want to know if there are bits in common. We don't care about precisely what those bits are -- just whether or not there are some in common.

Right now we use sbitmap_a_and_b, then scan the result to see if any bits are set. That is clearly inefficient; let me count the ways...

 1. We have to have an sbitmap for the destination. That means allocation and teardown of an extra sbitmap.

 2. For every bit X that is set in A & B we have to set the appropriate bit in the destination sbitmap.

 3. We do not need to scan any bits beyond the first which is common in A & B. Nor do we really need to set any of those bits in the destination.

 4. After the sbitmap_a_and_b call is complete we have to scan the destination bitmap to see if any bits are set.

Ugh.

This patch introduces a new sbitmap function which just returns a boolean indicating whether or not the two given bitmaps have any common bits set. It does not tell us what bits were common, how many, or anything like that.

Less work, less memory allocated, faster compilation times. Mmm, good.

Bootstrapped and regression tested on i686-pc-linux-gnu.

Created attachment 7646 [details]
PPP
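The "any bits in common" test described above can be sketched on plain word arrays. This is not the sbitmap function from the patch (the thread does not show its name or signature); `words_any_common_bits` is a hypothetical helper that assumes both bitmaps are stored as arrays of `unsigned long` of equal length.

```c
#include <stdbool.h>
#include <stddef.h>

/* Return true as soon as some word of A and B shares a set bit.
   No destination bitmap is allocated or written, and the scan
   stops at the first word with a common bit. */
bool
words_any_common_bits(const unsigned long *a, const unsigned long *b,
                      size_t n_words)
{
    for (size_t i = 0; i < n_words; i++)
        if (a[i] & b[i])
            return true;
    return false;
}
```

Compared with computing the full intersection and then scanning it, this addresses all four points in the list: no extra bitmap, no stores, early exit, and no second pass.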
Subject: Re: [3.4/4.0 Regression] g++ crash with -O2 and -O3 on input file

While analyzing the compile-time performance problems one of the things I noticed was a _ton_ of save_eptr and save_filt variables which are used to hold a value over a call to operator delete after which the variable always dies.

In fact, there are 871 of each variable in the one key function in this testcase. Each one of those variables takes a slot in referenced_vars. So every loop over referenced_vars had to look at them. And, yes, there are enough loops over referenced_vars to matter.

By re-using a single variable for save_eptr and save_filt we cut 1740 entries out of the referenced_vars array (roughly 7.5% reduction in variables).

Normally I would be strongly against such a change as globbing variables like this will tend to increase the number of PHI_NODEs and SSA_NAMEs. But this case is pretty special because of the usage characteristics of save_eptr and save_filt. In fact, we have precisely the same number of PHI_NODEs and SSA_NAMEs before and after this change.

Our net compile-time improvement is around .6%. Nothing huge, but then again, we're not doing a lot of work to get that .6% improvement.

I'm not checking in this change at the current time to give folks (particularly Richard & Jason) a chance to comment.

It has bootstrapped and regression tested on i686-pc-linux-gnu, of course.

Created attachment 7651 [details]
PPP
Subject: Re: [3.4/4.0 Regression] g++ crash with -O2 and -O3 on input file
On Wed, Dec 01, 2004 at 11:05:43AM -0700, Jeffrey A Law wrote:
> I'm not checking in this change at the current time to give folks
> (particularly Richard & Jason) a chance to comment.
If you get that much improvement out of it, sure. Might want to
put a comment before the static variables discussing this issue.
r~
Subject: Re: [3.4/4.0 Regression] g++ crash with -O2
and -O3 on input file
On Wed, 2004-12-01 at 10:32 -0800, Richard Henderson wrote:
> On Wed, Dec 01, 2004 at 11:05:43AM -0700, Jeffrey A Law wrote:
> > I'm not checking in this change at the current time to give folks
> > (particularly Richard & Jason) a chance to comment.
>
> If you get that much improvement out of it, sure. Might want to
> put a comment before the static variables discussing this issue.
Here's what I actually checked in. Only changes were in comments.
Cheers,
Jeff
Created attachment 7654 [details]
PPP
Subject: Re: [3.4/4.0 Regression] g++ crash with -O2 and -O3 on input file
On Wed, Dec 01, 2004 at 09:56:51PM -0700, Jeffrey A Law wrote:
> * tree.h (save_eptr, save_filt): Now file scoped statics.
> (honor_protect_cleanup_actions): Only create save_eptr and
> save_filt if they do not already exist.
> (lower_eh_constructs): Wipe all knowledge of save_eptr and
> save_filt before returning.
As I just discovered while working through the existence of these
saves for DannyB, this transformation is incorrect and indeed
invalidates the entire reason for this save/restore.
The best that we could do is a stack of variable pairs so that
nested eh regions are handled properly.
Sorry for my earlier misdiagnosis.
r~
Subject: Re: [3.4/4.0 Regression] g++ crash with -O2
and -O3 on input file
On Wed, 2004-12-01 at 22:08 -0800, Richard Henderson wrote:
> On Wed, Dec 01, 2004 at 09:56:51PM -0700, Jeffrey A Law wrote:
> > * tree.h (save_eptr, save_filt): Now file scoped statics.
> > (honor_protect_cleanup_actions): Only create save_eptr and
> > save_filt if they do not already exist.
> > (lower_eh_constructs): Wipe all knowledge of save_eptr and
> > save_filt before returning.
>
> As I just discovered while working through the existence of these
> saves for DannyB, this transformation is incorrect and indeed
> invalidates the entire reason for this save/restore.
>
> The best that we could do is a stack of variable pairs so that
> nested eh regions are handled properly.
>
> Sorry for my earlier misdiagnosis.
Worse things have happened in this world. The patch has been reverted.
jeff
Could someone (Jeff) update the summary for this PR to reflect the actual status of this PR and the problems being addressed?

Does the test case from the submitter still crash g++ on mainline?

Subject: Re: [3.4/4.0 Regression] g++ crash with -O2
and -O3 on input file
On Wed, 2004-12-22 at 19:20 +0000, steven at gcc dot gnu dot org wrote:
> ------- Additional Comments From steven at gcc dot gnu dot org 2004-12-22 19:20 -------
> Could someone (Jeff) update the summary for this PR to reflect the
> actual status of this PR and the problems being addressed?
>
> Does the test case from the submitter still crash g++ on mainline?
I don't believe it crashes, but the amount of time we're spending in
the aliasing code is, err, on the order of 50% of compilation time
last I checked. Fixing it effectively involves rewriting
tree-ssa-alias.c, which I am doing. However, it'll probably be
several weeks before I have anything ready for serious examination
due to personal commitments, vacation, etc.
jeff
Moving to 4.0.2 per Mark.

At -O1 on the mainline on powerpc-darwin, we now take over 800M, which seems very high. I could not finish the build of this source as it was just taking too long.

On ia64-linux postreload-gcse does exactly nothing and takes 35% of the compile time at -O1 -fgcse-after-reload. I didn't finish a build at -O2.

Subject: Re: [3.4/4.0/4.1 Regression] g++ crash with
-O2 and -O3 on input file
phython at gcc dot gnu dot org wrote:
> ------- Additional Comments From phython at gcc dot gnu dot org 2005-08-12 06:21 -------
> On ia64-linux postreload-gcse does exactly nothing and takes 35% of the compile
> time at -O1 -fgcse-after-reload. I didn't finish a build at -O2.
What testcase and compiler version are you using for this? I tried
taking a look, but wasn't able to reproduce any problem. I was able to
compile both testcases in the PR at -O2 in about 20 secs and about 200MB.
Both testcases in the PR have issues. The first one generates lots of
errors from the C++ front end, and doesn't actually end up doing much
compiling. The second one wants a non-existent expat.h file and is thus
uncompilable without changes. If I delete the include of the missing
expat.h file, then again I get C++ front end errors and little compilation.
I tried various gcc versions, 4.0.x, mainline, 3.3.x, but I got the same
result from all of them.
Maybe you have a copy of the missing expat.h file?
Created attachment 9484 [details]
expat.h
Created attachment 9485 [details]
expat_external.h
The second testcase works for me on current mainline if adding a class UltraRoot forward declaration at the top. It takes around 1m30 to compile and uses up max. 522913kB of memory (1GB box, P4 2.8GHz) at -O2. Still aliasing accounts for the most time:

samples  %        image name  symbol name
694      26.6411  cc1plus     add_stmt_operand
209      8.0230   cc1plus     ldst_entry
193      7.4088   cc1plus     create_ssa_artficial_load_stmt
181      6.9482   cc1plus     compute_global_livein
57       2.1881   cc1plus     bitmap_bit_p
50       1.9194   no-vmlinux  (no symbols)
45       1.7274   cc1plus     compute_may_aliases
42       1.6123   cc1plus     invalidate
36       1.3820   cc1plus     bitmap_ior_and_compl_into
35       1.3436   cc1plus     bitmap_set_bit
35       1.3436   cc1plus     for_each_rtx_1
33       1.2668   cc1plus     check_dependence
29       1.1132   cc1plus     splay_tree_splay_helper
28       1.0749   cc1plus     htab_find_slot_with_hash

The 2nd is from GCSE, 3rd from DOM, 4th from either into-ssa or ssa-loop-manip. Time-report:

 tree SSA incremental : 14.27 (17%) usr 0.11 ( 3%) sys 14.46 (16%) wall 12827 kB ( 2%) ggc
 tree operand scan : 15.62 (18%) usr 0.27 ( 8%) sys 15.93 (18%) wall 59212 kB (10%) ggc
 dominator optimization: 8.67 (10%) usr 0.05 ( 1%) sys 8.64 (10%) wall 110089 kB (19%) ggc
 PRE : 6.14 ( 7%) usr 0.01 ( 0%) sys 6.13 ( 7%) wall 536 kB ( 0%) ggc

ldst_entry is walking a list to find an element by hash-id. Throw some memory at it to add a real hashtable for lookup besides the list.

The DOM stuff looks like value numbering still in DOM, hopefully it will be ripped out.

Now we have again (?? appeared not too long ago) DOM taking all time and memory threading 10000 times repeating

  if (__priority == 65535)
    {
      if (__initialize_p == 0)
        {
          __comp_dtor (&__ioinit);
        }
      else
        {
        }
    }
  else
    {
    }

in __static_initialization

Ugh.
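The ldst_entry suggestion in the comment above (keep the list, but add a hash table next to it so lookups by hash id stop walking the whole list) can be sketched as follows. This is a generic chained hash table written for illustration; the type and function names are hypothetical, the bucket count is arbitrary, and it does not use GCC's hashtab.h API.

```c
#include <stddef.h>
#include <stdlib.h>

#define N_BUCKETS 1024u   /* arbitrary size chosen for the sketch */

struct ldst_node {
    unsigned hash;                     /* lookup key (hash id) */
    struct ldst_node *next_in_list;    /* the pre-existing list */
    struct ldst_node *next_in_bucket;  /* chain within one bucket */
};

struct ldst_table {
    struct ldst_node *list_head;
    struct ldst_node *buckets[N_BUCKETS];
};

/* Look up by hash id, visiting only one bucket chain. */
struct ldst_node *
ldst_lookup(const struct ldst_table *t, unsigned hash)
{
    for (struct ldst_node *e = t->buckets[hash % N_BUCKETS]; e != NULL;
         e = e->next_in_bucket)
        if (e->hash == hash)
            return e;
    return NULL;
}

/* Return the existing node for HASH, or create one and link it
   into both the list and the hash table.  NULL on allocation failure. */
struct ldst_node *
ldst_find_or_insert(struct ldst_table *t, unsigned hash)
{
    struct ldst_node *e = ldst_lookup(t, hash);
    if (e != NULL)
        return e;

    e = calloc(1, sizeof *e);
    if (e == NULL)
        return NULL;
    e->hash = hash;
    e->next_in_list = t->list_head;
    t->list_head = e;
    e->next_in_bucket = t->buckets[hash % N_BUCKETS];
    t->buckets[hash % N_BUCKETS] = e;
    return e;
}
```

Code that iterates over all entries keeps using the list, while lookups go through the buckets. Any code that removes an entry from the list has to unlink it from its bucket as well, otherwise the two structures drift apart.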
For a cut-down testcase (the one attached takes too much memory for my box) detailed mem-report shows

tree-dfa.c:175 (create_stmt_ann) 12749568: 2.0% 2932228: 1.0% 52: 0.0% 0: 0.0% 301574
tree-inline.c:2403 (copy_tree_r) 19285144: 3.0% 0: 0.0% 0: 0.0% 162184: 0.3% 537068
tree-ssanames.c:147 (make_ssa_name) 201319404:31.0% 0: 0.0% 12792: 0.0% 0: 0.0% 3871773
tree-phinodes.c:156 (allocate_phi_node) 250714968:38.6% 0: 0.0% 0: 0.0% 11480: 0.0% 1956706
Total 649892049 291182446 47860808 47260199 12874808
source location Garbage Freed Leak Overhead Times

where the ssa_names / phi_nodes are all allocated by DOM. From the time-report:

 tree SSA incremental : 19.21 (11%) usr 2.02 (30%) sys 21.23 (12%) wall 365332 kB (40%) ggc
 tree operand scan : 22.06 (13%) usr 0.26 ( 4%) sys 22.47 (12%) wall 58811 kB ( 6%) ggc
 dominator optimization: 19.14 (11%) usr 0.13 ( 2%) sys 19.37 (11%) wall 111380 kB (12%) ggc

Subject: Bug 15855

CVSROOT: /cvs/gcc
Module name: gcc
Changes by: rguenth@gcc.gnu.org 2005-09-26 08:38:32

Modified files:
  gcc : ChangeLog gcse.c

Log message:
  2005-09-26 Richard Guenther <rguenther@suse.de>

  PR middle-end/15855
  * gcse.c: Include hashtab.h, define ldst entry hashtable.
  (pre_ldst_expr_hash, pre_ldst_expr_eq): New functions.
  (ldst_entry): Use the hashtable instead of list-walking.
  (find_rtx_in_ldst): Likewise.
  (free_ldst_entry): Free the hashtable.
  (compute_ld_motion_mems): Create the hashtable.
  (trim_ld_motion_mems): Remove entry from hashtable if removing it from list.
  (compute_store_table): Likewise^2.
  (store_motion): Free hashtable in case we did not see any stores.
Patches:
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/ChangeLog.diff?cvsroot=gcc&r1=2.10020&r2=2.10021
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/gcse.c.diff?cvsroot=gcc&r1=1.348&r2=1.349

Subject: Bug 15855

CVSROOT: /cvs/gcc
Module name: gcc
Changes by: rguenth@gcc.gnu.org 2005-09-26 08:43:00

Modified files:
  gcc/cp : ChangeLog decl2.c

Log message:
  2005-09-26 Richard Guenther <rguenther@suse.de>

  PR middle-end/15855
  * decl2.c (do_static_destruction): Remove.
  (finish_static_initialization_or_destruction): Likewise.
  (DECL_EFFECTIVE_INIT_PRIORITY): New macro.
  (NEEDS_GUARD_P): Likewise.
  (do_static_initialization): Rename to do_static_initialization_or_destruction. Process all initializers/destructors and handle common conditionalizing.
  (start_static_initialization_or_destruction): Rename to one_static_initialization_or_destruction. Handle only decl-specific conditionalizing.
  (cp_finish_file): Call do_static_initialization_or_destruction.

Patches:
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/cp/ChangeLog.diff?cvsroot=gcc&r1=1.4901&r2=1.4902
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/cp/decl2.c.diff?cvsroot=gcc&r1=1.802&r2=1.803

Subject: Bug 15855

CVSROOT: /cvs/gcc
Module name: gcc
Branch: gcc-4_0-branch
Changes by: rguenth@gcc.gnu.org 2005-10-07 10:54:10

Modified files:
  gcc/cp : ChangeLog decl2.c

Log message:
  2005-10-07 Richard Guenther <rguenther@suse.de>

  Backport from mainline
  2005-09-26 Richard Guenther <rguenther@suse.de>

  PR middle-end/15855
  * decl2.c (do_static_destruction): Remove.
  (finish_static_initialization_or_destruction): Likewise.
  (DECL_EFFECTIVE_INIT_PRIORITY): New macro.
  (NEEDS_GUARD_P): Likewise.
  (do_static_initialization): Rename to do_static_initialization_or_destruction. Process all initializers/destructors and handle common conditionalizing.
  (start_static_initialization_or_destruction): Rename to one_static_initialization_or_destruction. Handle only decl-specific conditionalizing.
  (cp_finish_file): Call do_static_initialization_or_destruction.
Patches:
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/cp/ChangeLog.diff?cvsroot=gcc&only_with_tag=gcc-4_0-branch&r1=1.4648.2.118&r2=1.4648.2.119
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/cp/decl2.c.diff?cvsroot=gcc&only_with_tag=gcc-4_0-branch&r1=1.770.2.9&r2=1.770.2.10

So trying to get some updated comparison I noticed that the testcase fails to compile with 3.3.x, and compared to 3.4.x we have improved a lot wrt -O2 compile-time:

 3.4: 1m32s, peak at 230MB
 4.1: 48s, peak at 480MB

Still we are using too much memory. Time-report for 4.1 no longer shows obvious problems:

Execution times (seconds)
 garbage collection : 0.50 ( 1%) usr 0.00 ( 0%) sys 0.50 ( 1%) wall 0 kB ( 0%) ggc
 callgraph construction: 0.17 ( 0%) usr 0.01 ( 0%) sys 0.19 ( 0%) wall 1898 kB ( 0%) ggc
 callgraph optimization: 0.01 ( 0%) usr 0.00 ( 0%) sys 0.01 ( 0%) wall 422 kB ( 0%) ggc
 ipa reference : 0.05 ( 0%) usr 0.00 ( 0%) sys 0.05 ( 0%) wall 97 kB ( 0%) ggc
 ipa pure const : 0.01 ( 0%) usr 0.00 ( 0%) sys 0.01 ( 0%) wall 0 kB ( 0%) ggc
 ipa type escape : 0.12 ( 0%) usr 0.00 ( 0%) sys 0.12 ( 0%) wall 0 kB ( 0%) ggc
 cfg construction : 0.02 ( 0%) usr 0.00 ( 0%) sys 0.01 ( 0%) wall 589 kB ( 0%) ggc
 cfg cleanup : 0.26 ( 1%) usr 0.00 ( 0%) sys 0.27 ( 1%) wall 511 kB ( 0%) ggc
 trivially dead code : 0.19 ( 0%) usr 0.00 ( 0%) sys 0.23 ( 0%) wall 0 kB ( 0%) ggc
 life analysis : 1.43 ( 3%) usr 0.00 ( 0%) sys 1.41 ( 3%) wall 91 kB ( 0%) ggc
 life info update : 0.16 ( 0%) usr 0.00 ( 0%) sys 0.12 ( 0%) wall 11 kB ( 0%) ggc
 alias analysis : 0.19 ( 0%) usr 0.00 ( 0%) sys 0.20 ( 0%) wall 3249 kB ( 1%) ggc
 register scan : 0.09 ( 0%) usr 0.00 ( 0%) sys 0.18 ( 0%) wall 49 kB ( 0%) ggc
 rebuild jump labels : 0.06 ( 0%) usr 0.00 ( 0%) sys 0.06 ( 0%) wall 0 kB ( 0%) ggc
 preprocessing : 0.52 ( 1%) usr 0.29 (11%) sys 0.78 ( 1%) wall 732 kB ( 0%) ggc
 parser : 3.27 ( 7%) usr 0.42 (15%) sys 3.84 ( 7%) wall 121022 kB (21%) ggc
 name lookup : 0.65 ( 1%) usr 0.48 (18%) sys 1.15 ( 2%) wall 9424 kB ( 2%) ggc
 inline heuristics : 0.09 ( 0%) usr 0.00 ( 0%) sys 0.09 ( 0%) wall 1163 kB ( 0%) ggc
 integration : 0.90 ( 2%) usr 0.02 ( 1%) sys 0.93 ( 2%) wall 46704 kB ( 8%) ggc
 tree gimplify : 0.41 ( 1%) usr 0.01 ( 0%) sys 0.47 ( 1%) wall 7205 kB ( 1%) ggc
 tree eh : 0.02 ( 0%) usr 0.00 ( 0%) sys 0.02 ( 0%) wall 1725 kB ( 0%) ggc
 tree CFG construction : 0.04 ( 0%) usr 0.03 ( 1%) sys 0.06 ( 0%) wall 5236 kB ( 1%) ggc
 tree CFG cleanup : 0.25 ( 1%) usr 0.02 ( 1%) sys 0.29 ( 1%) wall 332 kB ( 0%) ggc
 tree VRP : 0.37 ( 1%) usr 0.05 ( 2%) sys 0.44 ( 1%) wall 2375 kB ( 0%) ggc
 tree copy propagation : 1.22 ( 2%) usr 0.23 ( 8%) sys 1.43 ( 3%) wall 997 kB ( 0%) ggc
 tree store copy prop : 0.09 ( 0%) usr 0.06 ( 2%) sys 0.16 ( 0%) wall 202 kB ( 0%) ggc
 tree find ref. vars : 0.09 ( 0%) usr 0.00 ( 0%) sys 0.11 ( 0%) wall 6215 kB ( 1%) ggc
 tree PTA : 1.80 ( 4%) usr 0.03 ( 1%) sys 1.88 ( 4%) wall 4061 kB ( 1%) ggc
 tree alias analysis : 3.18 ( 6%) usr 0.08 ( 3%) sys 3.40 ( 6%) wall 5523 kB ( 1%) ggc
 tree PHI insertion : 0.05 ( 0%) usr 0.00 ( 0%) sys 0.04 ( 0%) wall 459 kB ( 0%) ggc
 tree SSA rewrite : 1.80 ( 4%) usr 0.14 ( 5%) sys 2.00 ( 4%) wall 120115 kB (21%) ggc
 tree SSA other : 0.17 ( 0%) usr 0.03 ( 1%) sys 0.19 ( 0%) wall 0 kB ( 0%) ggc
 tree SSA incremental : 4.90 (10%) usr 0.05 ( 2%) sys 4.86 ( 9%) wall 5152 kB ( 1%) ggc
 tree operand scan : 3.98 ( 8%) usr 0.29 (11%) sys 4.21 ( 8%) wall 58117 kB (10%) ggc
 dominator optimization: 1.70 ( 3%) usr 0.04 ( 1%) sys 1.81 ( 3%) wall 100326 kB (17%) ggc
 tree SRA : 0.01 ( 0%) usr 0.00 ( 0%) sys 0.03 ( 0%) wall 17 kB ( 0%) ggc
 tree STORE-CCP : 0.15 ( 0%) usr 0.02 ( 1%) sys 0.11 ( 0%) wall 193 kB ( 0%) ggc
 tree CCP : 0.54 ( 1%) usr 0.03 ( 1%) sys 0.56 ( 1%) wall 923 kB ( 0%) ggc
 tree split crit edges : 0.02 ( 0%) usr 0.00 ( 0%) sys 0.00 ( 0%) wall 820 kB ( 0%) ggc
 tree reassociation : 0.03 ( 0%) usr 0.00 ( 0%) sys 0.02 ( 0%) wall 0 kB ( 0%) ggc
 tree PRE : 0.34 ( 1%) usr 0.00 ( 0%) sys 0.30 ( 1%) wall 2854 kB ( 0%) ggc
 tree FRE : 0.32 ( 1%) usr 0.03 ( 1%) sys 0.33 ( 1%) wall 3726 kB ( 1%) ggc
 tree code sinking : 0.04 ( 0%) usr 0.00 ( 0%) sys 0.03 ( 0%) wall 78 kB ( 0%) ggc
 tree linearize phis : 0.01 ( 0%) usr 0.00 ( 0%) sys 0.01 ( 0%) wall 63 kB ( 0%) ggc
 tree forward propagate: 0.19 ( 0%) usr 0.00 ( 0%) sys 0.19 ( 0%) wall 1127 kB ( 0%) ggc
 tree conservative DCE : 0.49 ( 1%) usr 0.00 ( 0%) sys 0.52 ( 1%) wall 0 kB ( 0%) ggc
 tree aggressive DCE : 0.07 ( 0%) usr 0.00 ( 0%) sys 0.05 ( 0%) wall 0 kB ( 0%) ggc
 tree DSE : 0.28 ( 1%) usr 0.00 ( 0%) sys 0.29 ( 1%) wall 28 kB ( 0%) ggc
 tree loop bounds : 0.02 ( 0%) usr 0.00 ( 0%) sys 0.02 ( 0%) wall 47 kB ( 0%) ggc
 scev constant prop : 0.01 ( 0%) usr 0.00 ( 0%) sys 0.00 ( 0%) wall 18 kB ( 0%) ggc
 tree iv optimization : 0.04 ( 0%) usr 0.00 ( 0%) sys 0.04 ( 0%) wall 194 kB ( 0%) ggc
 tree loop init : 0.00 ( 0%) usr 0.00 ( 0%) sys 0.02 ( 0%) wall 0 kB ( 0%) ggc
 tree copy headers : 0.01 ( 0%) usr 0.01 ( 0%) sys 0.00 ( 0%) wall 190 kB ( 0%) ggc
 tree SSA uncprop : 0.01 ( 0%) usr 0.00 ( 0%) sys 0.03 ( 0%) wall 0 kB ( 0%) ggc
 tree SSA to normal : 0.18 ( 0%) usr 0.05 ( 2%) sys 0.29 ( 1%) wall 2239 kB ( 0%) ggc
 tree rename SSA copies: 0.14 ( 0%) usr 0.08 ( 3%) sys 0.18 ( 0%) wall 2 kB ( 0%) ggc
 control dependences : 0.00 ( 0%) usr 0.00 ( 0%) sys 0.03 ( 0%) wall 0 kB ( 0%) ggc
 expand : 1.41 ( 3%) usr 0.05 ( 2%) sys 1.52 ( 3%) wall 27273 kB ( 5%) ggc
 varconst : 0.06 ( 0%) usr 0.05 ( 2%) sys 0.10 ( 0%) wall 488 kB ( 0%) ggc
 jump : 0.05 ( 0%) usr 0.00 ( 0%) sys 0.07 ( 0%) wall 959 kB ( 0%) ggc
 CSE : 5.51 (11%) usr 0.01 ( 0%) sys 5.39 (10%) wall 844 kB ( 0%) ggc
 loop analysis : 0.05 ( 0%) usr 0.00 ( 0%) sys 0.08 ( 0%) wall 776 kB ( 0%) ggc
 global CSE : 0.01 ( 0%) usr 0.01 ( 0%) sys 0.01 ( 0%) wall 0 kB ( 0%) ggc
 CPROP 1 : 0.16 ( 0%) usr 0.01 ( 0%) sys 0.08 ( 0%) wall 1463 kB ( 0%) ggc
 PRE : 0.14 ( 0%) usr 0.00 ( 0%) sys 0.18 ( 0%) wall 536 kB ( 0%) ggc
 CPROP 2 : 0.13 ( 0%) usr 0.00 ( 0%) sys 0.12 ( 0%) wall 191 kB ( 0%) ggc
 bypass jumps : 0.13 ( 0%) usr 0.01 ( 0%) sys 0.14 ( 0%) wall 195 kB ( 0%) ggc
 CSE 2 : 3.13 ( 6%) usr 0.00 ( 0%) sys 3.22 ( 6%) wall 423 kB ( 0%) ggc
 branch prediction : 0.19 ( 0%) usr 0.01 ( 0%) sys 0.13 ( 0%) wall 375 kB ( 0%) ggc
 flow analysis : 0.04 ( 0%) usr 0.00 ( 0%) sys 0.01 ( 0%) wall 0 kB ( 0%) ggc
 combiner : 0.39 ( 1%) usr 0.00 ( 0%) sys 0.41 ( 1%) wall 1802 kB ( 0%) ggc
 if-conversion : 0.01 ( 0%) usr 0.00 ( 0%) sys 0.04 ( 0%) wall 36 kB ( 0%) ggc
 regmove : 0.16 ( 0%) usr 0.00 ( 0%) sys 0.17 ( 0%) wall 82 kB ( 0%) ggc
 local alloc : 0.39 ( 1%) usr 0.01 ( 0%) sys 0.37 ( 1%) wall 778 kB ( 0%) ggc
 global alloc : 0.87 ( 2%) usr 0.00 ( 0%) sys 0.90 ( 2%) wall 2306 kB ( 0%) ggc
 reload CSE regs : 2.34 ( 5%) usr 0.00 ( 0%) sys 2.47 ( 5%) wall 2413 kB ( 0%) ggc
 flow 2 : 0.06 ( 0%) usr 0.00 ( 0%) sys 0.07 ( 0%) wall 706 kB ( 0%) ggc
 if-conversion 2 : 0.00 ( 0%) usr 0.00 ( 0%) sys 0.04 ( 0%) wall 0 kB ( 0%) ggc
 peephole 2 : 0.11 ( 0%) usr 0.01 ( 0%) sys 0.12 ( 0%) wall 1729 kB ( 0%) ggc
 rename registers : 0.16 ( 0%) usr 0.00 ( 0%) sys 0.17 ( 0%) wall 13 kB ( 0%) ggc
 scheduling 2 : 1.33 ( 3%) usr 0.01 ( 0%) sys 1.35 ( 3%) wall 10690 kB ( 2%) ggc
 machine dep reorg : 0.17 ( 0%) usr 0.00 ( 0%) sys 0.22 ( 0%) wall 13 kB ( 0%) ggc
 reorder blocks : 0.06 ( 0%) usr 0.01 ( 0%) sys 0.06 ( 0%) wall 522 kB ( 0%) ggc
 final : 0.35 ( 1%) usr 0.00 ( 0%) sys 0.41 ( 1%) wall 1457 kB ( 0%) ggc
 symout : 0.01 ( 0%) usr 0.01 ( 0%) sys 0.02 ( 0%) wall 173 kB ( 0%) ggc
 TOTAL : 49.18 2.74 52.57 573794 kB

With gc memory use the worst offenders are tree SSA rewrite and DOM.

This program no longer crashes cc1plus. I propose to close this PR. I don't see anything obvious left to do that could be justified for 4.1.

Subject: Re: [3.4/4.0/4.1 Regression] g++ crash with
-O2 and -O3 on input file
On Mon, 2005-10-17 at 16:34 +0000, dnovillo at gcc dot gnu dot org
wrote:
>
> ------- Comment #50 from dnovillo at gcc dot gnu dot org 2005-10-17 16:33 -------
>
> This program no longer crashes cc1plus. I propose to close this PR. I don't
> see anything obvious left to do that could be justified for 4.1.
You might consider opening a 4.2 PR since there are definitely
things that could be further improved from a compile-time standpoint
for the code referenced in this PR.
jeff
Mark, there's probably not much else we can do in this PR for 4.1. What I see in the aliasing times involves quite a few changes, most of them from the aliasing branch and some other similarly intrusive changes I've got on the side. Other than that, the timings are now relatively reasonable. How do you want to handle this? Subject: Re: [3.4/4.0/4.1 Regression] g++ crash with
-O2 and -O3 on input file
dnovillo at gcc dot gnu dot org wrote:
> ------- Comment #52 from dnovillo at gcc dot gnu dot org 2005-10-17 17:32 -------
>
> Mark, there's probably not much else we can do in this PR for 4.1. What I see
> in the aliasing times involves quite a few changes, most of them from the
> aliasing branch and some other similarly intrusive changes I've got on the
> side.
>
> Other than that, the timings are now relatively reasonable. How do you want to
> handle this?
The original report was about crashing. Now we don't. So, clearly,
this PR should be closed.
If you think that we should still be able to do better in terms of
compile-time performance or memory usage, please open a new PR, targeted
at 4.2.
Thanks,
Closing. As discussed earlier, for all intents and purposes this is fixed. |