Summary: | [4.3/4.4 Regression] GCC is slow and memory-hungry building sipQtGuipart.cpp | ||
---|---|---|---|
Product: | gcc | Reporter: | Richard Biener <rguenth> |
Component: | tree-optimization | Assignee: | Richard Biener <rguenth> |
Status: | RESOLVED FIXED | ||
Severity: | normal | CC: | gcc-bugs, hubicka, pawel_sikora |
Priority: | P3 | Keywords: | compile-time-hog, memory-hog |
Version: | 4.3.1 | ||
Target Milestone: | 4.3.2 | ||
Host: | | Target: | |
Build: | | Known to work: | 4.1.3 |
Known to fail: | 4.3.1 | Last reconfirmed: | 2008-05-23 14:19:09 |
Bug Depends on: | 33237 | ||
Bug Blocks: |
Description
Richard Biener
2008-05-21 16:07:18 UTC
One slow "leaker" of memory for large TUs is the operands_bitmap_obstack from the operand scanner. From it the stmt annotation loaded and stored symbols bitmaps are allocated but never freed until after the last function has gone out of SSA. We should consider moving these to GC memory or to add support for using alloc pools for bitmap allocations.

Memory partitioning uses loads of bitmaps for the parent tags:

    tree-ssa-alias.c:1311 (update_reference_c 5424467 433957360 433957360 433957360 76199616

Likewise PTA still does that:

    tree-ssa-structalias.c:4850 (find_what_p_ 62908 555059880 42045880 42032640 62415

Overall the variable annotations account for most of the GC memory used:

    tree-ssanames.c:146 (make_ssa_name_fn)      33297312: 2.1%          0: 0.0%         0: 0.0%          0: 0.0%    346847
    bitmap.c:229 (bitmap_element_allocate)     116709320: 7.4%          0: 0.0%         0: 0.0%   16672760:11.7%   2084095
    ggc-common.c:179 (ggc_calloc)              409212224:25.8%   12684056: 5.0%   1974632: 2.2%    4572784: 3.2%    194426
    tree-dfa.c:153 (create_var_ann)            482008296:30.4%          0: 0.0%         0: 0.0%   43818936:30.6%   5477367
    Total                                     1586488294        255567175        90963544        143104669        18411405

On the trunk things look different:

    tree find ref. vars        :  22.59 (13%) usr   1.08 (13%) sys  24.42 (14%) wall  815888 kB (47%) ggc
    tree PTA                   :   1.43 ( 1%) usr   0.08 ( 1%) sys   1.39 ( 1%) wall   19446 kB ( 1%) ggc
    tree alias analysis        :   5.64 ( 3%) usr   0.15 ( 2%) sys   5.71 ( 3%) wall    1611 kB ( 0%) ggc
    tree call clobbering       :  15.15 ( 9%) usr   0.07 ( 1%) sys  15.56 ( 9%) wall     613 kB ( 0%) ggc
    tree flow sensitive alias  :   1.85 ( 1%) usr   0.09 ( 1%) sys   2.06 ( 1%) wall   96898 kB ( 6%) ggc
    tree flow insensitive alias:   1.79 ( 1%) usr   0.00 ( 0%) sys   1.68 ( 1%) wall       0 kB ( 0%) ggc
    tree memory partitioning   :  38.40 (23%) usr   0.53 ( 7%) sys  38.68 (22%) wall     820 kB ( 0%) ggc
    tree operand scan          :  11.31 ( 7%) usr   0.16 ( 2%) sys  11.76 ( 7%) wall   58545 kB ( 3%) ggc
    TOTAL                      : 169.77             8.02           177.99            1747899 kB

but the memory situation isn't different.
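
The difference between obstack-style allocation and an alloc pool, as raised in the description above, can be pictured with a toy model. This is only a sketch and not GCC code: `struct elt`, `struct arena`, `struct pool`, and every function name here are hypothetical, and the counts are in elements rather than bytes. The arena can only give memory back in bulk, so its footprint grows with every function processed; the pool puts freed elements on a free list so the next function can reuse them.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a bitmap element.  */
struct elt
{
  struct elt *next;
  unsigned long bits[4];
};

/* Arena model: individual elements cannot be returned, so the live
   count only drops when the whole arena is released.  */
struct arena
{
  size_t live_elts;
};

static void
arena_alloc (struct arena *a)
{
  a->live_elts++;
}

/* Pool model: freed elements go onto a free list and are reused.  */
struct pool
{
  struct elt *free_list;
  size_t elts_from_system;
};

static struct elt *
pool_alloc (struct pool *p)
{
  struct elt *e = p->free_list;
  if (e)
    {
      p->free_list = e->next;
      return e;
    }
  e = malloc (sizeof *e);
  if (!e)
    abort ();
  p->elts_from_system++;
  return e;
}

static void
pool_free (struct pool *p, struct elt *e)
{
  assert (e != NULL);
  e->next = p->free_list;
  p->free_list = e;
}

/* Peak element count when NFUNCS functions each need PER_FUNC
   elements and nothing is released until the very end.  */
static size_t
peak_with_arena (size_t nfuncs, size_t per_func)
{
  struct arena a = { 0 };
  for (size_t f = 0; f < nfuncs; f++)
    for (size_t i = 0; i < per_func; i++)
      arena_alloc (&a);
  return a.live_elts;
}

/* Same workload, but each function returns its elements to the pool
   when it is done.  */
static size_t
peak_with_pool (size_t nfuncs, size_t per_func)
{
  struct pool p = { NULL, 0 };
  for (size_t f = 0; f < nfuncs; f++)
    {
      struct elt *held = NULL;
      for (size_t i = 0; i < per_func; i++)
        {
          struct elt *e = pool_alloc (&p);
          e->next = held;
          held = e;
        }
      while (held)
        {
          struct elt *next = held->next;
          pool_free (&p, held);
          held = next;
        }
    }
  size_t peak = p.elts_from_system;
  while (p.free_list)
    {
      struct elt *next = p.free_list->next;
      free (p.free_list);
      p.free_list = next;
    }
  return peak;
}
```

In this model the arena's peak scales with the number of functions times the elements per function, while the pool's peak is bounded by the largest single function.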
The root of all evil is the following code in add_referenced_var():

      /* Scan DECL_INITIAL for pointer variables as they may contain
         address arithmetic referencing the address of other
         variables.
         Even non-constant intializers need to be walked, because
         IPA passes might prove that their are invariant later on.  */
      if (DECL_INITIAL (var)
          /* Initializers of external variables are not useful to the
             optimizers.  */
          && !DECL_EXTERNAL (var))
        walk_tree (&DECL_INITIAL (var), find_vars_r, NULL, 0);

This causes us to add basically all globals to every function's referenced vars once the function references one of the chained structs. We shouldn't be doing this; instead, whoever needs those vars should add them. I suppose the IPA passes thing is just the lack of a global DECL_UID to tree mapping.

So I am going to try to change the above to

      /* Scan DECL_INITIAL for pointer variables as they may contain
         address arithmetic referencing the address of other
         variables.  As we are only interested in directly referenced
         globals or referenced locals restrict this to initializers
         that can refer to local variables.  */
      if (DECL_INITIAL (var)
          && DECL_CONTEXT (var) == current_function_decl)
        walk_tree (&DECL_INITIAL (var), find_vars_r, NULL, 0);

This gets memory usage down to about 700MB and compile time down to 50s.

I added gcc.c-torture/execute/20080522-1.c which points at two problems. First, we need to add referenced vars as they come (there is already find_new_referenced_vars and some users; tree-ssa-ccp.c:get_symbol_constant_value needs to do it as well). Second, we need to update alias information. This turns out to be a hard problem. For the testcase we need to add 'i' to the symbols SMT.7 aliases and update all statements that reference SMT.7, which is not easily possible and is expensive. We probably also need to update flow-sensitive alias info, which is even harder if not impossible.

So it looks like this is a dead end so far, but still the root of the problem remains.
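
The effect of walking global initializers can be illustrated with a toy model. This is a sketch under simplifying assumptions, not the GCC implementation: `struct toy_var`, `toy_add_referenced_var`, and `toy_chain_count` are hypothetical names, each initializer refers to at most one other variable, and the DECL_EXTERNAL check is ignored. With a chain of globals whose initializers point at each other, following global initializers pulls in the whole chain, while restricting the walk to locals adds only the variable that was actually mentioned.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_MAX 1024

/* Hypothetical stand-in for a variable declaration.  */
struct toy_var
{
  bool is_local;             /* declared in the function being scanned */
  struct toy_var *init_ref;  /* variable the initializer refers to, or NULL */
  bool referenced;           /* already in the referenced-vars set */
};

/* Add VAR to the referenced set and return how many variables were
   added.  If WALK_GLOBAL_INITS is true, initializers of globals are
   followed too; otherwise only initializers of locals are followed.  */
static size_t
toy_add_referenced_var (struct toy_var *var, bool walk_global_inits)
{
  size_t added = 0;
  while (var && !var->referenced)
    {
      var->referenced = true;
      added++;
      if (!var->is_local && !walk_global_inits)
        break;
      var = var->init_ref;
    }
  return added;
}

/* Build a chain of N globals, each initialized with a reference to
   the next, and count what a reference to the first one drags in.  */
static size_t
toy_chain_count (size_t n, bool walk_global_inits)
{
  static struct toy_var globals[TOY_MAX];
  assert (n >= 1 && n <= TOY_MAX);
  for (size_t i = 0; i < n; i++)
    {
      globals[i].is_local = false;
      globals[i].referenced = false;
      globals[i].init_ref = (i + 1 < n) ? &globals[i + 1] : NULL;
    }
  return toy_add_referenced_var (&globals[0], walk_global_inits);
}
```

For a chain of 1000 globals, `toy_chain_count (1000, true)` adds every variable in the chain and `toy_chain_count (1000, false)` adds only the first. The model says nothing about the alias-information updates discussed above, which are the hard part.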
As noted in comment #1 variable annotations are a major problem (they are duplicated for global variables, for each function the variable is referenced from).

It happens that sharing variable annotations for globals between functions reduces peak memory usage by 1GB. So that's where I'm currently looking.

That is, var annotations are back to sanity:

    tree-dfa.c:150 (create_var_ann)    206016: 0.0%   15094400: 3.2%   142592: 0.1%          0: 0.0%    241297

compared to originally

    tree-dfa.c:153 (create_var_ann) 482008296:30.4%          0: 0.0%        0: 0.0%   43818936:30.6%   5477367

Ok, apart from the var annotations I give up here as far as 4.3 is concerned. We cannot really fix the compile-time problem without skewing the heuristics and risking fallout through that. We already know from PR33237 that the algorithmic problem is here:

            -: 1252:update_reference_counts (struct mem_ref_stats_d *mem_ref_stats)
         6274: 1253:{
    ...
        62908: 1296:      if (MTAG_ALIASES (tag))
     76443361: 1297:        EXECUTE_IF_SET_IN_BITMAP (MTAG_ALIASES (tag), 0, j, bj)
            -: 1298:          {
    ...

Due to the high number of referenced vars there tend to be a lot of aliases, and we visit them multiple times as well (but they are not easy to combine).

For trunk the idea could be to exempt call clobbered vars from the partitioning heuristics completely and simply partition them into a single partition up-front. (They all end up in the same partition anyway, but in some cases are of course not partitioned at all, which is where we would clearly lose without the alias oracle as a fallback.)

Micro-optimizing for this testcase is possible, but several attempts only resulted in very minor improvements.

Subject: Re: GCC is slow and memory-hungry building sipQtGuipart.cpp

> As noted in comment #1 variable annotations are a major problem (they are
> duplicated for global variables, for each function the variable is referenced
> from).
>
> It happens that sharing variable annotations for globals between functions
> reduces peak memory usage by 1GB.
> So that's where I'm currently looking.

The problem of var annotations is that they contain a lot of local stuff + little of stuff that is specific per function (that is aliasing info, for instance). Breaking out the local stuff and allocating the global stuff only where needed is the way I have wanted to go for a while, but we never actually reached agreement on how to get there.

Honza

Subject: Bug 36291

Author: rguenth
Date: Wed May 28 13:54:05 2008
New Revision: 136095

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=136095
Log:
2008-05-28  Richard Guenther  <rguenther@suse.de>

    PR tree-optimization/36291
    * tree-flow.h (struct gimple_df): Remove var_anns member.
    * tree-flow-inline.h (gimple_var_anns): Remove.
    (var_ann): Simplify.
    * tree-dfa.c (create_var_ann): Simplify.
    (remove_referenced_var): Clear alias info from var_anns of globals.
    * tree-ssa.c (init_tree_ssa): Do not allocate var_anns.
    (delete_tree_ssa): Clear alias info from var_anns of globals.
    Do not free var_anns.
    (var_ann_eq): Remove.
    (var_ann_hash): Likewise.

Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/tree-dfa.c
    trunk/gcc/tree-flow-inline.h
    trunk/gcc/tree-flow.h
    trunk/gcc/tree-ssa.c

The situation on the trunk should be much better now. A trivial backport to the 4.3 branch failed during bootstrap though, so that has to wait for some investigation.

I have a working backport for 4.3.2 that gets memory usage down.

Subject: Bug 36291

Author: rguenth
Date: Fri Jun 6 20:12:27 2008
New Revision: 136502

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=136502
Log:
2008-06-06  Richard Guenther  <rguenther@suse.de>

    PR tree-optimization/36291
    * tree-flow.h (struct gimple_df): Remove var_anns member.
    * tree-flow-inline.h (gimple_var_anns): Remove.
    (var_ann): Simplify.
    * tree-dfa.c (create_var_ann): Simplify.
    (remove_referenced_var): Clear alias info from var_anns of globals.
    * tree-ssa.c (init_tree_ssa): Do not allocate var_anns.
    (delete_tree_ssa): Clear alias info from var_anns of globals.
    Do not free var_anns.
    (var_ann_eq): Remove.
    (var_ann_hash): Likewise.

Modified:
    branches/gcc-4_3-branch/gcc/ChangeLog
    branches/gcc-4_3-branch/gcc/tree-dfa.c
    branches/gcc-4_3-branch/gcc/tree-flow-inline.h
    branches/gcc-4_3-branch/gcc/tree-flow.h
    branches/gcc-4_3-branch/gcc/tree-ssa.c

Fixed. The remaining slowness is a dup of PR33237. Memory usage on the branch is now down to ~700MB peak VM usage on a 3GB machine at -O on x86_64. Compile time is down to

    tree find ref. vars        :  12.26 (10%) usr   0.27 ( 4%) sys  12.44 (10%) wall  199898 kB (18%) ggc
    tree alias analysis        :   5.04 ( 4%) usr   0.10 ( 2%) sys   5.15 ( 4%) wall    1716 kB ( 0%) ggc
    tree call clobbering       :  12.50 (10%) usr   0.05 ( 1%) sys  12.54 (10%) wall     592 kB ( 0%) ggc
    tree memory partitioning   :  32.74 (26%) usr   0.23 ( 4%) sys  32.94 (25%) wall     880 kB ( 0%) ggc
    tree SSA incremental       :   3.29 ( 3%) usr   0.04 ( 1%) sys   3.25 ( 3%) wall    3714 kB ( 0%) ggc
    tree operand scan          :   8.62 ( 7%) usr   0.17 ( 3%) sys   8.59 ( 7%) wall   59455 kB ( 5%) ggc
    TOTAL                      : 121.52             6.89           128.51            1141231 kB

Of course, 4.1 took 34s and only 600MB.

Subject: Bug 36291

Author: rguenth
Date: Wed Apr 8 16:33:08 2009
New Revision: 145757

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=145757
Log:
2009-04-08  Richard Guenther  <rguenther@suse.de>

    PR middle-end/36291
    * tree-dfa.c (add_referenced_var): Do not recurse into global
    initializers.
    * tree-ssa-ccp.c (get_symbol_constant_value): Add newly
    exposed variables.
    (fold_const_aggregate_ref): Likewise.

Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/tree-dfa.c
    trunk/gcc/tree-ssa-ccp.c
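
The last ChangeLog entry pairs "do not recurse into global initializers" with "add newly exposed variables". A toy sketch of that on-demand idea follows. It is an illustration under stated assumptions, not the code from revision 145757: `struct lazy_var`, `struct lazy_func`, and the `lazy_*` functions are hypothetical, and alias-information updates are left out entirely. A variable reachable only through a global's initializer is not registered up front; it is registered at the moment a folding step reads the initializer and exposes it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a variable declaration.  */
struct lazy_var
{
  struct lazy_var *init_addr_of;  /* initializer is the address of this, or NULL */
  bool referenced;                /* already in the referenced-vars set */
};

/* Hypothetical per-function state: only the size of the set.  */
struct lazy_func
{
  size_t n_referenced;
};

static void
lazy_add_referenced (struct lazy_func *fn, struct lazy_var *v)
{
  assert (fn != NULL && v != NULL);
  if (!v->referenced)
    {
      v->referenced = true;
      fn->n_referenced++;
    }
}

/* Toy analogue of folding a read of a constant global V: the result
   is the variable whose address V was initialized with.  Because the
   folded statement now mentions that variable directly, register it
   here rather than when V itself was first seen.  */
static struct lazy_var *
lazy_fold_read (struct lazy_func *fn, struct lazy_var *v)
{
  struct lazy_var *exposed = v->init_addr_of;
  if (exposed)
    lazy_add_referenced (fn, exposed);
  return exposed;
}

/* A function mentions only P, whose initializer is &Q.  Return the
   size of the referenced set, optionally after folding a read of P.  */
static size_t
lazy_demo (bool fold)
{
  struct lazy_var q = { NULL, false };
  struct lazy_var p = { &q, false };
  struct lazy_func fn = { 0 };

  lazy_add_referenced (&fn, &p);
  if (fold)
    lazy_fold_read (&fn, &p);
  return fn.n_referenced;
}
```

Without the fold only `p` is in the set; after the fold `q` has been added as well, so the cost is paid only by functions that actually expose the second variable.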