Bug 36291 - [4.3/4.4 Regression] GCC is slow and memory-hungry building sipQtGuipart.cpp
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: tree-optimization
Version: 4.3.1
Importance: P3 normal
Target Milestone: 4.3.2
Assignee: Richard Biener
URL:
Keywords: compile-time-hog, memory-hog
Depends on: 33237
Blocks:
Reported: 2008-05-21 16:07 UTC by Richard Biener
Modified: 2008-06-06 20:26 UTC
CC List: 3 users

See Also:
Host:
Target:
Build:
Known to work: 4.1.3
Known to fail: 4.3.1
Last reconfirmed: 2008-05-23 14:19:09


Description Richard Biener 2008-05-21 16:07:18 UTC
The testcase is from PR30052:

http://gcc.gnu.org/bugzilla/attachment.cgi?id=13678

The current GCC 4.3 branch reaches a peak memory usage of 1.8GB on x86_64, with the following time report:

 tree find ref. vars   :  15.36 ( 5%) usr   0.80 ( 8%) sys  15.90 ( 5%) wall  817801 kB (35%) ggc
 tree alias analysis   :  16.27 ( 5%) usr   0.38 ( 4%) sys  16.11 ( 5%) wall   11037 kB ( 0%) ggc
 tree call clobbering  :  41.35 (14%) usr   0.28 ( 3%) sys  43.00 (14%) wall    3132 kB ( 0%) ggc
 tree flow insensitive alias:  31.26 (10%) usr   0.34 ( 3%) sys  31.94 (10%) wall       0 kB ( 0%) ggc
 tree memory partitioning:  83.32 (28%) usr   0.89 ( 9%) sys  84.36 (27%) wall     974 kB ( 0%) ggc
 tree SSA incremental  :  10.60 ( 4%) usr   0.17 ( 2%) sys  11.11 ( 4%) wall   15755 kB ( 1%) ggc
 tree operand scan     :  28.71 (10%) usr   0.57 ( 6%) sys  29.59 ( 9%) wall  160271 kB ( 7%) ggc
 TOTAL                 : 301.74             9.75           314.50            2354835 kB
Comment 1 Richard Biener 2008-05-22 12:28:08 UTC
One slow "leaker" of memory for large TUs is the operands_bitmap_obstack used
by the operand scanner.  The bitmaps of loaded and stored symbols in statement
annotations are allocated from it, but they are never freed until after the
last function has gone out of SSA.

We should consider moving these to GC memory or adding support for allocating
bitmaps from alloc pools.
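To illustrate the difference: an obstack only releases memory in LIFO order (or all at once), so bitmaps allocated per statement stay live until the whole obstack is torn down, whereas a fixed-size pool with a free list can recycle individual elements. The following is a minimal, hypothetical sketch in plain C; the names elt, elt_pool, pool_alloc, pool_free and simulate are invented for illustration and are not GCC's alloc-pool.c API.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical fixed-size element standing in for a bitmap element.  */
typedef struct elt {
  struct elt *next;        /* free-list link while the element is free */
  unsigned long bits[2];   /* payload while the element is in use */
} elt;

/* Hypothetical pool that hands out elt objects and recycles freed ones.  */
typedef struct {
  elt *free_list;          /* elements returned by pool_free */
  size_t live;             /* elements currently handed out */
  size_t fresh;            /* elements obtained from malloc so far */
} elt_pool;

static elt *pool_alloc(elt_pool *p)
{
  elt *e = p->free_list;
  if (e)
    p->free_list = e->next;          /* reuse a previously freed element */
  else
    {
      e = malloc(sizeof *e);         /* grow only when nothing is free */
      if (!e)
        abort();
      p->fresh++;
    }
  p->live++;
  return e;
}

static void pool_free(elt_pool *p, elt *e)
{
  e->next = p->free_list;            /* push back onto the free list */
  p->free_list = e;
  p->live--;
}

/* Release all memory; every element must have been returned first.  */
static void pool_release(elt_pool *p)
{
  assert(p->live == 0);
  while (p->free_list)
    {
      elt *next = p->free_list->next;
      free(p->free_list);
      p->free_list = next;
    }
}

/* Simulate N_FUNCS functions that each need K elements and give them
   back when the function is done.  Returns the number of mallocs.  */
static size_t simulate(size_t n_funcs, size_t k)
{
  elt_pool p = { NULL, 0, 0 };
  elt *tmp[64];
  assert(k <= 64);
  for (size_t f = 0; f < n_funcs; f++)
    {
      for (size_t i = 0; i < k; i++)
        tmp[i] = pool_alloc(&p);
      for (size_t i = 0; i < k; i++)
        pool_free(&p, tmp[i]);
    }
  size_t fresh = p.fresh;
  pool_release(&p);
  return fresh;
}
```

With the pool, simulate(1000, 8) mallocs only 8 elements in total, because every function after the first reuses the elements freed by its predecessor; an allocator that never frees until the end would need 8000.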


Memory partitioning uses loads of bitmaps for the parent tags:
tree-ssa-alias.c:1311 (update_reference_c 5424467  433957360  433957360  433957360   76199616


Likewise PTA still does that:
tree-ssa-structalias.c:4850 (find_what_p_  62908  555059880   42045880   42032640      62415


Overall, variable annotations account for the largest share of the GC memory used (about 30%):
tree-ssanames.c:146 (make_ssa_name_fn)             33297312: 2.1%          0: 0.0%          0: 0.0%          0: 0.0%     346847
bitmap.c:229 (bitmap_element_allocate)            116709320: 7.4%          0: 0.0%          0: 0.0%   16672760:11.7%    2084095
ggc-common.c:179 (ggc_calloc)                     409212224:25.8%   12684056: 5.0%    1974632: 2.2%    4572784: 3.2%     194426
tree-dfa.c:153 (create_var_ann)                   482008296:30.4%          0: 0.0%          0: 0.0%   43818936:30.6%    5477367
Total                                            1586488294        255567175         90963544        143104669         18411405
Comment 2 Richard Biener 2008-05-22 12:35:35 UTC
On the trunk things look different:

 tree find ref. vars   :  22.59 (13%) usr   1.08 (13%) sys  24.42 (14%) wall  815888 kB (47%) ggc
 tree PTA              :   1.43 ( 1%) usr   0.08 ( 1%) sys   1.39 ( 1%) wall   19446 kB ( 1%) ggc
 tree alias analysis   :   5.64 ( 3%) usr   0.15 ( 2%) sys   5.71 ( 3%) wall    1611 kB ( 0%) ggc
 tree call clobbering  :  15.15 ( 9%) usr   0.07 ( 1%) sys  15.56 ( 9%) wall     613 kB ( 0%) ggc
 tree flow sensitive alias:   1.85 ( 1%) usr   0.09 ( 1%) sys   2.06 ( 1%) wall   96898 kB ( 6%) ggc
 tree flow insensitive alias:   1.79 ( 1%) usr   0.00 ( 0%) sys   1.68 ( 1%) wall       0 kB ( 0%) ggc
 tree memory partitioning:  38.40 (23%) usr   0.53 ( 7%) sys  38.68 (22%) wall     820 kB ( 0%) ggc
 tree operand scan     :  11.31 ( 7%) usr   0.16 ( 2%) sys  11.76 ( 7%) wall   58545 kB ( 3%) ggc
 TOTAL                 : 169.77             8.02           177.99            1747899 kB

but the memory situation isn't different.
Comment 3 Richard Biener 2008-05-22 12:48:26 UTC
The root of all evil is the following code in add_referenced_var():

      /* Scan DECL_INITIAL for pointer variables as they may contain
         address arithmetic referencing the address of other
         variables.
         Even non-constant initializers need to be walked, because
         IPA passes might prove that they are invariant later on.  */
      if (DECL_INITIAL (var)
          /* Initializers of external variables are not useful to the
             optimizers.  */
          && !DECL_EXTERNAL (var))
        walk_tree (&DECL_INITIAL (var), find_vars_r, NULL, 0);

This causes us to add essentially all globals to the referenced vars of
every function that references one of the chained structs.

We shouldn't be doing this; instead, whoever needs those vars should add them.

I suppose the IPA-passes concern is really just the lack of a global
DECL_UID-to-tree mapping.  So I am going to try changing the above to

      /* Scan DECL_INITIAL for pointer variables as they may contain
         address arithmetic referencing the address of other
         variables.  As we are only interested in directly referenced
         globals or referenced locals, restrict this to initializers
         that can refer to local variables.  */
      if (DECL_INITIAL (var)
          && DECL_CONTEXT (var) == current_function_decl)
        walk_tree (&DECL_INITIAL (var), find_vars_r, NULL, 0);
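The effect of the two rules can be modeled with a toy graph of variables whose initializers take the address of other variables. This is a hypothetical sketch, not GCC code: var, ref_set, add_referenced and count_refs only loosely mimic add_referenced_var, DECL_EXTERNAL is not modeled, and context 0 stands for global scope. With a chain of 100 globals, referencing the head pulls in all 100 under the original rule but only 1 under the restricted rule, while a local whose initializer is &g[0] still gets its initializer walked.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_VARS 128

/* Hypothetical stand-in for a VAR_DECL: CONTEXT is the owning function
   (0 = global scope); INIT_REF is the variable whose address the
   initializer takes (-1 = no initializer).  */
typedef struct {
  int context;
  int init_ref;
} var;

typedef struct {
  bool referenced[MAX_VARS];
  size_t count;
} ref_set;

/* Loosely mimics add_referenced_var: mark V as referenced, then walk its
   initializer, either always (old rule) or only for variables of the
   current function (restricted rule).  */
static void add_referenced(const var *vars, int v, int current_fn,
                           bool restrict_to_locals, ref_set *set)
{
  if (set->referenced[v])
    return;                                   /* already added */
  set->referenced[v] = true;
  set->count++;

  bool walk = vars[v].init_ref >= 0;
  if (restrict_to_locals)
    walk = walk && vars[v].context == current_fn;
  if (walk)
    add_referenced(vars, vars[v].init_ref, current_fn,
                   restrict_to_locals, set);
}

/* N_GLOBALS globals form a chain g[i] = &g[i+1].  Function 1 references
   either g[0] directly or a local whose initializer is &g[0].  Returns
   how many variables end up in the function's referenced set.  */
static size_t count_refs(int n_globals, bool via_local,
                         bool restrict_to_locals)
{
  var vars[MAX_VARS];
  ref_set set = { { false }, 0 };
  int start = 0;
  assert(n_globals > 0 && n_globals < MAX_VARS);
  for (int i = 0; i < n_globals; i++)
    {
      vars[i].context = 0;
      vars[i].init_ref = i + 1 < n_globals ? i + 1 : -1;
    }
  if (via_local)
    {
      vars[n_globals].context = 1;            /* local of function 1 */
      vars[n_globals].init_ref = 0;           /* initialized with &g[0] */
      start = n_globals;
    }
  add_referenced(vars, start, 1, restrict_to_locals, &set);
  return set.count;
}
```

In this model count_refs(100, true, true) is 2: the local is walked and g[0] is added, but g[0]'s own initializer is no longer followed. Comment 4 below explains why the real change needs more than this filter.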


This gets memory usage down to about 700MB and compile time down to 50s.
Comment 4 Richard Biener 2008-05-22 14:39:24 UTC
I added gcc.c-torture/execute/20080522-1.c, which exposes two problems with this change.

First, we need to add referenced vars as they are discovered (there is already
find_new_referenced_vars with some users; tree-ssa-ccp.c:get_symbol_constant_value
needs to do this as well).

Second, we need to update alias information, which turns out to be a hard
problem.  For the testcase we need to add 'i' to the symbols that SMT.7 aliases
and update all statements that reference SMT.7, which is difficult and
expensive.  We probably also need to update flow-sensitive alias info,
which is even harder, if not impossible.

So this looks like a dead end so far, but the root of the problem
remains.
Comment 5 Richard Biener 2008-05-22 19:15:26 UTC
As noted in comment #1, variable annotations are a major problem: for global variables they are duplicated for each function that references the variable.

It turns out that sharing variable annotations for globals between functions
reduces peak memory usage by 1GB, so that's where I'm currently looking.
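A minimal, hypothetical sketch of the sharing idea: each global carries a single annotation created on first use, and the per-function alias part is reset when a function is finished so the next function sees no stale data. The types var_ann and global_var and the functions get_ann, process_function and shared_allocs are invented for illustration; the reset loop only mirrors what the ChangeLog in comment 9 describes for delete_tree_ssa (clearing alias info from the var_anns of globals), not the actual GCC code.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical variable annotation: mostly function-independent data
   plus a small per-function alias part.  */
typedef struct {
  int flags;        /* function-independent */
  int alias_tag;    /* per-function alias info; -1 = none */
} var_ann;

/* Hypothetical global variable carrying one shared annotation.  */
typedef struct {
  var_ann *ann;     /* created on first use, shared by all functions */
} global_var;

static size_t n_ann_allocs;

static var_ann *get_ann(global_var *g)
{
  if (!g->ann)
    {
      g->ann = calloc(1, sizeof *g->ann);
      if (!g->ann)
        abort();
      g->ann->alias_tag = -1;
      n_ann_allocs++;
    }
  return g->ann;
}

/* One function referencing every global: record (dummy) alias info,
   then clear it when the function is done so that the next function
   does not see stale per-function data.  */
static void process_function(global_var *globals, size_t n, int fn)
{
  for (size_t i = 0; i < n; i++)
    get_ann(&globals[i])->alias_tag = fn;
  for (size_t i = 0; i < n; i++)
    globals[i].ann->alias_tag = -1;
}

/* Returns the number of annotations allocated for N_FUNCS functions
   that each reference all N_GLOBALS globals.  */
static size_t shared_allocs(size_t n_funcs, size_t n_globals)
{
  global_var *globals = calloc(n_globals, sizeof *globals);
  if (!globals)
    abort();
  n_ann_allocs = 0;
  for (size_t f = 0; f < n_funcs; f++)
    process_function(globals, n_globals, (int) f);
  for (size_t i = 0; i < n_globals; i++)
    if (globals[i].ann)
      {
        assert(globals[i].ann->alias_tag == -1);   /* no stale alias info */
        free(globals[i].ann);
      }
  free(globals);
  return n_ann_allocs;
}
```

Here shared_allocs(50, 1000) allocates 1000 annotations; a scheme with one copy per (function, global) pair would allocate 50 * 1000 = 50000 over the same run.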
Comment 6 Richard Biener 2008-05-22 19:22:56 UTC
With that, var annotation memory is back to sane levels:

tree-dfa.c:150 (create_var_ann)                      206016: 0.0%   15094400: 3.2%     142592: 0.1%          0: 0.0%     241297

compared to originally

tree-dfa.c:153 (create_var_ann)                   482008296:30.4%          0: 0.0%          0: 0.0%   43818936:30.6%    5477367
Comment 7 Richard Biener 2008-05-23 14:16:07 UTC
OK, apart from the var annotations, I am giving up here as far as 4.3 is concerned.
We cannot really fix the compile-time problem without skewing the heuristics
and risking fallout from that.

We already know from PR33237 that the algorithmic problem is here:

        -: 1252:update_reference_counts (struct mem_ref_stats_d *mem_ref_stats)
     6274: 1253:{
...
    62908: 1296:          if (MTAG_ALIASES (tag))
 76443361: 1297:            EXECUTE_IF_SET_IN_BITMAP (MTAG_ALIASES (tag), 0, j, bj)
        -: 1298:              {
...

Due to the high number of referenced vars, the alias sets tend to be large,
and we visit them multiple times as well (though those visits are not easy to combine).
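The cost in the gcov excerpt grows with the total size of the alias bitmaps walked: each time a tag is processed, every set bit in its MTAG_ALIASES is visited once. The sketch below uses invented types (mem_tag, a fixed-width bitmap standing in for GCC's bitmap) and invented functions (propagate_counts, demo_dense) to count those visits; when n tags each alias all 256 variables, it performs n * 256 updates.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define N_VARS 256
#define WORDS (N_VARS / 64)

/* Hypothetical memory tag: a fixed-width bitmap of the variables it may
   alias and the number of times the tag itself is referenced.  */
typedef struct {
  uint64_t aliases[WORDS];
  unsigned long n_refs;
} mem_tag;

/* Add each tag's reference count to every variable in its alias set.
   Returns the number of (tag, alias) pairs visited, i.e. the total
   number of set bits walked.  */
static size_t propagate_counts(const mem_tag *tags, size_t n_tags,
                               unsigned long counts[N_VARS])
{
  size_t visits = 0;
  for (size_t t = 0; t < n_tags; t++)
    for (size_t w = 0; w < WORDS; w++)
      for (unsigned bit = 0; bit < 64; bit++)
        if ((tags[t].aliases[w] >> bit) & 1)
          {
            counts[w * 64 + bit] += tags[t].n_refs;
            visits++;
          }
  return visits;
}

/* Dense example: N_TAGS tags, each referenced once and aliasing every
   variable.  Fills COUNTS and returns the number of visits.  */
static size_t demo_dense(size_t n_tags, unsigned long counts[N_VARS])
{
  static mem_tag tags[64];
  assert(n_tags <= 64);
  for (size_t t = 0; t < n_tags; t++)
    {
      for (size_t w = 0; w < WORDS; w++)
        tags[t].aliases[w] = UINT64_MAX;
      tags[t].n_refs = 1;
    }
  for (size_t j = 0; j < N_VARS; j++)
    counts[j] = 0;
  return propagate_counts(tags, n_tags, counts);
}
```

Shrinking either the number of tags or the alias sets is the only way to cut this work, which is why the partitioning idea below targets the large call-clobbered sets.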

For trunk, one idea is to exempt call-clobbered vars from the
partitioning heuristics completely and simply put them into a single
partition up front.  (They all end up in the same partition anyway, but in
some cases they are not partitioned at all, which is where we would
clearly lose without the alias oracle as a fallback.)
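A hypothetical sketch of such an up-front pass: call-clobbered symbols are assigned one shared partition before any heuristic runs, and only the remaining symbols are handed to the heuristics. The sym type, NO_PARTITION and prepartition_call_clobbered are invented for illustration and do not correspond to the real tree-ssa-alias.c data structures.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NO_PARTITION (-1)

/* Hypothetical symbol record for the partitioning pass.  */
typedef struct {
  bool call_clobbered;
  int partition;       /* NO_PARTITION until assigned */
} sym;

/* Put every call-clobbered symbol into SHARED_PART before any heuristic
   runs; mark the rest unassigned.  Returns how many symbols are left
   for the regular partitioning heuristics.  */
static size_t prepartition_call_clobbered(sym *syms, size_t n,
                                          int shared_part)
{
  size_t remaining = 0;
  for (size_t i = 0; i < n; i++)
    {
      if (syms[i].call_clobbered)
        syms[i].partition = shared_part;
      else
        {
          syms[i].partition = NO_PARTITION;
          remaining++;
        }
    }
  return remaining;
}
```

The trade-off noted above shows up directly in this model: a call-clobbered symbol that the heuristics would have left unpartitioned now always shares a partition with all the others.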

Micro-optimizing for this testcase is possible, but several attempts resulted
in only very minor improvements.
Comment 8 Jan Hubicka 2008-05-26 14:20:45 UTC
Subject: Re:  GCC is slow and memory-hungry building sipQtGuipart.cpp

> As noted in comment #1 variable annotations are a major problem (they are
> duplicated for global variables, for each function the variable is referenced
> from).
> 
> It happens that sharing variable annotations for globals between functions
> reduces peak memory usage by 1GB.  So that's were I'm currently looking at.

The problem with var annotations is that they contain a lot of local stuff
plus a little stuff that is specific per function (aliasing info,
for instance).
Breaking out the local stuff and allocating the global stuff only where
needed is the way I have wanted to go for a while, but we never actually
reached agreement on how to get there.

Honza
Comment 9 Richard Biener 2008-05-28 13:54:52 UTC
Subject: Bug 36291

Author: rguenth
Date: Wed May 28 13:54:05 2008
New Revision: 136095

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=136095
Log:
2008-05-28  Richard Guenther  <rguenther@suse.de>

	PR tree-optimization/36291
	* tree-flow.h (struct gimple_df): Remove var_anns member.
	* tree-flow-inline.h (gimple_var_anns): Remove.
	(var_ann): Simplify.
	* tree-dfa.c (create_var_ann): Simplify.
	(remove_referenced_var): Clear alias info from var_anns of globals.
	* tree-ssa.c (init_tree_ssa): Do not allocate var_anns.
	(delete_tree_ssa): Clear alias info from var_anns of globals.
	Do not free var_anns.
	(var_ann_eq): Remove.
	(var_ann_hash): Likewise.

Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/tree-dfa.c
    trunk/gcc/tree-flow-inline.h
    trunk/gcc/tree-flow.h
    trunk/gcc/tree-ssa.c

Comment 10 Richard Biener 2008-05-28 13:57:06 UTC
The situation on the trunk should be much better now.  A trivial backport to the
4.3 branch failed during bootstrap though, so that has to wait for some
investigation.
Comment 11 Richard Biener 2008-06-05 14:40:47 UTC
I have a working backport for 4.3.2 that gets memory usage down.
Comment 12 Richard Biener 2008-06-06 20:13:13 UTC
Subject: Bug 36291

Author: rguenth
Date: Fri Jun  6 20:12:27 2008
New Revision: 136502

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=136502
Log:
2008-06-06  Richard Guenther  <rguenther@suse.de>

	PR tree-optimization/36291
	* tree-flow.h (struct gimple_df): Remove var_anns member.
	* tree-flow-inline.h (gimple_var_anns): Remove.
	(var_ann): Simplify.
	* tree-dfa.c (create_var_ann): Simplify.
	(remove_referenced_var): Clear alias info from var_anns of globals.
	* tree-ssa.c (init_tree_ssa): Do not allocate var_anns.
	(delete_tree_ssa): Clear alias info from var_anns of globals.
	Do not free var_anns.
	(var_ann_eq): Remove.
	(var_ann_hash): Likewise.

Modified:
    branches/gcc-4_3-branch/gcc/ChangeLog
    branches/gcc-4_3-branch/gcc/tree-dfa.c
    branches/gcc-4_3-branch/gcc/tree-flow-inline.h
    branches/gcc-4_3-branch/gcc/tree-flow.h
    branches/gcc-4_3-branch/gcc/tree-ssa.c

Comment 13 Richard Biener 2008-06-06 20:26:18 UTC
Fixed.  The remaining slowness is a dup of PR33237.

Memory usage on the branch is now down to ~700MB peak VM usage on a 3GB machine
at -O on x86_64.  Compile time is down to

 tree find ref. vars   :  12.26 (10%) usr   0.27 ( 4%) sys  12.44 (10%) wall  199898 kB (18%) ggc
 tree alias analysis   :   5.04 ( 4%) usr   0.10 ( 2%) sys   5.15 ( 4%) wall    1716 kB ( 0%) ggc
 tree call clobbering  :  12.50 (10%) usr   0.05 ( 1%) sys  12.54 (10%) wall     592 kB ( 0%) ggc
 tree memory partitioning:  32.74 (26%) usr   0.23 ( 4%) sys  32.94 (25%) wall     880 kB ( 0%) ggc
 tree SSA incremental  :   3.29 ( 3%) usr   0.04 ( 1%) sys   3.25 ( 3%) wall    3714 kB ( 0%) ggc
 tree operand scan     :   8.62 ( 7%) usr   0.17 ( 3%) sys   8.59 ( 7%) wall   59455 kB ( 5%) ggc
 TOTAL                 : 121.52             6.89           128.51            1141231 kB

Of course, 4.1 took 34s and only 600MB.
Comment 14 Richard Biener 2009-04-08 16:33:32 UTC
Subject: Bug 36291

Author: rguenth
Date: Wed Apr  8 16:33:08 2009
New Revision: 145757

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=145757
Log:
2009-04-08  Richard Guenther  <rguenther@suse.de>

	PR middle-end/36291
	* tree-dfa.c (add_referenced_var): Do not recurse into
	global initializers.
	* tree-ssa-ccp.c (get_symbol_constant_value): Add newly
	exposed variables.
	(fold_const_aggregate_ref): Likewise.

Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/tree-dfa.c
    trunk/gcc/tree-ssa-ccp.c