Bug 32723

Summary: [4.2 Regression] memory hog in solve_graph
Product: gcc Reporter: Pascal "Pixel" Rigaux <pixel>
Component: tree-optimization Assignee: Not yet assigned to anyone <unassigned>
Status: RESOLVED FIXED    
Severity: normal CC: dberlin, fang, gcc-bugs, rguenth
Priority: P2 Keywords: alias, memory-hog
Version: 4.2.1   
Target Milestone: 4.3.0   
Host: i586-mandriva-linux-gnu Target: i586-mandriva-linux-gnu
Build: i586-mandriva-linux-gnu Known to work: 4.1.2 4.3.0
Known to fail: 4.2.1 4.2.3 4.2.4 Last reconfirmed: 2007-07-16 09:28:11
Attachments: memory hog test case

Description Pascal "Pixel" Rigaux 2007-07-10 18:11:39 UTC
With the attached test case, solve_graph pushes memory consumption above 1GB, whereas the same file compiles fine in under 50MB with gcc 4.1 and 4.3.

The test case was created from tomoe-dict-unihan.c
(https://sourceforge.jp/projects/tomoe/), which uses an 11MB .h data file.
Comment 1 Pascal "Pixel" Rigaux 2007-07-10 18:16:18 UTC
Created attachment 13882 [details]
memory hog test case
Comment 2 Andrew Pinski 2007-07-10 20:22:55 UTC
What exact version of 4.2.1 are you using?
Comment 3 Pascal "Pixel" Rigaux 2007-07-10 22:21:21 UTC
Tested with rc1 and SVN.
Comment 4 Pascal "Pixel" Rigaux 2007-07-11 10:23:03 UTC
I forgot to say: it doesn't occur without -O, but it does occur with -O and -O2.

/usr/lib/gcc/i586-mandriva-linux-gnu/4.2.1/cc1 -O fail.c
 _create
Analyzing compilation unitPerforming interprocedural optimizations
Assembling functions:
 _create
Execution times (seconds)
 callgraph construction:   0.14 ( 1%) usr   0.01 ( 0%) sys   0.14 ( 0%) wall       0 kB ( 0%) ggc
 callgraph optimization:   0.01 ( 0%) usr   0.00 ( 0%) sys   0.00 ( 0%) wall       0 kB ( 0%) ggc
 ipa reference         :   0.07 ( 0%) usr   0.02 ( 0%) sys   0.10 ( 0%) wall     428 kB ( 1%) ggc
 preprocessing         :   0.27 ( 1%) usr   0.23 ( 2%) sys   0.68 ( 2%) wall    8293 kB (25%) ggc
 lexical analysis      :   0.13 ( 1%) usr   0.60 ( 4%) sys   0.84 ( 3%) wall       0 kB ( 0%) ggc
 parser                :   0.64 ( 3%) usr   0.43 ( 3%) sys   0.84 ( 3%) wall   18586 kB (57%) ggc
 tree find ref. vars   :   0.02 ( 0%) usr   0.00 ( 0%) sys   0.02 ( 0%) wall    1905 kB ( 6%) ggc
 tree PTA              :  15.92 (86%) usr  10.73 (72%) sys  26.67 (80%) wall       5 kB ( 0%) ggc
 tree alias analysis   :   0.91 ( 5%) usr   2.77 (19%) sys   3.65 (11%) wall       0 kB ( 0%) ggc
 tree SSA incremental  :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.02 ( 0%) wall       0 kB ( 0%) ggc
 tree SRA              :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.00 ( 0%) wall       0 kB ( 0%) ggc
 expand                :   0.10 ( 1%) usr   0.00 ( 0%) sys   0.11 ( 0%) wall       7 kB ( 0%) ggc
 varconst              :   0.05 ( 0%) usr   0.02 ( 0%) sys   0.01 ( 0%) wall     643 kB ( 2%) ggc
 global alloc          :   0.00 ( 0%) usr   0.01 ( 0%) sys   0.00 ( 0%) wall       0 kB ( 0%) ggc
 symout                :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.00 ( 0%) wall       0 kB ( 0%) ggc
 TOTAL                 :  18.57            14.84            33.41              32596 kB



PS: the verbose output is a little garbled; this trivial patch on branches/gcc-4_2-branch fixes it:

--- gcc/cgraphunit.c    (revision 126511)
+++ gcc/cgraphunit.c    (working copy)
@@ -1544,7 +1544,7 @@
 
   timevar_push (TV_CGRAPHOPT);
   if (!quiet_flag)
-    fprintf (stderr, "Performing interprocedural optimizations\n");
+    fprintf (stderr, "\nPerforming interprocedural optimizations\n");
 
   cgraph_function_and_variable_visibility ();
   if (cgraph_dump_file)
Comment 5 Richard Biener 2007-07-16 09:12:19 UTC
/tmp> ~/bin/maxmem2.sh gcc-4.1 -S -O2 -o /dev/null fail.c 
total: 96228 kB
/tmp> ~/bin/maxmem2.sh gcc-4.2 -S -O2 -o /dev/null fail.c 
total: 1579668 kB

trunk:
/tmp> ~/bin/maxmem2.sh /space/rguenther/tramp3d/install/bin/gcc -S -O2 -o /dev/null fail.c 
total: 109731 kB

meh.  Danny, do you remember which change on the trunk could have improved this?

Maybe http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=122741?
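maxmem2.sh is a personal script whose contents aren't shown here. A comparable peak-memory reading can be taken from C as sketched below; this assumes a Linux host, where getrusage reports ru_maxrss in kilobytes, and peak_rss_kb is a hypothetical helper, not what the script actually does.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <sys/resource.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: run ARGV as a child, wait for it, and return the
   peak RSS reported for reaped children (kB on Linux), or -1 on error.
   Note: RUSAGE_CHILDREN is a running maximum over ALL children reaped so
   far by this process, not a per-call figure. */
static long peak_rss_kb (char *const argv[])
{
  pid_t pid = fork ();
  if (pid < 0)
    return -1;
  if (pid == 0)
    {
      execvp (argv[0], argv);
      _exit (127);              /* exec failed */
    }

  int status;
  if (waitpid (pid, &status, 0) < 0)
    return -1;

  struct rusage ru;
  if (getrusage (RUSAGE_CHILDREN, &ru) < 0)
    return -1;
  return ru.ru_maxrss;
}
```

For a driver invocation such as `gcc -S -O2 -o /dev/null fail.c`, Linux reports the RSS of the largest waited-for descendant (typically cc1), not a sum over the whole process tree.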
Comment 6 Richard Biener 2007-07-16 09:28:11 UTC
Backporting the shared bitmap changes brings us back to 76MB max. memory usage
for 4.2.  I'll bootstrap & test this.
Comment 7 Daniel Berlin 2007-07-24 07:16:43 UTC
Didn't you commit the shared bitmap fix?
Comment 8 Richard Biener 2007-07-24 07:30:59 UTC
Subject: Bug 32723

Author: rguenth
Date: Tue Jul 24 07:30:47 2007
New Revision: 126867

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=126867
Log:
2007-07-24  Richard Guenther  <rguenther@suse.de>

	PR tree-optimization/32723
	Backport from mainline:
	2007-03-09  Daniel Berlin  <dberlin@dberlin.org>

        * tree-ssa-structalias.c (shared_bitmap_info_t): New structure.
        (shared_bitmap_table): New variable.
        (shared_bitmap_hash): New function.
        (shared_bitmap_eq): Ditto.
        (shared_bitmap_lookup): Ditto.
        (shared_bitmap_add): Ditto.
        (find_what_p_points_to): Rewrite to use shared bitmap hashtable.
        (init_alias_vars): Init shared bitmap hashtable.
        (delete_points_to_sets): Delete shared bitmap hashtable.

Modified:
    branches/gcc-4_2-branch/gcc/ChangeLog
    branches/gcc-4_2-branch/gcc/tree-ssa-structalias.c
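The backported change hash-conses points-to bitmaps: pointers whose solutions are identical share one bitmap instead of each owning a copy. A minimal sketch of that idea in plain C, assuming fixed-width bitmaps; intern_set and free_shared_sets are hypothetical names, not GCC's bitmap or hashtable API.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define SET_WORDS 4       /* hypothetical fixed width: 256 locations */
#define NBUCKETS 1024     /* hypothetical table size */

struct shared_set
{
  uint64_t bits[SET_WORDS];
  struct shared_set *next;   /* next entry in the same hash bucket */
};

static struct shared_set *buckets[NBUCKETS];

/* FNV-style mixing over the bitmap words. */
static size_t set_hash (const uint64_t *bits)
{
  uint64_t h = 14695981039346656037ULL;
  for (int i = 0; i < SET_WORDS; i++)
    {
      h ^= bits[i];
      h *= 1099511628211ULL;
    }
  return (size_t) (h % NBUCKETS);
}

/* Return the single shared copy of BITS, adding it on first sight. */
static const struct shared_set *intern_set (const uint64_t *bits)
{
  size_t b = set_hash (bits);
  for (struct shared_set *s = buckets[b]; s; s = s->next)
    if (memcmp (s->bits, bits, sizeof s->bits) == 0)
      return s;

  struct shared_set *s = malloc (sizeof *s);
  if (!s)
    abort ();
  memcpy (s->bits, bits, sizeof s->bits);
  s->next = buckets[b];
  buckets[b] = s;
  return s;
}

/* Free every shared set, e.g. once alias analysis is finished. */
static void free_shared_sets (void)
{
  for (size_t b = 0; b < NBUCKETS; b++)
    {
      struct shared_set *s = buckets[b];
      while (s)
        {
          struct shared_set *next = s->next;
          free (s);
          s = next;
        }
      buckets[b] = NULL;
    }
}
```

This only deduplicates the stored results; it does not shrink the sets the solver builds while propagating, which fits the later observation that the shared bitmap change was not the dominant factor for this test case.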

Comment 9 Richard Biener 2007-07-24 07:31:18 UTC
Fixed.
Comment 10 Pascal "Pixel" Rigaux 2007-08-11 17:08:28 UTC
Are you sure it fixes it? It still takes 1G here...
Comment 11 Daniel Berlin 2007-08-20 01:56:15 UTC
Uh, it doesn't take 1 gig on either 4.2 or 4.3
Comment 12 Pascal "Pixel" Rigaux 2007-08-21 13:29:30 UTC
I do know it works nicely with gcc 4.3.

But I still get the "memory hog" behaviour using branches/gcc-4_2-branch, i.e.:

% /usr/lib/gcc/i586-mandriva-linux-gnu/4.2.1/cc1 -O2 fail.c

runs with its RSS climbing to 1G many times.

I've also tried gcc-4.2-4.2.1-4 from Debian (which is based on an SVN snapshot from 20070812):

% ulimit -v 900000
% /usr/lib/gcc/i486-linux-gnu/4.2.1/cc1 fail.c -O2
 _create
Analyzing compilation unitPerforming interprocedural optimizations
Assembling functions:
 _create
cc1: out of memory allocating 4064 bytes after a total of 877277184 bytes
Comment 13 Mark Mitchell 2007-09-05 00:58:42 UTC
Do we have any way to work out whether this is still a problem?  Richard seems to think the bug has been fixed, but Pascal is still seeing the problem, apparently.
Comment 14 Mark Mitchell 2007-10-09 19:21:31 UTC
Change target milestone to 4.2.3, as 4.2.2 has been released.
Comment 15 Richard Biener 2007-10-31 13:07:56 UTC
The memory is now needed temporarily by solve_graph(), because the graph has
48902 nodes.  On the mainline we have only 3 constraints, while for 4.2 we have
thousands:

ANYTHING = &ANYTHING
READONLY = &ANYTHING
INTEGER = &ANYTHING
ESCAPED_VARS = *ESCAPED_VARS
NONLOCAL.6 = ESCAPED_VARS
ESCAPED_VARS = &NONLOCAL.6
ESCAPED_VARS = &NONLOCAL.6
infos = ESCAPED_VARS
c_20089 = ESCAPED_VARS
ESCAPED_VARS = &c_20089
c_20089 = &ANYTHING
c_20089 = &ANYTHING
ESCAPED_VARS = &c_20089.val
c_20089.val = ESCAPED_VARS
infos = &c_20089
infos = &c_20089.val
c_200A2 = ESCAPED_VARS
ESCAPED_VARS = &c_200A2
...

The mainline looks like:

ANYTHING = { ANYTHING }
READONLY = { ANYTHING }
INTEGER = { ANYTHING }
D.28988 = same as infos
D.28988.c = same as infos
D.28988.b = same as infos
infos = { ANYTHING }


The shared bitmap stuff was not dominant for this testcase.  Still, I doubt
we can backport all of the solver changes.  Also, 4.3 quite possibly benefits
from early optimizations that simplify the problem to solve.
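To see why the 4.2 constraint set is expensive, here is a toy inclusion-based (Andersen-style) solver for the constraint forms in the listing (`x = &y`, `x = y`, `x = *y`, plus `*x = y`). It is a naive fixed-point loop over 64-bit masks, not GCC's graph-based solve_graph, and every name in it is hypothetical. Fed a subset of the constraints above, it shows each escaping variable ending up with the whole escaped set: every variable adds itself to ESCAPED_VARS and ESCAPED_VARS flows back into each of them, so n escaping variables need n sets of roughly n bits each.

```c
#include <assert.h>
#include <stdint.h>

/* Toy model (hypothetical): at most 64 variables, one bit per variable. */
typedef uint64_t ptset;

enum ckind { ADDR, COPY, LOAD, STORE };

/* ADDR: lhs = &rhs   COPY: lhs = rhs   LOAD: lhs = *rhs   STORE: *lhs = rhs */
struct constraint { enum ckind kind; int lhs, rhs; };

/* Union SRC into *DST; return nonzero if *DST grew. */
static int add_to (ptset *dst, ptset src)
{
  ptset old = *dst;
  *dst |= src;
  return *dst != old;
}

/* Naive fixed point: re-apply every constraint until nothing changes. */
static void solve (const struct constraint *cs, int ncs, ptset *pts, int nvars)
{
  int changed = 1;
  while (changed)
    {
      changed = 0;
      for (int i = 0; i < ncs; i++)
        {
          int l = cs[i].lhs, r = cs[i].rhs;
          ptset targets;
          switch (cs[i].kind)
            {
            case ADDR:
              changed |= add_to (&pts[l], (ptset) 1 << r);
              break;
            case COPY:
              changed |= add_to (&pts[l], pts[r]);
              break;
            case LOAD:   /* pts(l) includes pts(v) for every v in pts(r) */
              targets = pts[r];
              for (int v = 0; v < nvars; v++)
                if (targets & ((ptset) 1 << v))
                  changed |= add_to (&pts[l], pts[v]);
              break;
            case STORE:  /* pts(v) includes pts(r) for every v in pts(l) */
              targets = pts[l];
              for (int v = 0; v < nvars; v++)
                if (targets & ((ptset) 1 << v))
                  changed |= add_to (&pts[v], pts[r]);
              break;
            }
        }
    }
}

/* A subset of the 4.2 constraints from the listing. */
enum { V_ANYTHING, V_ESCAPED, V_NONLOCAL, V_INFOS, V_C, NVARS };

static void escape_example (ptset pts[NVARS])
{
  static const struct constraint cs[] = {
    { ADDR, V_ANYTHING, V_ANYTHING }, /* ANYTHING = &ANYTHING */
    { LOAD, V_ESCAPED,  V_ESCAPED },  /* ESCAPED_VARS = *ESCAPED_VARS */
    { COPY, V_NONLOCAL, V_ESCAPED },  /* NONLOCAL.6 = ESCAPED_VARS */
    { ADDR, V_ESCAPED,  V_NONLOCAL }, /* ESCAPED_VARS = &NONLOCAL.6 */
    { COPY, V_INFOS,    V_ESCAPED },  /* infos = ESCAPED_VARS */
    { COPY, V_C,        V_ESCAPED },  /* c_20089 = ESCAPED_VARS */
    { ADDR, V_ESCAPED,  V_C },        /* ESCAPED_VARS = &c_20089 */
    { ADDR, V_C,        V_ANYTHING }, /* c_20089 = &ANYTHING */
    { ADDR, V_INFOS,    V_C },        /* infos = &c_20089 */
  };
  for (int i = 0; i < NVARS; i++)
    pts[i] = 0;
  solve (cs, (int) (sizeof cs / sizeof cs[0]), pts, NVARS);
}
```

After escape_example runs, ESCAPED_VARS, NONLOCAL.6, infos and c_20089 all hold {ANYTHING, NONLOCAL.6, c_20089} (mask 0x15); with thousands of escaping variables the same pattern repeats at scale.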
Comment 16 Daniel Berlin 2007-10-31 14:22:22 UTC
Subject: Re:  [4.2 Regression] memory hog in solve_graph

On 31 Oct 2007 13:07:57 -0000, rguenth at gcc dot gnu dot org
<gcc-bugzilla@gcc.gnu.org> wrote:
>
>
> ------- Comment #15 from rguenth at gcc dot gnu dot org  2007-10-31 13:07 -------
> The memory is temporarily needed now by solve_graph(), because the graph has
> 48902 nodes.

48902 nodes is not a lot for the solver, to be honest.

>  On the mainline we have only 3 constraints while for 4.2 we have
> thousands:
> [... constraint listings from comment #15 snipped ...]

This is because we compute call clobbering differently on mainline now.
The thing you'd want to add to 4.2 would be the location equivalence
optimization, which I never finished for either 4.2 or 4.3 (4.3 has
code to compute it, but we don't substitute the variables).

Location equivalence would turn the escaped_vars set into 1 variable
during propagation, and then expand it back out at the end.
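The collapse/expand step described above can be sketched as follows. This assumes the equivalence classes (the `rep` array) have already been computed; that analysis is the hard part and is not shown. The helper names are hypothetical and the masks are a toy stand-in for GCC's bitmaps.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical toy model: one bit per abstract location, at most 64. */
typedef uint64_t ptset;

/* REP[v] is the representative of location v's class, with
   REP[REP[v]] == REP[v].  Sound only if all members of a class always
   appear together in every points-to set (assumed, not checked here). */

/* Before propagation: replace each location by its representative. */
static ptset collapse (ptset s, const int *rep, int nlocs)
{
  ptset out = 0;
  for (int v = 0; v < nlocs; v++)
    if (s & ((ptset) 1 << v))
      out |= (ptset) 1 << rep[v];
  return out;
}

/* After propagation: replace each representative by all its members. */
static ptset expand (ptset s, const int *rep, int nlocs)
{
  ptset out = 0;
  for (int v = 0; v < nlocs; v++)
    if (s & ((ptset) 1 << rep[v]))
      out |= (ptset) 1 << v;
  return out;
}

/* Number of locations in a set. */
static int set_size (ptset s)
{
  int n = 0;
  for (; s; s &= s - 1)
    n++;
  return n;
}
```

If, for illustration, NONLOCAL.6, c_20089 and c_20089.val form one class, a four-element escaped set shrinks to two bits during propagation and is restored exactly at the end; with thousands of escaping variables the saving per set grows accordingly.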
Comment 17 Joseph S. Myers 2008-02-01 16:54:35 UTC
4.2.3 is being released now, changing milestones of open bugs to 4.2.4.
Comment 18 Richard Biener 2008-03-16 17:29:49 UTC
This will not be fixed on the 4.2 branch.  Closing as fixed in 4.3.0.