This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.



Re: Fix PR tree-optimization/28778, 29156, and 29415


> This large patch fixes three P1 regressions, 28778, 29156, 29415.
> 
> The basic issue in all three is that previously, we had no way to
> describe accesses to nonlocal memory.   When we had no addressable
> variables, we generated no conflicting vuses, even if we were
> dereferencing pointers or passing them to other functions.
> 
> As our aliasing has gotten better, it eventually got to the point
> during 4.2 that we could eliminate all local aliases, and be left with
> nothing to say things aliased.  It was still possible to construct
> these cases with 4.x, it's just *much much* harder.
> Sadly, there is no smaller, less risky, and correct fix I can think of
> for this problem that will not cause either massive compile time
> explosion, or massive performance loss (I tried some before making
> this statement).

Hi,
sadly this patch causes up to a 9% increase in memory consumption on my
testsuite (and the increase is nearly consistent across all testcases).
New memory logs are at http://www.suse.de/~aj/SPEC/amd64/memory
If you need the logs from just before the run, I can get them easily.

Honza

Hi,

I am a friendly script caring about memory consumption in GCC.  Please
contact jh@suse.cz if something is going wrong.

Comparing memory consumption on compilation of combine.i, insn-attrtab.i,
and generate-3.4.ii I got:


comparing combine.c compilation at -O0 level:
    Overall memory needed: 24797k -> 24809k
    Peak memory use before GGC: 8929k -> 8930k
    Peak memory use after GGC: 8558k
    Maximum of released memory in single GGC run: 2576k -> 2577k
    Garbage: 34878k -> 34893k
    Leak: 6073k -> 6073k
    Overhead: 4715k -> 4716k
    GGC runs: 294

comparing combine.c compilation at -O1 level:
  Amount of produced GGC garbage increased from 53161k to 55253k, overall 3.94%
  Amount of memory still referenced at the end of compilation increased from 6071k to 6147k, overall 1.25%
    Overall memory needed: 36205k -> 36229k
    Peak memory use before GGC: 17000k -> 16998k
    Peak memory use after GGC: 16830k
    Maximum of released memory in single GGC run: 2262k -> 2310k
    Garbage: 53161k -> 55253k
    Leak: 6071k -> 6147k
    Overhead: 5766k -> 6005k
    GGC runs: 368 -> 366

comparing combine.c compilation at -O2 level:
  Amount of produced GGC garbage increased from 71329k to 74206k, overall 4.03%
  Amount of memory still referenced at the end of compilation increased from 6173k to 6276k, overall 1.67%
    Overall memory needed: 26496k
    Peak memory use before GGC: 16998k -> 17001k
    Peak memory use after GGC: 16830k
    Maximum of released memory in single GGC run: 2508k -> 2842k
    Garbage: 71329k -> 74206k
    Leak: 6173k -> 6276k
    Overhead: 8293k -> 8696k
    GGC runs: 430 -> 439

comparing combine.c compilation at -O3 level:
  Peak amount of GGC memory allocated before garbage collecting increased from 17890k to 18001k, overall 0.62%
  Peak amount of GGC memory still allocated after garbage collecting increased from 17446k to 17545k, overall 0.57%
  Amount of produced GGC garbage increased from 98075k to 101807k, overall 3.80%
  Amount of memory still referenced at the end of compilation increased from 6244k to 6337k, overall 1.49%
    Overall memory needed: 25596k
    Peak memory use before GGC: 17890k -> 18001k
    Peak memory use after GGC: 17446k -> 17545k
    Maximum of released memory in single GGC run: 3324k -> 3996k
    Garbage: 98075k -> 101807k
    Leak: 6244k -> 6337k
    Overhead: 11481k -> 12067k
    GGC runs: 479 -> 489

comparing insn-attrtab.c compilation at -O0 level:
    Overall memory needed: 83700k
    Peak memory use before GGC: 68247k -> 68248k
    Peak memory use after GGC: 43913k
    Maximum of released memory in single GGC run: 35708k -> 35709k
    Garbage: 125964k -> 125972k
    Leak: 9117k -> 9117k
    Overhead: 16830k -> 16830k
    GGC runs: 231

comparing insn-attrtab.c compilation at -O1 level:
  Overall memory allocated via mmap and sbrk increased from 102532k to 111856k, overall 9.09%
  Peak amount of GGC memory allocated before garbage collecting increased from 83520k to 89585k, overall 7.26%
  Peak amount of GGC memory still allocated after garbage collecting increased from 77614k to 82960k, overall 6.89%
  Amount of produced GGC garbage increased from 258206k to 272884k, overall 5.68%
  Amount of memory still referenced at the end of compilation increased from 8927k to 8979k, overall 0.58%
    Overall memory needed: 102532k -> 111856k
    Peak memory use before GGC: 83520k -> 89585k
    Peak memory use after GGC: 77614k -> 82960k
    Maximum of released memory in single GGC run: 31804k -> 31806k
    Garbage: 258206k -> 272884k
    Leak: 8927k -> 8979k
    Overhead: 28448k -> 29303k
    GGC runs: 230 -> 232

comparing insn-attrtab.c compilation at -O2 level:
  Overall memory allocated via mmap and sbrk increased from 105348k to 108256k, overall 2.76%
  Peak amount of GGC memory allocated before garbage collecting increased from 87453k to 91054k, overall 4.12%
  Peak amount of GGC memory still allocated after garbage collecting increased from 79913k to 83154k, overall 4.06%
  Amount of produced GGC garbage increased from 303600k to 311213k, overall 2.51%
  Amount of memory still referenced at the end of compilation increased from 8932k to 8986k, overall 0.60%
    Overall memory needed: 105348k -> 108256k
    Peak memory use before GGC: 87453k -> 91054k
    Peak memory use after GGC: 79913k -> 83154k
    Maximum of released memory in single GGC run: 30384k -> 30385k
    Garbage: 303600k -> 311213k
    Leak: 8932k -> 8986k
    Overhead: 35483k -> 36327k
    GGC runs: 256 -> 258

comparing insn-attrtab.c compilation at -O3 level:
  Overall memory allocated via mmap and sbrk increased from 105388k to 108280k, overall 2.74%
  Peak amount of GGC memory allocated before garbage collecting increased from 87479k to 91079k, overall 4.12%
  Peak amount of GGC memory still allocated after garbage collecting increased from 79939k to 83179k, overall 4.05%
  Amount of produced GGC garbage increased from 304171k to 311793k, overall 2.51%
  Amount of memory still referenced at the end of compilation increased from 8935k to 8989k, overall 0.60%
    Overall memory needed: 105388k -> 108280k
    Peak memory use before GGC: 87479k -> 91079k
    Peak memory use after GGC: 79939k -> 83179k
    Maximum of released memory in single GGC run: 30566k -> 30579k
    Garbage: 304171k -> 311793k
    Leak: 8935k -> 8989k
    Overhead: 35660k -> 36500k
    GGC runs: 260

comparing Gerald's testcase PR8361 compilation at -O0 level:
  Amount of produced GGC garbage increased from 200877k to 201487k, overall 0.30%
    Overall memory needed: 116568k
    Peak memory use before GGC: 92731k -> 92730k
    Peak memory use after GGC: 91812k
    Maximum of released memory in single GGC run: 19778k -> 19777k
    Garbage: 200877k -> 201487k
    Leak: 47326k -> 47326k
    Overhead: 20503k -> 20524k
    GGC runs: 399 -> 400

comparing Gerald's testcase PR8361 compilation at -O1 level:
  Amount of produced GGC garbage increased from 431094k to 435371k, overall 0.99%
  Amount of memory still referenced at the end of compilation increased from 49442k to 49820k, overall 0.77%
    Overall memory needed: 115728k
    Peak memory use before GGC: 97570k -> 97572k
    Peak memory use after GGC: 95362k
    Maximum of released memory in single GGC run: 18425k
    Garbage: 431094k -> 435371k
    Leak: 49442k -> 49820k
    Overhead: 31571k -> 31940k
    GGC runs: 541 -> 545

comparing Gerald's testcase PR8361 compilation at -O2 level:
  Amount of produced GGC garbage increased from 497036k to 512007k, overall 3.01%
  Amount of memory still referenced at the end of compilation increased from 50152k to 50831k, overall 1.35%
    Overall memory needed: 115708k
    Peak memory use before GGC: 97572k -> 97575k
    Peak memory use after GGC: 95363k
    Maximum of released memory in single GGC run: 18424k
    Garbage: 497036k -> 512007k
    Leak: 50152k -> 50831k
    Overhead: 39710k -> 41799k
    GGC runs: 607 -> 622

comparing Gerald's testcase PR8361 compilation at -O3 level:
  Amount of produced GGC garbage increased from 515322k to 529605k, overall 2.77%
  Amount of memory still referenced at the end of compilation increased from 49764k to 50347k, overall 1.17%
    Overall memory needed: 115648k -> 115644k
    Peak memory use before GGC: 97616k
    Peak memory use after GGC: 96649k
    Maximum of released memory in single GGC run: 18845k
    Garbage: 515322k -> 529605k
    Leak: 49764k -> 50347k
    Overhead: 40392k -> 42236k
    GGC runs: 614 -> 631

comparing PR rtl-optimization/28071 testcase compilation at -O0 level:
    Overall memory needed: 134132k -> 134128k
    Peak memory use before GGC: 81623k
    Peak memory use after GGC: 58503k
    Maximum of released memory in single GGC run: 45494k -> 45493k
    Garbage: 143586k -> 143593k
    Leak: 7138k -> 7138k
    Overhead: 25104k -> 25104k
    GGC runs: 87

comparing PR rtl-optimization/28071 testcase compilation at -O1 level:
    Overall memory needed: 425328k -> 420516k
    Peak memory use before GGC: 203150k -> 203205k
    Peak memory use after GGC: 198928k -> 198981k
    Maximum of released memory in single GGC run: 100796k -> 100817k
    Garbage: 264978k -> 265037k
    Leak: 47191k -> 47213k
    Overhead: 30026k -> 30028k
    GGC runs: 106

comparing PR rtl-optimization/28071 testcase compilation at -O2 level:
    Overall memory needed: 347924k -> 346044k
    Peak memory use before GGC: 203907k -> 203956k
    Peak memory use after GGC: 199683k -> 199732k
    Maximum of released memory in single GGC run: 107064k -> 107089k
    Garbage: 354478k -> 354552k
    Leak: 47774k -> 47796k
    Overhead: 47627k -> 47628k
    GGC runs: 113

comparing PR rtl-optimization/28071 testcase compilation at -O3 -fno-tree-pre -fno-tree-fre level:
    Overall memory needed: 536236k -> 533520k
    Peak memory use before GGC: 314598k -> 314653k
    Peak memory use after GGC: 292942k -> 292995k
    Maximum of released memory in single GGC run: 163427k -> 163448k
    Garbage: 487446k -> 487508k
    Leak: 65106k -> 65129k
    Overhead: 58883k -> 58885k
    GGC runs: 100
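
The "overall N%" figures in the report above are relative increases over
the old value.  As a minimal sketch (this is not the tester's actual
code; the function name percent_increase is made up for illustration),
the arithmetic can be checked like this:

```python
def percent_increase(before_k, after_k):
    """Relative increase from before_k to after_k, in percent.

    Hypothetical helper for illustration; the real testing script may
    compute or round its figures differently.
    """
    return (after_k - before_k) / before_k * 100.0


# Values copied from the report above (all in kilobytes).
checks = [
    ("combine.c -O1 garbage", 53161, 55253),
    ("combine.c -O3 peak before GGC", 17890, 18001),
    ("insn-attrtab.c -O1 overall", 102532, 111856),
]

for label, before_k, after_k in checks:
    print(f"{label}: {before_k}k -> {after_k}k, "
          f"overall {percent_increase(before_k, after_k):.2f}%")
```

For the three rows picked here this gives 3.94%, 0.62% and 9.09%, which
matches the figures printed by the script.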

Head of the ChangeLog is:

--- /usr/src/SpecTests/sandbox-britten-memory/x86_64/mem-result/ChangeLog	2006-10-19 22:34:40.000000000 +0000
+++ /usr/src/SpecTests/sandbox-britten-memory/gcc/gcc/ChangeLog	2006-10-20 09:58:42.000000000 +0000
@@ -1,3 +1,68 @@
+2006-10-19  Brooks Moses  <bmoses@stanford.edu>
+
+	* doc/install.texi (Downloading GCC): Clarify mention of
+	Fortran in the "full distribution" description.
+
+2006-10-19  Daniel Berlin  <dberlin@dberlin.org>
+
+	Fix PR tree-optimization/28778
+	Fix PR tree-optimization/29156
+	Fix PR tree-optimization/29415
+	* tree.h (DECL_PTA_ARTIFICIAL): New macro.
+	(tree_decl_with_vis): Add artificial_pta_var flag.
+	* tree-ssa-alias.c (is_escape_site): Remove alias info argument,
+	pushed into callers.
+	* tree-ssa-structalias.c (nonlocal_for_type): New variable.
+	(nonlocal_all): Ditto.
+	(struct variable_info): Add directly_dereferenced member.
+	(var_escaped_vars): New variable.
+	(escaped_vars_tree): Ditto.
+	(escaped_vars_id): Ditto.
+	(nonlocal_vars_id): Ditto.
+	(new_var_info): Set directly_dereferenced.
+	(graph_size): New variable
+	(build_constraint_graph): Use graph_size.
+	(solve_graph): Don't process constraints that cannot change the
+	solution, don't try to propagate an empty solution to our
+	successors.
+	(process_constraint): Set directly_dereferenced.
+	(could_have_pointers): New function.
+	(get_constraint_for_component_ref): Don't process STRING_CST.
+	(nonlocal_lookup): New function.
+	(nonlocal_insert): Ditto.
+	(create_nonlocal_var): Ditto.
+	(get_nonlocal_id_for_type): Ditto.
+	(get_constraint_for): Allow results vector to be empty in the case
+	of string constants.
+	Handle results of calls properly.
+	(update_alias_info): Update alias info stats on number and type of
+	calls.
+	(find_func_aliases): Use could_have_pointers.
+	(make_constraint_from_escaped): Renamed from
+	make_constraint_to_anything, and changed to make constraints from
+	escape variable.
+	(make_constraint_to_escaped): New function.
+	(find_global_initializers): Ditto.
+	(create_variable_info_for): Make constraint from escaped to any
+	global variable, and from any global variable to the set of
+	escaped vars.
+	(intra_create_variable_infos): Deal with escaped instead of
+	pointing to anything.
+	(set_uids_in_ptset): Do type pruning on directly dereferenced
+	variables.
+	(find_what_p_points_to): Adjust call to set_uids_with_ptset.
+	(init_base_vars): Fix comment, and initialize escaped_vars.
+	(need_to_solve): Removed.
+	(find_escape_constraints): New function.
+	(expand_nonlocal_solutions): Ditto.
+	(compute_points_to_sets): Call find_escape_constraints and
+	expand_nonlocal_solutions.
+	(delete_points_to_sets): Don't fall off the end of the graph.
+	(init_alias_heapvars): Initialize nonlocal_for_type and
+	nonlocal_all.
+	(delete_alias_heapvars): Free nonlocal_for_type and null out
+	nonlocal_all. 
+
 2006-10-19  Eric Botcazou  <ebotcazou@adacore.com>
 
 	* fold-const.c (add_double): Rename to add_double_with_sign.


The results can be reproduced by building a compiler with

--enable-gather-detailed-mem-stats targeting x86-64

and compiling preprocessed combine.c or the testcase from PR8361 with:

-fmem-report --param=ggc-min-heapsize=1024 --param=ggc-min-expand=1 -Ox -Q

The memory consumption summary appears in the dump after the detailed
listing of the places where memory is allocated.  Peak memory consumption
is computed by taking the maximal value in the {GC XXXX -> YYYY} reports.
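
The peak computation described above could be sketched roughly as
follows.  This is only an illustration, not the tester's code.  It
assumes the dump contains lines of the form {GC 8929k -> 8558k} with a
"k" suffix, and that "peak before GGC", "peak after GGC" and "maximum
released in a single run" correspond to the maximum XXXX, the maximum
YYYY and the maximum difference.  The function name summarize_gc and the
sample numbers are invented.

```python
import re

# Assumed line format: "{GC <before>k -> <after>k}".
GC_LINE = re.compile(r"\{GC (\d+)k -> (\d+)k\}")


def summarize_gc(dump_text):
    """Summarize all {GC XXXX -> YYYY} reports found in dump_text.

    Returns None when the dump contains no GC reports.
    """
    pairs = [(int(before), int(after))
             for before, after in GC_LINE.findall(dump_text)]
    if not pairs:
        return None
    return {
        "peak_before": max(before for before, _ in pairs),
        "peak_after": max(after for _, after in pairs),
        "max_released": max(before - after for before, after in pairs),
        "runs": len(pairs),
    }


# Invented sample dump, for illustration only.
sample = """
 {GC 4000k -> 3100k}
 {GC 8929k -> 8558k}
 {GC 7000k -> 4424k}
"""
print(summarize_gc(sample))
```

Taking the maximum over every report, rather than only the last one,
matters because the largest heap is often reached in the middle of the
compilation.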

Your testing script.

