Bug 37448 - cannot compile big function
Summary: cannot compile big function
Status: NEW
Alias: None
Product: gcc
Classification: Unclassified
Component: middle-end (show other bugs)
Version: 4.3.1
: P2 normal
Target Milestone: ---
Assignee: Not yet assigned to anyone
URL:
Keywords: compile-time-hog, memory-hog
Depends on:
Blocks: 47344
  Show dependency treegraph
 
Reported: 2008-09-09 18:34 UTC by Klaus Grue
Modified: 2013-03-06 10:45 UTC (History)
8 users (show)

See Also:
Host: x86_64-suse-linux
Target: x86_64-suse-linux
Build: x86_64-suse-linux
Known to work: 4.2.4
Known to fail: 4.3.2, 4.6.0, 4.7.2, 4.8.0
Last reconfirmed: 2008-09-18 21:11:40


Attachments
The source mentioned in the bug description (118.03 KB, text/plain)
2008-09-10 08:01 UTC, Klaus Grue
Details
The statistics mentioned in the bug description (15.02 KB, text/plain)
2008-09-10 08:01 UTC, Klaus Grue
Details
detailed memory stats with checking enabled and long statistic counters (15.85 KB, text/plain)
2008-09-18 01:30 UTC, lucier
Details
memory and time statistics for compiling lgwam.c (3.95 KB, text/plain)
2012-10-26 02:05 UTC, lucier
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Klaus Grue 2008-09-09 18:34:11 UTC
After upgrading to gcc 4.3.1, I can no longer compile a function whose
source code is 0.7 Megabyte before preprocessing and 3.5 Megabyte after
preprocessing.

The function (named "testsuite") is just a long list of statements
essentially of form if(!condition){complain();exit();}

The behaviour is: CPU time goes to 100%, then RAM size grows to
1 Gigabyte, then swap space starts growing and CPU time goes to 10%.

On my previous gcc (4.2.something, I think), compilation went fine.

The problematic source can be found at http://logiweb.eu/grue/lgwam.c

The source only #includes standard headers.

The source is compiled thus:

gcc -ldl -o lgwam lgwam.c

To double check that optimization is off, I tried with -O0
which gave the same result as not using -O0.

I failed to find out how to fill in Host, Target, and Build
triplet. Sorry. Here comes the output from gcc -v:

---

grue@pc189-kgr:~> gcc -v
Using built-in specs.
Target: x86_64-suse-linux
Configured with: ../configure --prefix=/usr --with-local-prefix=/usr/local
--infodir=/usr/share/info --mandir=/usr/share/man --libdir=/usr/lib64
--libexecdir=/usr/lib64
--enable-languages=c,c++,objc,fortran,obj-c++,java,ada
--enable-checking=release --with-gxx-include-dir=/usr/include/c++/4.3
--enable-ssp --disable-libssp --with-bugurl=http://bugs.opensuse.org/
--with-pkgversion='SUSE Linux' --disable-libgcj --with-slibdir=/lib64
--with-system-zlib --enable-__cxa_atexit --enable-libstdcxx-allocator=new
--disable-libstdcxx-pch --program-suffix=-4.3
--enable-version-specific-runtime-libs --enable-linux-futex
--without-system-libunwind --with-cpu=generic --build=x86_64-suse-linux
Thread model: posix
gcc version 4.3.1 20080507 (prerelease) [gcc-4_3-branch revision 135036]
(SUSE Linux)

---

Bradley Lucier (lucier aat math doot purdue doot edu) suggested
that this could resemble
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=26854

He wrote thus at the gcc mailing list:

---

I compiled your file with a recent version of the current development
compiler

euler-27% /pkgs/gcc-mainline/bin/gcc -v
Using built-in specs.
Target: x86_64-unknown-linux-gnu
Configured with: ../../mainline/configure --enable-checking=release --
with-gmp=/pkgs/gmp-4.2.2/ --with-mpfr=/pkgs/gmp-4.2.2/ --prefix=/pkgs/
gcc-mainline --enable-languages=c --enable-gather-detailed-mem-stats
Thread model: posix
gcc version 4.4.0 20080827 (experimental) [trunk revision 139624] (GCC)

with

 /pkgs/gcc-mainline/bin/gcc -c lgwam.c -fmem-report -ftime-report

and got the following statistics.  If you file a bug report, then they may
prove helpful.

---

I have placed the mentioned statistics at
http://logiweb.eu/grue/lgwam.c.stats-O0
Comment 1 Klaus Grue 2008-09-10 08:01:13 UTC
Created attachment 16281 [details]
The source mentioned in the bug description
Comment 2 Klaus Grue 2008-09-10 08:01:54 UTC
Created attachment 16282 [details]
The statistics mentioned in the bug description
Comment 3 Richard Biener 2008-09-10 08:02:43 UTC
It indeed uses about 1.4GB ram and takes 26s (I think that's reasonable) to
build the testcase.  It's the DF initialize pass that requires this much
memory.  The function in question has 31671 basic blocks.  dfinish doesn't
seem to free it btw.

(gdb) print *df
$56 = {problems_in_order = {0x1b25920, 0x1b259a0, 0x0, 0x0, 0x0, 0x0}, 
  problems_by_index = {0x1b25920, 0x1b259a0, 0x0, 0x0, 0x0, 0x0}, 
  num_problems_defined = 2, blocks_to_analyze = 0x0, analyze_subset = 0 '\0', 
  redo_entry_and_exit = 0 '\0', def_info = {refs = 0x0, begin = 0x19ee470, 
    count = 0x38e3e2e0, refs_size = 0, table_size = 0, total_size = 7482051, 
    ref_order = DF_REF_ORDER_NO_TABLE}, use_info = {refs = 0x0, 
    begin = 0x38ef7fe0, count = 0x38fb1ce0, refs_size = 0, table_size = 0, 
    total_size = 644313, ref_order = DF_REF_ORDER_NO_TABLE}, 
  def_regs = 0x2f766c50, use_regs = 0x38b56f20, eq_use_regs = 0x38cca900, 
  regs_size = 190267, regs_inited = 152214, insns = 0x1d48fb0, 
  insns_size = 792538, hardware_regs_used = 0x2ef1e640, 
  regular_block_artificial_uses = 0x2ef1e660, 
  eh_block_artificial_uses = 0x2ef1e680, entry_block_defs = 0x2ef1e6a0, 
  exit_block_uses = 0x2ef1e6c0, insns_to_delete = 0x2f928b40, 
  insns_to_rescan = 0x2f928b60, insns_to_notes_rescan = 0x2f928b80, 
  postorder = 0x30d2c60, postorder_inverted = 0x30f1b50, n_blocks = 31671, 
  n_blocks_inverted = 31671, hard_regs_live_count = 0x1940610, 
  ref_order = 39346414, changeable_flags = 0}

Kenny, any idea where the memory leaks?
Comment 4 Richard Biener 2008-09-10 08:10:49 UTC
On the trunk compile-time sky-rocketed there
in building the cgraph.  Honza, what's going on there - it doesn't seem to
even finish ;)
Comment 5 Jan Hubicka 2008-09-10 09:57:18 UTC
With ENABLE_CHECKING we spend ages checking that there are no duplicated edges in callgraph.  Will fix it shortly.
Comment 6 Jan Hubicka 2008-09-10 10:02:55 UTC
Now we spend eons in walking lexical blocks, since add_lexical_block called by inliner walks till end of the chain.
Is the order of blocks important at all?  (i.e. we don't insert them to proper place anyway, just as last block, we can perhaps as well insert them as first?)

Honza
Comment 7 Jan Hubicka 2008-09-10 11:32:37 UTC
#0  0x000000000099c80d in ira_allocno_live_ranges_intersect_p (a1=0x33da8ae0, a2=0x34395120) at ../../gcc/ira-conflicts.c:535
#1  0x000000000099ff0e in coalesced_allocno_conflict_p (a1=0x34e221b0, a2=0x33da8ae0, reload_p=1 '\001') at ../../gcc/ira-color.c:1436
#2  0x00000000009a2904 in ira_sort_regnos_for_alter_reg (pseudo_regnos=0x5eb441f0, n=<value optimized out>, reg_max_ref_width=0x580ffb00)
    at ../../gcc/ira-color.c:2278
#3  0x00000000005e4f86 in reload (first=0x2aaaacf99680, global=1) at ../../gcc/reload1.c:907
#4  0x000000000099472c in rest_of_handle_ira () at ../../gcc/ira.c:1867
#5  0x000000000059c211 in execute_one_pass (pass=0xdf3020) at ../../gcc/passes.c:1279
#6  0x000000000059c445 in execute_pass_list (pass=0xdf3020) at ../../gcc/passes.c:1327
#7  0x000000000059c45d in execute_pass_list (pass=0xdee140) at ../../gcc/passes.c:1328
#8  0x000000000066bc07 in tree_rest_of_compilation (fndecl=0x2aaaab9ae200) at ../../gcc/tree-optimize.c:418


OK, one hour later (at -O2) we spent all the time in IRA/reload.
Vladimir, could you please take a look?

Honza
Comment 8 Jan Hubicka 2008-09-10 11:39:29 UTC
And at -O0 we still need about 2GB (because of dataflow leaks?), otherwise we seem fine compilation time wise
 garbage collection    :   0.92 ( 3%) usr   0.00 ( 0%) sys   0.93 ( 3%) wall       0 kB ( 0%) ggc
 callgraph construction:   0.48 ( 1%) usr   0.03 ( 2%) sys   0.53 ( 2%) wall   22055 kB ( 4%) ggc
 callgraph optimization:   0.44 ( 1%) usr   0.02 ( 1%) sys   0.48 ( 1%) wall   16397 kB ( 3%) ggc
 cfg cleanup           :   0.10 ( 0%) usr   0.00 ( 0%) sys   0.10 ( 0%) wall       3 kB ( 0%) ggc
 trivially dead code   :   0.42 ( 1%) usr   0.00 ( 0%) sys   0.44 ( 1%) wall       0 kB ( 0%) ggc
 df live regs          :   0.80 ( 2%) usr   0.01 ( 1%) sys   0.79 ( 2%) wall       0 kB ( 0%) ggc
 df reg dead/unused notes:   1.57 ( 5%) usr   0.01 ( 1%) sys   1.59 ( 5%) wall   16931 kB ( 3%) ggc
 register information  :   0.68 ( 2%) usr   0.00 ( 0%) sys   0.68 ( 2%) wall       0 kB ( 0%) ggc
 alias analysis        :   0.27 ( 1%) usr   0.00 ( 0%) sys   0.26 ( 1%) wall    4290 kB ( 1%) ggc
 rebuild jump labels   :   0.36 ( 1%) usr   0.00 ( 0%) sys   0.37 ( 1%) wall       0 kB ( 0%) ggc
 preprocessing         :   0.32 ( 1%) usr   0.16 (10%) sys   0.41 ( 1%) wall   25859 kB ( 5%) ggc
 lexical analysis      :   0.15 ( 0%) usr   0.33 (21%) sys   0.56 ( 2%) wall       0 kB ( 0%) ggc
 parser                :   0.48 ( 1%) usr   0.17 (11%) sys   0.64 ( 2%) wall   52995 kB (10%) ggc
 inline heuristics     :   0.28 ( 1%) usr   0.00 ( 0%) sys   0.27 ( 1%) wall       0 kB ( 0%) ggc
 tree gimplify         :   0.89 ( 3%) usr   0.06 ( 4%) sys   0.96 ( 3%) wall  125511 kB (25%) ggc
 tree eh               :   0.05 ( 0%) usr   0.00 ( 0%) sys   0.07 ( 0%) wall       0 kB ( 0%) ggc
 tree CFG construction :   0.18 ( 1%) usr   0.01 ( 1%) sys   0.18 ( 1%) wall   12213 kB ( 2%) ggc
 tree CFG cleanup      :   0.16 ( 0%) usr   0.00 ( 0%) sys   0.12 ( 0%) wall       0 kB ( 0%) ggc
 tree find ref. vars   :   0.15 ( 0%) usr   0.01 ( 1%) sys   0.15 ( 0%) wall   14419 kB ( 3%) ggc
 tree PHI insertion    :   0.04 ( 0%) usr   0.00 ( 0%) sys   0.04 ( 0%) wall     150 kB ( 0%) ggc
 tree SSA rewrite      :   0.08 ( 0%) usr   0.01 ( 1%) sys   0.10 ( 0%) wall   18660 kB ( 4%) ggc
 tree SSA other        :   0.22 ( 1%) usr   0.06 ( 4%) sys   0.21 ( 1%) wall     512 kB ( 0%) ggc
 tree operand scan     :   0.10 ( 0%) usr   0.02 ( 1%) sys   0.17 ( 1%) wall    9402 kB ( 2%) ggc
 tree SSA to normal    :   0.22 ( 1%) usr   0.00 ( 0%) sys   0.23 ( 1%) wall      14 kB ( 0%) ggc
 dominance frontiers   :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.00 ( 0%) wall       0 kB ( 0%) ggc
 dominance computation :   0.19 ( 1%) usr   0.01 ( 1%) sys   0.17 ( 1%) wall       0 kB ( 0%) ggc
 expand                :   5.47 (17%) usr   0.40 (26%) sys   5.82 (17%) wall  171648 kB (34%) ggc
 jump                  :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.02 ( 0%) wall       0 kB ( 0%) ggc
 integrated RA         :   5.63 (17%) usr   0.10 ( 6%) sys   5.82 (17%) wall    5190 kB ( 1%) ggc
 reload                :   9.42 (29%) usr   0.13 ( 8%) sys   9.55 (28%) wall    8842 kB ( 2%) ggc
 thread pro- & epilogue:   1.01 ( 3%) usr   0.00 ( 0%) sys   1.02 ( 3%) wall     594 kB ( 0%) ggc
 reg stack             :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall       0 kB ( 0%) ggc
 final                 :   1.22 ( 4%) usr   0.01 ( 1%) sys   1.25 ( 4%) wall    1639 kB ( 0%) ggc
 TOTAL                 :  32.33             1.55            33.95             508532 kB
Comment 9 Vladimir Makarov 2008-09-10 21:16:38 UTC
There are 66K allocnos and 3M live ranges in function testsuite. Fixing the problem in IRA will take some time.  I hope the patch will be ready in a few days.
Comment 10 Jan Hubicka 2008-09-11 12:36:17 UTC
Subject: Bug 37448

Author: hubicka
Date: Thu Sep 11 12:34:53 2008
New Revision: 140281

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=140281
Log:
	PR middle-end/37448
	* tree-inline.c (add_lexical_block): Replace with ...
	(prepend_lexical_block): ... prepend at begginig.
	(remap_blocks): Use it and reverse later.
	(expand_call_inline): Use prepend_lexical_block.

Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/tree-inline.c

Comment 11 Jan Hubicka 2008-09-11 12:40:20 UTC
Subject: Bug 37448

Author: hubicka
Date: Thu Sep 11 12:38:57 2008
New Revision: 140284

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=140284
Log:

	PR middle-end/37448
	* cgraph.c (cgraph_create_edge): Use !cgraph_edge for sanity check.

Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/cgraph.c

Comment 12 hjl@gcc.gnu.org 2008-09-13 15:58:47 UTC
Subject: Bug 37448

Author: hjl
Date: Sat Sep 13 15:57:26 2008
New Revision: 140345

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=140345
Log:
2008-09-13  Vladimir Makarov <vmakarov@redhat.com>

	PR middle-end/37448
	* ira-int.h (IRA_ALLOCNO_TEMP): Rename to ALLOCNO_TEMP.
	(ira_compress_allocno_live_ranges): New prototype.

	* ira-color.c: Rename IRA_ALLOCNO_TEMP to ALLOCNO_TEMP.
	(coalesced_allocnos_living_at_program_points): New.
	(coalesced_allocnos_live_at_points_p,
	set_coalesced_allocnos_live_points): New functions.
	(coalesce_spill_slots): Rewrite.
	
	* ira-lives.c (remove_some_program_points_and_update_live_ranges,
	ira_compress_allocno_live_ranges): New functions.

	* ira-build.c (ira_flattening): Call
	ira_compress_allocno_live_ranges.
	(ira_build): Ditto.

Modified:
    branches/ira-merge/gcc/ChangeLog.ira
    branches/ira-merge/gcc/ira-build.c
    branches/ira-merge/gcc/ira-color.c
    branches/ira-merge/gcc/ira-int.h
    branches/ira-merge/gcc/ira-lives.c

Comment 13 lucier 2008-09-17 17:49:03 UTC
In the attached statistics file you find where this allocation overflowed

Alloc-pool Kind        Pools  Allocated      Peak        Leak
-------------------------------------------------------------
df_scan_ref pool         868 -1403330584  607042176          0

which is similar to what happens in 26854, so perhaps this PR could also be improved with factored use-def chains.
Comment 14 lucier 2008-09-18 01:30:32 UTC
Created attachment 16351 [details]
detailed memory stats with checking enabled and long statistic counters

I reran the test problem with checking enabled and this patch, which changes the memory statistic counters from int to long, installed:

http://gcc.gnu.org/ml/gcc-patches/2008-09/msg01270.html

So we see that 5,272,694,920 bytes are allocated in alloc_pools, and 331,198,680 btytes are allocated in bitmaps.
Comment 15 Jan Hubicka 2008-09-18 16:49:25 UTC
We seem to make nice progress on the testcase ;)

Conversion to FUD would probably help here, but looking at the DF dumps, the testcase don't look like your average testcase showing DU/UD chain explosion.  The chains seem relatively short at least in manual inspection through first part of dump. Yet we produe about 40 DU/UD records live at once for every instruction ever emit, so it seems we are doing some wrong still.

If someone more familiar with dataflow branch could look into why we have so many DU/UDs, it would be great.

Honza
Comment 16 Jan Hubicka 2008-09-18 17:30:05 UTC
Subject: Bug 37448

Author: hubicka
Date: Thu Sep 18 17:28:40 2008
New Revision: 140463

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=140463
Log:

	PR middle-end/37448
	* ipa-reference.c (ipa_reference_local_vars_info_d,
	ipa_reference_global_vars_info_d,
	ipa_reference_local_vars_info_t, ipa_reference_global_vars_info_t,
	ipa_reference_vars_info_t): Move here from ipa-reference.h
	(node_duplication_hook_holder, node_removal_hook_holder): New.
	(get_reference_vars_info_from_cgraph): Rename to ...
	(get_reference_vars_info): ... this one, use cgraph uids.
	(get_local_reference_vars_info, get_global_reference_vars_info):
	Use cgraph instead of decl.
	(ipa_reference_get_read_local, ipa_reference_get_written_local): Remove.
	(ipa_reference_get_read_global, ipa_reference_get_not_read_global
	ipa_reference_get_written_global, ipa_reference_get_not_written_global): Use
	cgraph argument.
	(check_call): Simplify avail check.
	(scan_stmt_for_static_refs): Update.
	(propagate_bits): Update.
	(merge_callee_local_info): Remove.
	(init_function_info): Use cgraph nodes.
	(clean_function_local_data): Break out from ...
	(clean_function): ... here.
	(copy_local_bitmap, copy_global_bitmap): New functions.
	(duplicate_node_data, remove_node_data): New functions.
	(generate_summary): Register hooks; use visibility instead of
	master clones.
	(propafate): Use cgraph nodes; copy bitmap to each node in cycle.
	* ipa-reference.h (ipa_reference_local_vars_info_d,
	ipa_reference_global_vars_info_d,
	ipa_reference_local_vars_info_t, ipa_reference_global_vars_info_t,
	ipa_reference_vars_info_t): Move to ipa-reference.c
	(ipa_reference_get_read_local, ipa_reference_get_written_local):
	Remove.
	(ipa_reference_get_read_global, ipa_reference_get_written_global,
	ipa_reference_get_not_read_global, ipa_reference_get_not_written_global):
	Update prototype.
	* ipa-pure-const.c (funct_state_vec): Turn into VECtor.
	(init_state): Remove.
	(node_duplication_hook_holder, node_removal_hook_holder): New.
	(get_function_state, set_function_state): Use VECtor.
	(analyze_function): Check body availability.
	(add_new_function): Likewise.
	(duplicate_node_data, remove_node_data): New.
	(generate_summary): Register hooks; do not care about clones.
	(propafate): Do not care about clones; recursive functions are not looping.
	* ipa-utils.c (searchc, ipa_utils_reduced_inorder): Do not skip clones.
	* ipa-prop.c (edge_removal_hook_holder, node_removal_hook_holder,
	* edge_duplication_hook_holder, node_duplication_hook_holder): Make
	static.
	* tree-flow.h (function_ann_d): Remove reference_vars_info.
	* tree-ssa-opreands.c (add_call_clobber_ops, add_call_read_ops): Update call of
	ipa-reference accesors.

Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/ipa-prop.c
    trunk/gcc/ipa-pure-const.c
    trunk/gcc/ipa-reference.c
    trunk/gcc/ipa-reference.h
    trunk/gcc/ipa-utils.c
    trunk/gcc/tree-ssa-operands.c

Comment 17 Jan Hubicka 2008-09-18 18:18:15 UTC
Subject: Bug 37448

Author: hubicka
Date: Thu Sep 18 18:16:45 2008
New Revision: 140465

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=140465
Log:

	PR middle-end/37448
	* ipa-reference.c (ipa_reference_local_vars_info_d,
	ipa_reference_global_vars_info_d,
	ipa_reference_local_vars_info_t, ipa_reference_global_vars_info_t,
	ipa_reference_vars_info_t): Move here from ipa-reference.h
	(node_duplication_hook_holder, node_removal_hook_holder): New.
	(get_reference_vars_info_from_cgraph): Rename to ...
	(get_reference_vars_info): ... this one, use cgraph uids.
	(get_local_reference_vars_info, get_global_reference_vars_info):
	Use cgraph instead of decl.
	(ipa_reference_get_read_local, ipa_reference_get_written_local): Remove.
	(ipa_reference_get_read_global, ipa_reference_get_not_read_global
	ipa_reference_get_written_global, ipa_reference_get_not_written_global): Use
	cgraph argument.
	(check_call): Simplify avail check.
	(scan_stmt_for_static_refs): Update.
	(propagate_bits): Update.
	(merge_callee_local_info): Remove.
	(init_function_info): Use cgraph nodes.
	(clean_function_local_data): Break out from ...
	(clean_function): ... here.
	(copy_local_bitmap, copy_global_bitmap): New functions.
	(duplicate_node_data, remove_node_data): New functions.
	(generate_summary): Register hooks; use visibility instead of
	master clones.
	(propafate): Use cgraph nodes; copy bitmap to each node in cycle.
	* ipa-reference.h (ipa_reference_local_vars_info_d,
	ipa_reference_global_vars_info_d,
	ipa_reference_local_vars_info_t, ipa_reference_global_vars_info_t,
	ipa_reference_vars_info_t): Move to ipa-reference.c
	(ipa_reference_get_read_local, ipa_reference_get_written_local):
	Remove.
	(ipa_reference_get_read_global, ipa_reference_get_written_global,
	ipa_reference_get_not_read_global, ipa_reference_get_not_written_global):
	Update prototype.
	* ipa-pure-const.c (funct_state_vec): Turn into VECtor.
	(init_state): Remove.
	(node_duplication_hook_holder, node_removal_hook_holder): New.
	(get_function_state, set_function_state): Use VECtor.
	(analyze_function): Check body availability.
	(add_new_function): Likewise.
	(duplicate_node_data, remove_node_data): New.
	(generate_summary): Register hooks; do not care about clones.
	(propafate): Do not care about clones; recursive functions are not looping.
	* ipa-utils.c (searchc, ipa_utils_reduced_inorder): Do not skip clones.
	* ipa-prop.c (edge_removal_hook_holder, node_removal_hook_holder,
	* edge_duplication_hook_holder, node_duplication_hook_holder): Make
	static.
	* tree-flow.h (function_ann_d): Remove reference_vars_info.
	* tree-ssa-opreands.c (add_call_clobber_ops, add_call_read_ops): Update call of
	ipa-reference accesors.

Modified:
    trunk/gcc/tree-flow.h

Comment 18 Jan Hubicka 2008-09-18 21:11:39 UTC
Hi,
some stats on instruction counts and references Kenny asked about:

At -O2:
jh@gcc17:~/gcc-baseline/build-ada/prev-gcc$ grep "(insn" e.i.157r.outof_cfglayout | wc -l
307811
jh@gcc17:~/gcc-baseline/build-ada/prev-gcc$ grep "(call_insn" e.i.157r.outof_cfglayout | wc -l
157016
jh@gcc17:~/gcc-baseline/build-ada/prev-gcc$ grep "(jump_insn" e.i.157r.outof_cfglayout | wc -l
18650

at -O0:
jh@gcc17:~/gcc-baseline/build-ada/prev-gcc/O0$ grep "(insn" e.i.157r.outof_cfglayout | wc -l
411876
jh@gcc17:~/gcc-baseline/build-ada/prev-gcc/O0$ grep "(call_insn" e.i.157r.outof_cfglayout | wc -l
166471
jh@gcc17:~/gcc-baseline/build-ada/prev-gcc/O0$ grep "(jump_insn" e.i.157r.outof_cfglayout | wc -l
18218

I am not sure if I can grep for references in some of the RTL dumps...

Honza
Comment 19 Vladimir Makarov 2008-09-26 00:15:56 UTC
Subject: Bug 37448

Author: vmakarov
Date: Fri Sep 26 00:14:30 2008
New Revision: 140674

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=140674
Log:
2008-09-25  Vladimir Makarov  <vmakarov@redhat.com>

	PR middle-end/37448
	
	* ira-int.h (IRA_ALLOCNO_TEMP): Rename to ALLOCNO_TEMP.
	(ira_compress_allocno_live_ranges): New prototype.

	* ira-color.c: Rename IRA_ALLOCNO_TEMP to ALLOCNO_TEMP.
	(coalesced_allocnos_living_at_program_points): New.
	(coalesced_allocnos_live_at_points_p,
	set_coalesced_allocnos_live_points): New functions.
	(coalesce_spill_slots): Rewrite.
	
	* ira-lives.c (remove_some_program_points_and_update_live_ranges,
	ira_compress_allocno_live_ranges): New functions.

	* ira-build.c (ira_flattening): Call
	ira_compress_allocno_live_ranges.
	(ira_build): Ditto.


Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/ira-build.c
    trunk/gcc/ira-color.c
    trunk/gcc/ira-int.h
    trunk/gcc/ira-lives.c

Comment 20 Jan Hubicka 2008-09-26 21:12:49 UTC
The testcase now compiles at -O2 in resonable time.
Checking enabled compiler:
 CFG verifier          :  15.02 ( 5%) usr   0.01 ( 0%) sys  15.13 ( 5%) wall       0 kB ( 0%) ggc
 df reaching defs      :   7.57 ( 2%) usr   0.27 ( 7%) sys   7.81 ( 2%) wall       0 kB ( 0%) ggc
 df live regs          :  13.13 ( 4%) usr   0.00 ( 0%) sys  13.16 ( 4%) wall       0 kB ( 0%) ggc
 df live&initialized regs:   3.62 ( 1%) usr   0.00 ( 0%) sys   3.60 ( 1%) wall       0 kB ( 0%) ggc
 df use-def / def-use chains:   3.83 ( 1%) usr   0.03 ( 1%) sys   3.85 ( 1%) wall       0 kB ( 0%) ggc
 df reg dead/unused notes:   8.56 ( 3%) usr   0.01 ( 0%) sys   8.53 ( 3%) wall   23492 kB ( 3%) ggc
  tree SSA verifier     :  22.27 ( 7%) usr   0.04 ( 1%) sys  22.39 ( 7%) wall       0 kB ( 0%) ggc
 tree STMT verifier    :  27.32 ( 9%) usr   0.42 (11%) sys  27.65 ( 9%) wall       0 kB ( 0%) ggc
 expand                :  22.57 ( 7%) usr   0.10 ( 3%) sys  22.74 ( 7%) wall  142836 kB (16%) ggc
 integrated RA         :  72.71 (24%) usr   0.00 ( 0%) sys  74.09 (23%) wall    5404 kB ( 1%) ggc
 reload                :  17.30 ( 6%) usr   0.00 ( 0%) sys  20.38 ( 6%) wall   26179 kB ( 3%) ggc
 TOTAL                 : 306.65             3.67           316.44             908200 kB

So we do resonable job here.  I wonder where those 7% of expansion time goes to.
We still need over 2GB RAM because of dataflow info.  I am going to test how much difference Kenny's patch make.
Comment 21 Jan Hubicka 2008-09-26 22:15:00 UTC
I've just tested Kenny's df scan ref union patch.  We still have over 2.2GB footprint that is not that much different from unpatched compiler.
allocpool statistics now give some more useful data:
df_scan_ref base pool          868 -1871497632  553770408          0
copies                         434     214464     134512          0
df_scan_ref artificial pool    868   77289168   11231352          0
df_scan_ref regular pool       868  737848848   72493272          0

So largest one is df_scan_ref base, not artifical defs as was suspected.  About 0.5GB comming to that storage.

Other large pool (now at -O2) is:
allocno live ranges      434  167266736  167062320          0

And bitmaps:
df-problems.c:311 (df_rd_alloc)           110920 1071636320 1071611720 1071611720          0
df-problems.c:312 (df_rd_alloc)           110920  543723720  534523840  534523840          0
df-problems.c:539 (df_rd_transfer_functio  52865  541674400  537236800  537236760     101861
tree-ssa-pre.c:584 (bitmap_set_new)       547137 1368222720 1366138880 1366138880     784222
tree-ssa-pre.c:585 (bitmap_set_new)       547137  951660760  949608960  949608960    1467881

and GGC garbage:
emit-rtl.c:3343 (make_insn_raw)                    33796400: 4.4%          0: 0.0%         88: 0.0%    3072408: 3.1%     384051
tree-inline.c:3600 (copy_tree_r)                   36731592: 4.8%          0: 0.0%          0: 0.0%    2702392: 2.7%     478660
lists.c:143 (alloc_EXPR_LIST)                      37188960: 4.8%          0: 0.0%          0: 0.0%    7437792: 7.5%     929724
tree-ssanames.c:141 (make_ssa_name_fn)             52532760: 6.8%          0: 0.0%       3480: 0.0%    3502416: 3.5%     437802
Comment 22 Jan Hubicka 2008-09-26 23:34:51 UTC
With Kenny's updated df patch and updated statistics I now get:
Alloc-pool Kind         Pools  Elt size  Allocated (elts)           Peak (elts)           Leak (elts)
--------------------------------------------------------------------------------------------------------------
elt_loc_list               40       3863  110676080(   2766902)   13178600(    329465)          0(         0)
df_scan ref base           80        868 -1871497632( -23393720) 553609440(   6920118)          0(         0)
df_scan ref artificial     88        868   77289168(    878286)   11224840(    127555)          0(         0)
df_scan ref regular        88        868  737848848(   8384646)   72474072(    823569)          0(         0)


About 17 refs live for every instruction.
Will check memory footprint after return from California.
Comment 23 Kenneth Zadeck 2008-09-27 12:44:24 UTC
I do not believe honza.  

My measurements at -O0 on x86-42 are about 15 refs per insn.  
This is based on the following stats.  (These can be reproduced using a patch that i am about to submit).

;;    total ref usage 8428419{7601408d,827011u,0e} in 570685{406804 regular + 163881 call} insns.

This yields about 15 refs per insn.   While this number is large, it is reasonable considering that slightly less than 30% of the insns are call instructions.   Call instructions have a lot of clobbers.   It is possible that some mechanism could be devised to share these refs, but this will mess up things like building chains so it is certainly not something that is going to be easy to do.

The df patch that i have submitted makes modest progress on reducing the size of df-refs.   Hopefully bonzini will finish reviewing this soon.

I should also point out that honza's alloc pool stats were completely bogus.  I have submitted a patch that fixes the way stats are accumulated for alloc-pools.  We can account for all of the df-refs and the peak usage according to the new alloc-pool stats is very close to the number used by the largest function.

Once those patches are installed, I will consider this bugzilla resolved with respect to the df issues.  
Comment 24 zadeck@gcc.gnu.org 2008-10-08 02:53:50 UTC
Subject: Bug 37448

Author: zadeck
Date: Wed Oct  8 02:52:28 2008
New Revision: 140960

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=140960
Log:
2008-10-07  Kenneth Zadeck <zadeck@naturalbridge.com>

	PR rtl-optimization/37448
	alloc_pool_desc (elt_size): New field.
	alloc_pool_desc (created, allocated, current, peak): Make unsigned
	long.
	output_info (count): Renamed total_created and made unsigned long.
	output_info (size): Renamed total_allocated and made unsigned long.
	alloc-pool.c (create_alloc_pool, empty_alloc_pool, pool_alloc,
	pool_free): Properly keep track of desc->size.
	(print_statistics, dump_alloc_pool_statistics): Enhance the
	printing of statistics to print the number of elements and to use
	unsigned longs.
	

Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/alloc-pool.c

Comment 25 lucier 2008-10-08 03:37:45 UTC
I'm sorry, I haven't been reading gcc-patches recently, but this is quite similar to my patch suggested here:

http://gcc.gnu.org/ml/gcc-patches/2008-09/msg01270.html

which also "fixed" the counters for bitmaps.  The consensus seemed to be to use HOST_WIDEST_INT for the counters (for 64-bit Windows, pointers are 64-bit and longs are 32-bit).
Comment 26 zadeck@gcc.gnu.org 2008-10-11 23:40:45 UTC
Subject: Bug 37448

Author: zadeck
Date: Sat Oct 11 23:39:21 2008
New Revision: 141067

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=141067
Log:
2008-10-11  Kenneth Zadeck <zadeck@naturalbridge.com>

	PR rtl-optimization/37448
	* df.h: (df_ref_class): New enum.
	(DF_REF_TYPE_NAMES, df_ref_extract): Removed.
	(struct df_ref): Replaced with union df_ref_d.
	(df_base_ref, df_artificial_ref, df_regular_ref, df_extract_ref):
	New members of df_ref_d union.
	(DF_REF_REAL_REG, DF_REF_REGNO, DF_REF_REAL_LOC, DF_REF_REG,
	DF_REF_LOC, DF_REF_BB, DF_REF_INSN_INFO, DF_REF_INSN,
	DF_REF_CLASS, DF_REF_TYPE, DF_REF_CHAIN, DF_REF_ID, DF_REF_FLAGS,
	DF_REF_ORDER, DF_REF_IS_ARTIFICIAL, DF_REF_NEXT_REG,
	DF_REF_PREV_REG, DF_REF_EXTRACT_WIDTH, DF_REF_EXTRACT_OFFSET,
	DF_REF_EXTRACT_MODE): Replaced definition to access union
	df_ref_d.
       	(DF_MWS_REG_DEF_P, DF_MWS_REG_USE_P, DF_MWS_TYPE): New macros.
	(df_scan_bb_info, df_bb_regno_first_def_find,
	df_bb_regno_last_def_find, df_find_def, df_find_use,
	df_refs_chain_dump, df_regs_chain_dump, df_ref_debug,
	debug_df_ref, df_chain_create, df_chain_unlink, df_chain_copy,
	df_ref_create, df_ref_remove, df_compute_accessed_bytes,
	df_get_artificial_defs, df_get_artificial_uses, union_defs)
	Replaced struct df_ref * with df_ref.
	* df-scan.c (df_collection_rec, df_null_ref_rec,
	df_ref_chain_delete_du_chain, df_ref_chain_delete, df_install_ref,
	df_grow_ref_info, df_ref_create, df_reg_chain_unlink,
	df_ref_compress_rec, df_ref_remove, df_ref_chain_delete_du_chain,
	df_ref_chain_delete, df_free_collection_rec, df_insn_rescan,
	df_reorganize_refs_by_reg_by_reg,
	df_reorganize_refs_by_reg_by_insn, df_reorganize_refs_by_reg,
	df_ref_change_reg_with_loc_1, df_notes_rescan, df_swap_refs,
	df_sort_and_compress_refs, df_install_ref, df_install_refs,
	df_ref_record, df_get_conditional_uses, df_get_call_refs,
	df_bb_refs_record, df_exit_block_uses_collect,
	df_record_exit_block_uses, df_reg_chain_mark,
	df_reg_chain_verify_unmarked, df_refs_verify): Replaced struct
	df_ref * with df_ref.
	(df_ref_record, df_uses_record, df_ref_create_structure): Added
	df_ref_class parameter.
	(df_scan_problem_data): Added new pools for different types of
	refs.
	(df_scan_free_internal, df_scan_alloc, df_free_ref,
	df_ref_create_structure): Processed new ref pools.
	(df_scan_start_dump): Added counts of refs and insns.
	(df_ref_create, df_notes_rescan, df_def_record_1, df_uses_record,
	df_get_call_refs, df_insn_refs_collect, df_bb_refs_collect,
	df_entry_block_defs_collect, df_exit_block_uses_collect): Added
	code to pass df_ref_class down to ref creation functions.
	(df_reg_chain_unlink, df_ref_remove, df_ref_change_reg_with_loc_1,
	df_reg_chain_mark): Use macros to hide references to df_refs.
	(df_ref_chain_change_bb): Removed.
	(df_insn_change_bb): Remove calls to df_ref_insn_change_bb.
	(df_ref_equal_p, df_ref_compare, df_ref_create_structure):
	Enhanced to understand df_ref union structure.
	* fwprop.c (local_ref_killed_between_p, use_killed_between,
	all_uses_available_at, update_df, try_fwprop_subst,
	forward_propagate_subreg, forward_propagate_and_simplify,
	forward_propagate_into, fwprop, fwprop_addr): Replaced struct
	df_ref * with df_ref.
	(use_killed_between, all_uses_available_at): Use macros to hide
	references to df_refs.
	* regstat.c (regstat_bb_compute_ri,
	regstat_bb_compute_calls_crossed): Replaced struct df_ref * with
	df_ref.
	* see.c (see_handle_relevant_defs, see_handle_relevant_uses,
	see_handle_relevant_refs, see_analyze_one_def,
	see_update_relevancy, see_propagate_extensions_to_uses): Replaced
	struct df_ref * with df_ref.
	* ra-conflict.c (record_one_conflict, clear_reg_in_live,
	global_conflicts): Replaced struct df_ref * with df_ref.
	* ddg.c (create_ddg_dep_from_intra_loop_link,
	add_cross_iteration_register_deps, build_inter_loop_deps):
	Replaced struct df_ref * with df_ref.
	(create_ddg_dep_from_intra_loop_link,
	add_cross_iteration_register_deps): Use macros to hide references
	to df_refs.
	* auto-inc-dec.c (find_inc, merge_in_block): Replaced struct
	df_ref * with df_ref.
	* df-core.c (df_bb_regno_first_def_find,
	df_bb_regno_last_def_find, df_find_def, df_find_use,
	df_refs_chain_dump, df_regs_chain_dump, df_ref_debug,
	debug_df_ref): Replaced struct df_ref * with df_ref.
	(df_mws_dump, df_ref_debug): Use macros to hide references to
	df_refs.
	* cse.c (cse_extended_basic_block): Replaced struct df_ref * with
	df_ref.
	* web.c (union_defs, entry_register, replace_ref, web_main):
	Replaced struct df_ref * with df_ref.
	(union_defs, replace_ref): Use macros to hide references to
	df_refs.
	* global.c (compute_regs_asm_clobbered, build_insn_chain):
	Replaced struct df_ref * with df_ref.
	* ifcvt.c (dead_or_predicable): Replaced struct df_ref * with
	df_ref.
	* sel-sched-ir.c (maybe_downgrade_id_to_use, setup_id_reg_sets, ):
	Replaced struct df_ref * with df_ref.
	* ira-lives.c (mark_ref_live, def_conflicts_with_inputs_p,
	mark_ref_dead, process_bb_node_lives): Replaced struct df_ref *
	with df_ref.
	* local-alloc.c (block_alloc): Replaced struct df_ref * with
	df_ref.
	* df-byte-scan.c (df_compute_accessed_bytes_extract,
	df_compute_accessed_bytes_strict_low_part,
	df_compute_accessed_bytes_subreg, df_compute_accessed_bytes):
	Replaced struct df_ref * with df_ref.
	(df_compute_accessed_bytes): Use macros to hide references to
	df_refs.
	* init-regs.c (initialize_uninitialized_regs): Replaced struct
	df_ref * with df_ref.
	* loop-invariant.c (invariant_for_use, hash_invariant_expr_1,
	check_dependency, check_dependencies, record_uses): Replaced
	struct df_ref * with df_ref.
	(invariant_for_use, check_dependency): Use macros to hide
	references to df_refs.
	* loop-iv.c (iv_analysis_loop_init, iv_get_reaching_def,
	get_biv_step_1, get_biv_step, record_iv, iv_analyze_def,
	iv_analyze, biv_p): Replaced struct df_ref * with df_ref.
	(iv_analysis_loop_init, iv_get_reaching_def): Use macros to hide
	references to df_refs.
	* ira.c (compute_regs_asm_clobbered): Replaced struct df_ref * with df_ref.
	* combine.c (create_log_links): Replaced struct df_ref * with df_ref.
	* df-problems.c (df_rd_bb_local_compute_process_def,
	df_lr_bb_local_compute, df_live_bb_local_compute, df_chain_create,
	df_chain_unlink_1, df_chain_unlink, df_chain_copy,
	df_chain_remove_problem, df_chain_create_bb_process_use,
	df_chain_create_bb, df_chain_top_dump, df_chain_bottom_dump,
	df_byte_lr_check_regs, df_byte_lr_bb_local_compute,
	df_byte_lr_simulate_defs, df_byte_lr_simulate_uses,
	df_byte_lr_simulate_artificial_refs_at_top,
	df_byte_lr_simulate_artificial_refs_at_end, df_create_unused_note,
	df_note_bb_compute, df_note_add_problem, df_simulate_defs,
	df_simulate_uses, df_simulate_artificial_refs_at_end,
	df_simulate_artificial_refs_at_top): Replaced struct df_ref * with df_ref.
	(df_chain_dump): Use macros to hide
	references to df_refs.
	* config/mips/mips.c (r10k_simplify_address): Replaced struct
	df_ref * with df_ref.
	* dce.c (mark_nonreg_stores, delete_corresponding_reg_eq_notes,
	mark_artificial_uses, mark_reg_dependencies,
	byte_dce_process_block): Replaced struct df_ref * with df_ref.


Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/auto-inc-dec.c
    trunk/gcc/combine.c
    trunk/gcc/config/mips/mips.c
    trunk/gcc/cse.c
    trunk/gcc/dce.c
    trunk/gcc/ddg.c
    trunk/gcc/df-byte-scan.c
    trunk/gcc/df-core.c
    trunk/gcc/df-problems.c
    trunk/gcc/df-scan.c
    trunk/gcc/df.h
    trunk/gcc/fwprop.c
    trunk/gcc/global.c
    trunk/gcc/ifcvt.c
    trunk/gcc/init-regs.c
    trunk/gcc/ira-lives.c
    trunk/gcc/ira.c
    trunk/gcc/local-alloc.c
    trunk/gcc/loop-invariant.c
    trunk/gcc/loop-iv.c
    trunk/gcc/ra-conflict.c
    trunk/gcc/regstat.c
    trunk/gcc/see.c
    trunk/gcc/sel-sched-ir.c
    trunk/gcc/web.c

Comment 27 Richard Biener 2009-01-24 10:20:48 UTC
GCC 4.3.3 is being released, adjusting target milestone.
Comment 28 Jan Hubicka 2009-03-29 18:26:04 UTC
The testcase is now failing on mainline.  Reason seems to be new pure-const pass causing PRE to take unacceptable amount of time producing extraordinarily large bitmaps.  I stoppped it after while and out of 1GB of memory use, 700MB was PRE bitmap.

Honza
Comment 29 Jan Hubicka 2009-03-29 18:31:06 UTC
tree-ssa-pre.c:4201 (init_pre)                68  544319360       2960       2920   35303438
tree-ssa-pre.c:590 (bitmap_set_new)       187851  167503080   38440200   12734360  183944666
tree-ssa-pre.c:591 (bitmap_set_new)       187851   96183960   26228960   14024680  223113273
Comment 30 Richard Biener 2009-08-04 12:29:29 UTC
GCC 4.3.4 is being released, adjusting target milestone.
Comment 31 Richard Biener 2010-01-26 11:35:27 UTC
Updated timings and memory:

> ~/bin/maxmem2.sh /usr/bin/time gcc-4.5 -S -o /dev/null lgwam.c
32.62user 1.48system 0:34.41elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+333822minor)pagefaults 0swaps
total: 1333745 kB

> ~/bin/maxmem2.sh /usr/bin/time /space/rguenther/install/gcc-4.4.3/bin/gcc -S -o /dev/null lgwam.c
35.01user 1.54system 0:36.89elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+329139minor)pagefaults 0swaps
total: 1306898 kB

> ~/bin/maxmem2.sh /usr/bin/time /space/rguenther/install/gcc-4.3.4/bin/gcc -S -o /dev/null lgwam.c
27.42user 1.83system 0:29.61elapsed 98%CPU (0avgtext+0avgdata 0maxresident)k
16inputs+0outputs (0major+374338minor)pagefaults 0swaps
total: 1341721 kB

> ~/bin/maxmem2.sh /usr/bin/time /space/rguenther/install/gcc-4.2.3/bin/gcc -S -o /dev/null lgwam.c
15.33user 0.80system 0:16.31elapsed 98%CPU (0avgtext+0avgdata 0maxresident)k
256inputs+0outputs (2major+197427minor)pagefaults 0swaps
total: 598733 kB


time-report for 4.5 trunk (only interesting parts):

 expand                :   6.72 (20%) usr   0.72 (26%) sys   7.45 (20%) wall  140609 kB (35%) ggc
 integrated RA         :   8.56 (25%) usr   0.12 ( 4%) sys   8.76 (24%) wall    5151 kB ( 1%) ggc
 reload                :   7.72 (23%) usr   0.22 ( 8%) sys   7.94 (21%) wall    
 TOTAL                 :  34.17             2.79            37.15             402913 kB

memory-usage is still high compared to 4.2.

At -O2 the picture is similar (memory peaks at 2.5GB):

Execution times (seconds)
 garbage collection    :   0.64 ( 0%) usr   0.00 ( 0%) sys   0.64 ( 0%) wall       0 kB ( 0%) ggc
 callgraph construction:   0.31 ( 0%) usr   0.08 ( 1%) sys   0.39 ( 0%) wall   20525 kB ( 3%) ggc
 callgraph optimization:   1.40 ( 1%) usr   0.03 ( 0%) sys   1.46 ( 1%) wall     639 kB ( 0%) ggc
 ipa cp                :   0.20 ( 0%) usr   0.00 ( 0%) sys   0.20 ( 0%) wall    9607 kB ( 1%) ggc
 ipa reference         :   0.14 ( 0%) usr   0.00 ( 0%) sys   0.14 ( 0%) wall       0 kB ( 0%) ggc
 ipa pure const        :   0.33 ( 0%) usr   0.00 ( 0%) sys   0.32 ( 0%) wall     199 kB ( 0%) ggc
 cfg cleanup           :   0.92 ( 0%) usr   0.00 ( 0%) sys   0.89 ( 0%) wall    1168 kB ( 0%) ggc
 trivially dead code   :   0.92 ( 0%) usr   0.00 ( 0%) sys   0.89 ( 0%) wall       0 kB ( 0%) ggc
 df multiple defs      :   0.86 ( 0%) usr   0.01 ( 0%) sys   0.88 ( 0%) wall       0 kB ( 0%) ggc
 df reaching defs      :   2.69 ( 1%) usr   0.96 (12%) sys   3.94 ( 2%) wall       0 kB ( 0%) ggc
 df live regs          :   7.15 ( 3%) usr   0.07 ( 1%) sys   7.13 ( 3%) wall       0 kB ( 0%) ggc
 df live&initialized regs:   3.33 ( 2%) usr   0.06 ( 1%) sys   3.36 ( 1%) wall       0 kB ( 0%) ggc
 df use-def / def-use chains:   0.85 ( 0%) usr   0.03 ( 0%) sys   0.88 ( 0%) wall       0 kB ( 0%) ggc
 df reg dead/unused notes:   6.31 ( 3%) usr   0.05 ( 1%) sys  12.84 ( 6%) wall   18222 kB ( 2%) ggc
 register information  :   3.42 ( 2%) usr   0.00 ( 0%) sys   3.46 ( 2%) wall       0 kB ( 0%) ggc
 alias analysis        :   1.14 ( 1%) usr   0.01 ( 0%) sys   1.17 ( 1%) wall   10447 kB ( 1%) ggc
 alias stmt walking    :   4.39 ( 2%) usr   0.26 ( 3%) sys   4.53 ( 2%) wall       0 kB ( 0%) ggc
 register scan         :   0.20 ( 0%) usr   0.00 ( 0%) sys   0.20 ( 0%) wall       7 kB ( 0%) ggc
 rebuild jump labels   :   0.53 ( 0%) usr   0.00 ( 0%) sys   0.51 ( 0%) wall       0 kB ( 0%) ggc
 preprocessing         :   0.64 ( 0%) usr   0.35 ( 4%) sys   0.93 ( 0%) wall   23140 kB ( 3%) ggc
 lexical analysis      :   0.30 ( 0%) usr   0.54 ( 7%) sys   0.88 ( 0%) wall       0 kB ( 0%) ggc
 parser                :   0.69 ( 0%) usr   0.38 ( 5%) sys   1.09 ( 0%) wall   38129 kB ( 5%) ggc
 inline heuristics     :   1.18 ( 1%) usr   0.01 ( 0%) sys   1.15 ( 1%) wall   29832 kB ( 4%) ggc
 integration           :   3.42 ( 2%) usr   0.45 ( 6%) sys   3.66 ( 2%) wall  175322 kB (22%) ggc
 tree gimplify         :   1.08 ( 1%) usr   0.09 ( 1%) sys   1.17 ( 1%) wall  104718 kB (13%) ggc
 tree eh               :   0.04 ( 0%) usr   0.00 ( 0%) sys   0.04 ( 0%) wall       0 kB ( 0%) ggc
 tree CFG construction :   0.14 ( 0%) usr   0.01 ( 0%) sys   0.14 ( 0%) wall   11817 kB ( 1%) ggc
 tree CFG cleanup      :   1.23 ( 1%) usr   0.00 ( 0%) sys   1.21 ( 1%) wall      60 kB ( 0%) ggc
 tree VRP              :   3.08 ( 1%) usr   0.01 ( 0%) sys   3.09 ( 1%) wall    6719 kB ( 1%) ggc
 tree copy propagation :   1.22 ( 1%) usr   0.02 ( 0%) sys   1.14 ( 0%) wall     585 kB ( 0%) ggc
 tree find ref. vars   :   0.11 ( 0%) usr   0.01 ( 0%) sys   0.12 ( 0%) wall    6045 kB ( 1%) ggc
 tree PTA              :   0.62 ( 0%) usr   0.00 ( 0%) sys   0.68 ( 0%) wall    2178 kB ( 0%) ggc
 tree PHI insertion    :   0.04 ( 0%) usr   0.00 ( 0%) sys   0.04 ( 0%) wall     185 kB ( 0%) ggc
 tree SSA rewrite      :   2.92 ( 1%) usr   0.06 ( 1%) sys   3.04 ( 1%) wall   54148 kB ( 7%) ggc
 tree SSA other        :   0.24 ( 0%) usr   0.08 ( 1%) sys   0.34 ( 0%) wall     589 kB ( 0%) ggc
 tree SSA incremental  :   3.46 ( 2%) usr   0.03 ( 0%) sys   3.50 ( 2%) wall     187 kB ( 0%) ggc
 tree operand scan     :   1.81 ( 1%) usr   0.31 ( 4%) sys   2.40 ( 1%) wall   53424 kB ( 7%) ggc
 dominator optimization:   0.73 ( 0%) usr   0.00 ( 0%) sys   0.73 ( 0%) wall    5666 kB ( 1%) ggc
 tree SRA              :   0.01 ( 0%) usr   0.01 ( 0%) sys   0.02 ( 0%) wall       2 kB ( 0%) ggc
 tree CCP              :   0.95 ( 0%) usr   0.00 ( 0%) sys   0.93 ( 0%) wall     617 kB ( 0%) ggc
 tree PHI const/copy prop:   0.02 ( 0%) usr   0.00 ( 0%) sys   0.02 ( 0%) wall       8 kB ( 0%) ggc
 tree reassociation    :   0.20 ( 0%) usr   0.00 ( 0%) sys   0.20 ( 0%) wall     280 kB ( 0%) ggc
 tree PRE              :   2.02 ( 1%) usr   1.01 (13%) sys   3.02 ( 1%) wall    2092 kB ( 0%) ggc
 tree FRE              :   2.62 ( 1%) usr   1.12 (14%) sys   3.75 ( 2%) wall    1674 kB ( 0%) ggc
 tree code sinking     :   0.15 ( 0%) usr   0.00 ( 0%) sys   0.17 ( 0%) wall    1126 kB ( 0%) ggc
 tree linearize phis   :   0.07 ( 0%) usr   0.01 ( 0%) sys   0.09 ( 0%) wall       4 kB ( 0%) ggc
 tree forward propagate:   0.19 ( 0%) usr   0.00 ( 0%) sys   0.17 ( 0%) wall      75 kB ( 0%) ggc
 tree phiprop          :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall       0 kB ( 0%) ggc
 tree conservative DCE :   0.61 ( 0%) usr   0.20 ( 2%) sys   0.91 ( 0%) wall       5 kB ( 0%) ggc
 tree aggressive DCE   :   0.65 ( 0%) usr   0.09 ( 1%) sys   0.80 ( 0%) wall    1486 kB ( 0%) ggc
 tree buildin call DCE :   0.02 ( 0%) usr   0.00 ( 0%) sys   0.03 ( 0%) wall       0 kB ( 0%) ggc
 tree DSE              :   0.13 ( 0%) usr   0.00 ( 0%) sys   0.15 ( 0%) wall      34 kB ( 0%) ggc
 PHI merge             :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.00 ( 0%) wall      26 kB ( 0%) ggc
 tree loop bounds      :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.00 ( 0%) wall      17 kB ( 0%) ggc
 tree loop invariant motion:   0.04 ( 0%) usr   0.00 ( 0%) sys   0.04 ( 0%) wall       4 kB ( 0%) ggc
 scev constant prop    :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.02 ( 0%) wall     112 kB ( 0%) ggc
 complete unrolling    :   0.06 ( 0%) usr   0.00 ( 0%) sys   0.07 ( 0%) wall     379 kB ( 0%) ggc
 tree iv optimization  :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.00 ( 0%) wall     473 kB ( 0%) ggc
 tree loop init        :   0.13 ( 0%) usr   0.00 ( 0%) sys   0.13 ( 0%) wall     390 kB ( 0%) ggc
 tree copy headers     :   0.02 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall     330 kB ( 0%) ggc
 tree SSA uncprop      :   0.03 ( 0%) usr   0.00 ( 0%) sys   0.04 ( 0%) wall       0 kB ( 0%) ggc
 tree rename SSA copies:   0.31 ( 0%) usr   0.00 ( 0%) sys   0.32 ( 0%) wall       0 kB ( 0%) ggc
 dominance frontiers   :   0.07 ( 0%) usr   0.00 ( 0%) sys   0.09 ( 0%) wall       0 kB ( 0%) ggc
 dominance computation :   1.26 ( 1%) usr   0.04 ( 0%) sys   1.31 ( 1%) wall       0 kB ( 0%) ggc
 control dependences   :   0.03 ( 0%) usr   0.00 ( 0%) sys   0.02 ( 0%) wall       0 kB ( 0%) ggc
 expand                :  10.92 ( 5%) usr   0.70 ( 9%) sys  11.74 ( 5%) wall  124907 kB (16%) ggc
 forward prop          :   2.08 ( 1%) usr   0.04 ( 0%) sys   2.06 ( 1%) wall    3631 kB ( 0%) ggc
 CSE                   :   3.19 ( 1%) usr   0.00 ( 0%) sys   3.14 ( 1%) wall     512 kB ( 0%) ggc
 dead code elimination :   1.24 ( 1%) usr   0.01 ( 0%) sys   1.25 ( 1%) wall       0 kB ( 0%) ggc
 dead store elim1      :   1.26 ( 1%) usr   0.04 ( 0%) sys   1.33 ( 1%) wall     965 kB ( 0%) ggc
 dead store elim2      :   1.51 ( 1%) usr   0.02 ( 0%) sys   1.54 ( 1%) wall    7466 kB ( 1%) ggc
 loop analysis         :   0.03 ( 0%) usr   0.00 ( 0%) sys   0.05 ( 0%) wall     229 kB ( 0%) ggc
 loop invariant motion :   0.10 ( 0%) usr   0.03 ( 0%) sys   0.13 ( 0%) wall       2 kB ( 0%) ggc
 CPROP                 :   0.18 ( 0%) usr   0.00 ( 0%) sys   0.20 ( 0%) wall     452 kB ( 0%) ggc
 PRE                   :   0.06 ( 0%) usr   0.00 ( 0%) sys   0.07 ( 0%) wall      15 kB ( 0%) ggc
 CSE 2                 :   3.20 ( 1%) usr   0.00 ( 0%) sys   3.22 ( 1%) wall     494 kB ( 0%) ggc
 branch prediction     :   0.28 ( 0%) usr   0.00 ( 0%) sys   0.30 ( 0%) wall    2539 kB ( 0%) ggc
 combiner              :   1.42 ( 1%) usr   0.02 ( 0%) sys   1.42 ( 1%) wall   11673 kB ( 1%) ggc
 if-conversion         :   0.19 ( 0%) usr   0.00 ( 0%) sys   0.20 ( 0%) wall     338 kB ( 0%) ggc
 regmove               :   0.39 ( 0%) usr   0.00 ( 0%) sys   0.35 ( 0%) wall       0 kB ( 0%) ggc
 integrated RA         :  78.17 (37%) usr   0.46 ( 6%) sys  78.94 (34%) wall    4788 kB ( 1%) ggc
 reload                :  25.79 (12%) usr   0.07 ( 1%) sys  26.09 (11%) wall   26499 kB ( 3%) ggc
 reload CSE regs       :   3.00 ( 1%) usr   0.05 ( 1%) sys   3.08 ( 1%) wall   15550 kB ( 2%) ggc
 thread pro- & epilogue:   1.39 ( 1%) usr   0.01 ( 0%) sys   1.48 ( 1%) wall    1006 kB ( 0%) ggc
 if-conversion 2       :   0.06 ( 0%) usr   0.00 ( 0%) sys   0.08 ( 0%) wall     138 kB ( 0%) ggc
 combine stack adjustments:   0.08 ( 0%) usr   0.00 ( 0%) sys   0.09 ( 0%) wall       0 kB ( 0%) ggc
 peephole 2            :   0.67 ( 0%) usr   0.01 ( 0%) sys   0.69 ( 0%) wall    3275 kB ( 0%) ggc
 hard reg cprop        :   1.33 ( 1%) usr   0.02 ( 0%) sys   1.32 ( 1%) wall     979 kB ( 0%) ggc
 scheduling 2          :   6.59 ( 3%) usr   0.05 ( 1%) sys   6.63 ( 3%) wall     294 kB ( 0%) ggc
 machine dep reorg     :   0.69 ( 0%) usr   0.01 ( 0%) sys   0.68 ( 0%) wall      10 kB ( 0%) ggc
 reorder blocks        :   0.34 ( 0%) usr   0.00 ( 0%) sys   0.39 ( 0%) wall    2878 kB ( 0%) ggc
 final                 :   1.63 ( 1%) usr   0.04 ( 0%) sys   1.64 ( 1%) wall     201 kB ( 0%) ggc
 tree if-combine       :   0.01 ( 0%) usr   0.01 ( 0%) sys   0.03 ( 0%) wall       1 kB ( 0%) ggc
 plugin execution      :   0.00 ( 0%) usr   0.03 ( 0%) sys   0.04 ( 0%) wall       0 kB ( 0%) ggc
 TOTAL                 : 214.12             8.02           229.70             793292 kB

At -O1 we seem to never finish early inlining into testsuite though...
(the loop over all callees calling check-inline-limits which again
loops over all callees looks quadratic - at least because we do not
not consider the duplicates in the caller?  We seem to be looping
over edges but check limits for !one_only.  Why do we not consider
edges individually and avoid calling cgraph_check_inline_limits
with !one_only at all?).
Comment 32 Richard Biener 2010-01-26 11:44:57 UTC
Finally it finished.  -O1:

 inline heuristics     : 810.76 (63%) usr   4.58 (24%) sys 957.16 (53%) wall       0 kB ( 0%) ggc
 integrated RA         : 115.69 ( 9%) usr   3.86 (20%) sys 290.09 (16%) wall    5979 kB ( 1%) ggc
 reload                : 312.19 (24%) usr   4.99 (26%) sys 453.32 (25%) wall    5139 kB ( 1%) ggc
 TOTAL                 :1290.20            19.28          1807.46             486943 kB

ugh.
Comment 33 Richard Biener 2010-01-26 12:33:58 UTC
The whole early-inlining stuff is made ugly because we jump through hoops
to handle callgraph cycles where some callees may not yet be in SSA form.
If we do not want to go the route to go into SSA for all functions before
starting early IPA passes then we at least should do so for callgraph SCCs.
Comment 34 Jan Hubicka 2010-01-26 16:05:29 UTC
Subject: Re:  [4.3/4.4/4.5 Regression] gcc 4.3.1 cannot compile big function

> The whole early-inlining stuff is made ugly because we jump through hoops
> to handle callgraph cycles where some callees may not yet be in SSA form.

Hmm, it don't seem to me that keeping the reverse postorder is bad idea
here (not that going to SSA earlier would not be good idea for different
reasons).  Still top-down inliner needs to deal with cycles one way or
another.  I will take a look what is taking quadratic time.
Comment 35 Richard Biener 2010-01-28 16:21:08 UTC
I have a fix for the inline-heuristic slowness.
Comment 36 Richard Biener 2010-01-28 16:42:26 UTC
With the patch:

 inline heuristics     :   0.77 ( 0%) usr   0.01 ( 0%) sys   1.37 ( 0%) wall       0 kB ( 0%) ggc
 expand                :   9.42 ( 2%) usr   0.70 (11%) sys  10.36 ( 2%) wall  138780 kB (29%) ggc
 integrated RA         : 115.01 (24%) usr   1.33 (21%) sys 117.04 (24%) wall    5975 kB ( 1%) ggc
 reload                : 311.26 (65%) usr   0.44 ( 7%) sys 312.77 (64%) wall    5139 kB ( 1%) ggc
 TOTAL                 : 478.22             6.28           491.79             486545 kB
Comment 37 Richard Biener 2010-01-28 16:50:23 UTC
But it miscompares.  Honza?!

Index: gcc/ipa-inline.c
===================================================================
*** gcc/ipa-inline.c    (revision 156321)
--- gcc/ipa-inline.c    (working copy)
*************** cgraph_decide_inlining_incrementally (st
*** 1510,1515 ****
--- 1510,1517 ----
        /* Never inline regular functions into always-inline functions
         during incremental inlining.  */
        && !node->local.disregard_inline_limits)
+     {
+       bitmap visited = BITMAP_ALLOC (NULL);
      for (e = node->callees; e; e = e->next_callee)
        {
          int allowed_growth = 0;
*************** cgraph_decide_inlining_incrementally (st
*** 1517,1522 ****
--- 1519,1528 ----
            || !e->inline_failed
            || e->callee->local.disregard_inline_limits)
          continue;
+       /* We are inlining a function to all call-sites in node
+          or to none.  So visit each candidate only once.  */
+       if (!bitmap_set_bit (visited, e->callee->uid))
+         continue;
        if (dump_file)
          fprintf (dump_file, "Considering inline candidate %s.\n",
                   cgraph_node_name (e->callee));
*************** cgraph_decide_inlining_incrementally (st
*** 1601,1606 ****
--- 1607,1614 ----
        if (cgraph_default_inline_p (e->callee, &failed_reason))
          inlined |= try_inline (e, mode, depth);
        }
+     BITMAP_FREE (visited);
+     }
    node->aux = (void *)(size_t) old_mode;
    return inlined;
  }
Comment 38 Richard Biener 2010-01-28 16:58:03 UTC
Nope, bootstrap is generally broken.
Comment 39 Richard Biener 2010-01-29 11:26:39 UTC
Subject: Bug 37448

Author: rguenth
Date: Fri Jan 29 11:26:27 2010
New Revision: 156343

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=156343
Log:
2010-01-29  Richard Guenther  <rguenther@suse.de>

	PR middle-end/37448
	* ipa-inline.c (cgraph_decide_inlining_incrementally): Avoid
	quadratic behavior in most cases.

Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/ipa-inline.c

Comment 40 Jan Hubicka 2010-01-29 20:54:36 UTC
Subject: Re:  [4.3/4.4/4.5 Regression] gcc 4.3.1 cannot compile big function

> But it miscompares.  Honza?!
Patch seems sane to me, so I guess it was other bootstrap issue.
I was actually working on similar fix yesterday just in more general
setting.  Similar issue can be triggered with main inliner too, that
might provoke need to later make this caching global, but lets wait for
testcase since there might be other issues if you get that many inline
candidates and inlined calls.

Thanks for looking into this,
Honza
Comment 41 Richard Biener 2010-05-22 18:12:38 UTC
GCC 4.3.5 is being released, adjusting target milestone.
Comment 42 Richard Biener 2011-01-18 14:28:00 UTC
PR47344 now tracks the regression property of this bug.
Comment 43 Jan Hubicka 2011-01-18 14:53:46 UTC
Now at -O2 the main inliner seems to degenerate ;(
168767   78.4974  cc1                      cgraph_edge_badness
9426      4.3842  cc1                      update_edge_key
3529      1.6414  cc1                      update_caller_keys
2498      1.1619  cc1                      bitmap_set_bit
1776      0.8261  opreport                 /usr/bin/opreport
1212      0.5637  cc1                      bitmap_elt_insert_after
1166      0.5423  oprofiled                /usr/bin/oprofiled
1106      0.5144  libc-2.11.1.so           _IO_vfscanf
965       0.4488  cc1                      bitmap_ior_into
Comment 44 Jan Hubicka 2011-01-18 14:58:04 UTC
and later IRA
samples  %        app name                 symbol name
25826    56.8667  cc1                      allocno_spill_priority_compare
6812     14.9994  cc1                      ira_build_conflicts
4308      9.4859  cc1                      splay_tree_splay
2847      6.2689  cc1                      color_pass
1661      3.6574  cc1                      push_allocno_to_stack
1336      2.9418  cc1                      build_object_conflicts
468       1.0305  cc1                      bitmap_bit_p
443       0.9754  cc1                      splay_tree_insert

df reaching defs      :   1.65 ( 1%) usr   0.42 (10%) sys   2.10 ( 1%) wall       0 kB ( 0%) ggc
 df live regs          :   4.27 ( 2%) usr   0.00 ( 0%) sys   4.24 ( 2%) wall       0 kB ( 0%) ggc
 df live&initialized regs:   1.31 ( 1%) usr   0.01 ( 0%) sys   1.32 ( 1%) wall       0 kB ( 0%) ggc
 df use-def / def-use chains:   0.72 ( 0%) usr   0.00 ( 0%) sys   0.72 ( 0%) wall       0 kB ( 0%) ggc
 df reg dead/unused notes:   4.32 ( 2%) usr   0.00 ( 0%) sys   4.31 ( 2%) wall    8508 kB ( 1%) ggc
 register information  :   2.05 ( 1%) usr   0.00 ( 0%) sys   2.02 ( 1%) wall       0 kB ( 0%) ggc

362       0.7971  cc1                      splay_tree_remove
193       0.4250  cc1                      splay_tree_free
167       0.3677  cc1                      add_copies


 inline heuristics     :  81.88 (35%) usr   0.04 ( 1%) sys  82.07 (35%) wall   44238 kB ( 5%) ggc
 CSE                   :   4.09 ( 2%) usr   0.00 ( 0%) sys   4.12 ( 2%) wall     138 kB ( 0%) ggc
 CSE 2                 :   4.13 ( 2%) usr   0.00 ( 0%) sys   4.12 ( 2%) wall     125 kB ( 0%) ggc
 integrated RA         :  50.18 (22%) usr   0.18 ( 4%) sys  50.48 (21%) wall    7472 kB ( 1%) ggc
 reload                :   9.21 ( 4%) usr   0.00 ( 0%) sys   9.25 ( 4%) wall   12180 kB ( 1%) ggc
 scheduling 2          :   3.81 ( 2%) usr   0.00 ( 0%) sys   3.79 ( 2%) wall     356 kB ( 0%) ggc

So mostly IRA and main inliner.  I am not sure how much I can do for main inliner except for adding some hard limit on how much work we put into updating the priority queue and ignoring inconsistencies when it is met...

Honza
Comment 45 Steven Bosscher 2012-10-25 23:51:28 UTC
(In reply to comment #15)
> We seem to make nice progress on the testcase ;)

I wonder where things stand with trunk today..
Comment 46 lucier 2012-10-26 02:05:13 UTC
Created attachment 28534 [details]
memory and time statistics for compiling lgwam.c

With today's mainline

leibniz-252% /tmp/lucier/mainline/bin/gcc -v
Using built-in specs.
COLLECT_GCC=/tmp/lucier/mainline/bin/gcc
COLLECT_LTO_WRAPPER=/tmp/lucier/mainline/libexec/gcc/x86_64-unknown-linux-gnu/4.8.0/lto-wrapper
Target: x86_64-unknown-linux-gnu
Configured with: ../../gcc/configure --prefix=/tmp/lucier/mainline --enable-checking=release --enable-languages=c --disable-multilib
Thread model: posix
gcc version 4.8.0 20121026 (experimental) (GCC) 

and the command:

leibniz-270% /tmp/lucier/mainline/bin/gcc -O3 -c lgwam.c -fmem-report -ftime-report > & ! lgwam.c.stat-O3

These are the statistics.

Brad
Comment 47 Jan Hubicka 2012-10-26 15:22:50 UTC
Hmm, good timming. I just started looking into re-tunning inliner heuristics for 4.8 so I should look again into this inline bomb... Looks like inliner did not get faster ;)
Comment 48 Jan Hubicka 2012-10-26 16:48:08 UTC
The problem here is walking of callgraph edges to sum the code size of the caller function in both early inliner and late inliner. I am still not very keen on making the cost calucuations incremental, but I will see if keeping the sum of all edges makes sense here...

Honza
Comment 49 Steven Bosscher 2012-10-26 17:28:58 UTC
(In reply to comment #47)
> Hmm, good timming. I just started looking into re-tunning inliner 
> heuristics for 4.8 so I should look again into this inline bomb... 
> Looks like inliner did not get faster ;)

While everything else did get faster. Much of the work from PR54146 also
helps for this PR. But the inliner heuristics are even more insane than
this test case.

What are the reasons for your aversion against incrementally updating
the inliner heuristics?
Comment 50 Steven Bosscher 2013-03-06 10:42:20 UTC
Note this is a regression.
Comment 51 Richard Biener 2013-03-06 10:45:50 UTC
See comment #42 ...