Bug 43058 - [4.5 Regression] var-tracking uses up all virtual memory
Summary: [4.5 Regression] var-tracking uses up all virtual memory
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: rtl-optimization
Version: 4.5.0
Importance: P1 normal
Target Milestone: 4.5.0
Assignee: Jakub Jelinek
URL:
Keywords: memory-hog
Depends on: 41371
Blocks:
Reported: 2010-02-13 19:04 UTC by Richard Biener
Modified: 2010-03-23 06:48 UTC
CC List: 5 users

See Also:
Host:
Target: i?86-linux
Build:
Known to work:
Known to fail:
Last reconfirmed: 2010-03-12 12:06:01


Attachments
testcase from snd (455.28 KB, application/octet-stream)
2010-02-13 19:06 UTC, Richard Biener
gcc45-pr43058.patch (1.60 KB, patch)
2010-03-17 15:05 UTC, Jakub Jelinek

Description Richard Biener 2010-02-13 19:04:17 UTC
/usr/lib64/gcc/x86_64-suse-linux/4.5/cc1 -m32 -fpreprocessed xg.i -quiet -dumpbase xg.c -mtune=generic -march=i586 -auxbase xg -g -O2 -O2 -Wall -version -fomit-frame-pointer -fmessage-length=0 -fstack-protector -funwind-tables -fasynchronous-unwind-tables -fno-strict-aliasing -o xg.s --param ggc-min-expand=100 --param ggc-min-heapsize=131072

uses more than 2GB of virtual memory on a 32-bit host (and thus fails
to compile).  On a 64-bit host we top out at >3GB (I need to re-try on a host
with more memory, I'm swapping to death right now).

Related to PR41371, the offender is var-tracking.  Without -g we top out
at 600MB on the 64-bit host.

I suppose we inline all single-call static functions into Init_libxg,
which makes it biiiig (and leaves it containing lots of calls).
Comment 1 Richard Biener 2010-02-13 19:06:31 UTC
Created attachment 19859 [details]
testcase from snd

Testcase from snd.
Comment 2 Richard Biener 2010-02-13 19:12:58 UTC
Indeed.  After IPA-inline we have:

Init_libxg/4659(4659) @0x7fffef076138 availability:available 65 time, 10 benefit (112338 after inlining) 37 size, 1 benefit (89974 after inlining) needed reachable body externally_visible finalized inlinable
  called by:
  calls: scm_c_eval_string/4663 (0.39 per call) scm_c_eval_string/4663 (0.39 per call) scm_c_eval_string/4663 (0.39 per call) scm_c_define/4662 (0.39 per call) scm_from_locale_string/4661 (0.39 per call) scm_add_feature/4660 (0.39 per call) define_strings/4658 (inlined) (0.39 per call) define_atoms/4657 (0.39 per call) define_structs/4654 (0.39 per call) define_functions/4653 (inlined) (0.39 per call) define_doubles/4656 (0.39 per call) define_integers/4655 (inlined) (0.39 per call) scm_set_smob_free/4665 (0.39 per call) scm_make_smob_type/4664 (0.39 per call)

And the functions-called-once dump looks odd to me:

Deciding on functions called once:

Considering define_strings size 840.
 Called once from Init_libxg 37 insns.
 Inlined into Init_libxg which now has 876 size for a net change of -841 size.

Considering define_integers size 11078.
 Called once from Init_libxg 876 insns.
 Inlined into Init_libxg which now has 11953 size for a net change of -11079 size.

I can't believe this net change.

Considering define_doubles size 56.
 Called once from Init_libxg 11953 insns.
 Not inlining: --param large-function-growth limit reached.

Considering define_functions size 78022.
 Called once from Init_libxg 11953 insns.
 Inlined into Init_libxg which now has 89974 size for a net change of -78023 size.

what?  We refused to inline define_doubles but inlined define_functions??

Considering define_atoms size 119.
 Called once from Init_libxg 89974 insns.
 Not inlining: --param large-function-growth limit reached.

Considering define_structs size 757.
 Called once from Init_libxg 89974 insns.
 Not inlining: --param large-function-growth limit reached.


Very very odd.  Honza?
Comment 3 Richard Biener 2010-02-13 19:25:08 UTC
  limit += limit * PARAM_VALUE (PARAM_LARGE_FUNCTION_GROWTH) / 100;

  /* Check the size after inlining against the function limits.  But allow
     the function to shrink if it went over the limits by forced inlining.  */
  newsize = cgraph_estimate_size_after_inlining (times, to, what);
  if (newsize >= to->global.size
      && newsize > PARAM_VALUE (PARAM_LARGE_FUNCTION_INSNS)
      && newsize > limit)
    {

this allows unbounded growth based on PARAM_LARGE_FUNCTION_GROWTH, ignoring
PARAM_LARGE_FUNCTION_INSNS.  Shouldn't we at least limit

  /* When inlining large function body called once into small function,
     take the inlined function as base for limiting the growth.  */
  if (inline_summary (to)->self_size > inline_summary(what)->self_size)
    limit = inline_summary (to)->self_size;
  else
    limit = inline_summary (what)->self_size;

  limit += limit * PARAM_VALUE (PARAM_LARGE_FUNCTION_GROWTH) / 100;

so that limit is at most PARAM_LARGE_FUNCTION_INSNS * PARAM_LARGE_FUNCTION_GROWTH
/ 100?
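
For what it's worth, plugging the dump numbers from comment 2 into the code
above seems to explain the oddity, assuming the 4.5 defaults of
large-function-insns = 2700 and large-function-growth = 100 (both default
values are assumed here, not taken from the dump):

  define_doubles:    limit = max (37, 56) = 56; 56 + 56*100/100 = 112
                     newsize ~ 11953 + 56 is > 2700 and > 112       -> refused
  define_functions:  limit = max (37, 78022) = 78022; doubled to 156044
                     newsize ~ 11953 + 78022 = 89975 is > 2700
                     but < 156044                                   -> inlined

A sufficiently large callee raises its own limit enough to always pass, no
matter how big the caller has already grown.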
Comment 4 Richard Biener 2010-02-13 19:38:25 UTC
With that we get the much more sane

Deciding on functions called once:

Considering define_strings size 840.
 Called once from Init_libxg 37 insns.
 Inlined into Init_libxg which now has 876 size for a net change of -841 size.

Considering define_integers size 11078.
 Called once from Init_libxg 876 insns.
 Not inlining: --param large-function-growth limit reached.

Considering define_doubles size 56.
 Called once from Init_libxg 876 insns.
 Inlined into Init_libxg which now has 931 size for a net change of -57 size.

Considering define_functions size 78022.
 Called once from Init_libxg 931 insns.
 Not inlining: --param large-function-growth limit reached.

Considering define_atoms size 119.
 Called once from Init_libxg 931 insns.
 Inlined into Init_libxg which now has 1049 size for a net change of -120 size.

Considering define_structs size 757.
 Called once from Init_libxg 1049 insns.
 Inlined into Init_libxg which now has 1805 size for a net change of -758 size.



The problem with the present code is that, as long as the function we want
to inline is LARGE_FUNCTION_GROWTH percent bigger than the caller,
we will continue to grow the caller up to infinite size.
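
A minimal sketch of the cap suggested in comment 3 (just an illustration of
the idea, not necessarily the exact change used to produce the dump above)
would be:

  /* When inlining large function body called once into small function,
     take the inlined function as base for limiting the growth.  */
  if (inline_summary (to)->self_size > inline_summary (what)->self_size)
    limit = inline_summary (to)->self_size;
  else
    limit = inline_summary (what)->self_size;

  /* Hypothetical cap: never let the base grow beyond
     --param large-function-insns, so the final limit is bounded by
     large-function-insns plus large-function-growth percent of it.  */
  if (limit > PARAM_VALUE (PARAM_LARGE_FUNCTION_INSNS))
    limit = PARAM_VALUE (PARAM_LARGE_FUNCTION_INSNS);

  limit += limit * PARAM_VALUE (PARAM_LARGE_FUNCTION_GROWTH) / 100;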
Comment 5 Richard Biener 2010-02-13 19:56:53 UTC
Of course we'll blow up during var-tracking from within define_functions anyway.
Comment 6 Richard Biener 2010-02-15 16:57:06 UTC
The following doesn't make too much sense:

static bool
cgraph_mark_inline_edge (struct cgraph_edge *e, bool update_original,
                         VEC (cgraph_edge_p, heap) **new_edges)
{
...
  /* Now update size of caller and all functions caller is inlined into.  */
  for (;e && !e->inline_failed; e = e->caller->callers)
    {
      to = e->caller;
      old_size = e->caller->global.size;
      new_size = cgraph_estimate_size_after_inlining (1, to, what);
      to->global.size = new_size;
      to->global.time = cgraph_estimate_time_after_inlining (freq, to, what);
    }
  gcc_assert (what->global.inlined_to == to);
  if (new_size > old_size)
    overall_size += new_size - old_size;

so we adjust the sizes of the callers we are inlined into, but do not
accumulate those changes into overall_size.  And in ...

static bool
cgraph_check_inline_limits (struct cgraph_node *to, struct cgraph_node *what,
                            cgraph_inline_failed_t *reason, bool one_only)
{
...
  if (to->global.inlined_to)
    to = to->global.inlined_to;

we seem to adjust for this effect by taking to->global.inlined_to, but that's
obviously not the same.

Also, when deciding on inlining of functions called once, we should use
cgraph_check_inline_limits (..., true) and cgraph_mark_inline_edge.
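
Concretely, a sketch of that suggestion using the signatures quoted above
(the call site in cgraph_decide_functions_called_once and the node->callers
edge are assumed here, not checked against the actual source):

  /* Functions-called-once path: check the limits with one_only = true and
     update sizes through cgraph_mark_inline_edge so that overall_size stays
     in sync with the caller's new size.  */
  if (cgraph_check_inline_limits (node->callers->caller, node, &reason, true))
    cgraph_mark_inline_edge (node->callers, true, NULL);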
Comment 7 Mark Mitchell 2010-02-17 16:56:50 UTC
I think this is a critical problem.  If var-tracking is causing factor-of-N increases in memory usage, then we need an algorithmic change that prevents that, even if that means inferior debug information.  We're not going to win friends and influence people by giving them a compiler that can't compile things on their system, or takes much longer to do it, even if their debugging experience is better.
Comment 8 Richard Biener 2010-02-19 11:55:28 UTC
The recent patch to add PARAM_MAX_VARTRACK_SIZE didn't fix this.  We still
top out beyond what my machine with 3GB RAM and 1GB swap can handle.

GCC 4.4.3 tops at 620MB
GCC 4.5 with -fno-var-tracking-assignments tops at 610MB, same with
-fno-var-tracking

Honza, if we implement partial inlining, can we implement function splitting? :)
Comment 9 Jakub Jelinek 2010-02-24 18:25:27 UTC
It is vt_emit_notes that eats all the memory, not vt_find_locations, and the memory doesn't go into the hash tables, but into rtx allocation.
I've printed GGC statistics on entry to vt_emit_notes and again after I let it eat an extra GB or so.  The differences are primarily in the 24-, 32- and 64-byte GGC allocations, which jumped from 16 MB to 250 MB, from 8 MB to 212 MB and from 5 MB to 417 MB respectively, and get_max_uid () jumped from ~92000 to 6888627,
which means all the memory is eaten by millions of generated var_location notes.
I haven't investigated yet why they are created in such huge volume.
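
Rough arithmetic from those numbers: the 24/32/64-byte pools grew by about
(250-16) + (212-8) + (417-5) ~ 850 MB while get_max_uid () grew by ~6.8
million, i.e. on the order of 125 bytes of GGC memory per newly created uid,
which is consistent with each var_location note costing a few small rtxes.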
Comment 10 Jakub Jelinek 2010-03-05 15:19:00 UTC
I was hoping the PR43176 caching patch would actually fix this too, but apparently that's not the case, because the problem here is not that we are emitting useless notes, but that the notes for hundreds of `a' variables (different ones, the source has many of them) keep alternating between (symbol_ref ".LCNNNN") and (mem (sp)).  The question is whether those variables really live at that memory and are clobbered all the time (could be related to PR43051 and PR43092 too), and why they are alternating.
Comment 11 Richard Biener 2010-03-12 12:06:01 UTC
re-confirmed with r157384.
Comment 12 Jakub Jelinek 2010-03-17 10:58:01 UTC
The reduced testcase below at -g -O2 on x86_64 grows get_max_uid () during
var-tracking from ~22000 to over 1000000.  Using ten X4's instead of one of course eats even more memory.

extern void *f1 (void *, void *, void *);
extern void *f2 (const char *, int, int, int, void *(*) ());
extern void *f3 (const char *);
extern void *f4 (void *s);
extern void *f5 (void *);

void test (void)
{
#define X1 f1 (f2 ("a", 1, 0, 0, f5), \
       f4 (({ const char *a = "b"; f3 (a); })), \
       ({ const char *a = "c"; f3 (a); }));
#define X2 X1 X1 X1 X1 X1 X1 X1 X1 X1 X1
#define X3 X2 X2 X2 X2 X2 X2 X2 X2 X2 X2
#define X4 X3 X3 X3 X3 X3 X3 X3 X3 X3 X3
  X4
}
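
(For reference, the macros expand to 10 * 10 * 10 = 1000 copies of the X1
statement, i.e. roughly 5000 calls and 2000 block-scope `a' variables in one
function, which is what var-tracking then has to emit notes for.)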
Comment 13 Jakub Jelinek 2010-03-17 15:05:22 UTC
Created attachment 20130 [details]
gcc45-pr43058.patch

So far untested fix.  This just optimizes the handling of optimized-out variables which are known to be constant (in some part of the code or in the whole function).
We don't need to change the location list every time the constant value is assigned to some register or memory.  That's unlike vars that actually live in some register or memory at some point; there of course we want to record the register/memory location so that the debugger can change the variable.
The testcase keeps pushing .LC0 resp. .LC1 into some register (or MEM slot), and every call then clobbers that reg resp. MEM slot.  So after every set of the reg resp. mem, var-tracking was generating up to 1000 var_location notes (one for every `a' decl with that value), and then in the middle of every call another up to 1000 var_location notes saying that the var is now constant and doesn't live in the reg or mem.
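
Doing the math on the reduced testcase from comment 12: on the order of 1000 clobber-and-set events, each forcing up to ~1000 notes to be re-emitted, gives roughly a million var_location notes, which matches the get_max_uid () growth reported there.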

While it is IMHO desirable to do what this patch does in any case, I'll try to come up with another solution which would try to keep the location list less fragmented.
Comment 14 Jakub Jelinek 2010-03-18 20:15:19 UTC
Subject: Bug 43058

Author: jakub
Date: Thu Mar 18 20:15:05 2010
New Revision: 157547

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=157547
Log:
	PR debug/43058
	* var-tracking.c (non_suitable_const): New function.
	(add_uses): For DEBUG_INSNs with constants, don't record any
	value, instead just the constant value itself.
	(compute_bb_dataflow) <case MO_VAL_LOC>: If PAT_VAR_LOCATION_LOC
	is not VAR_LOC_UNKNOWN_P, set var to the constant.
	(emit_notes_in_bb): Likewise.
	(emit_note_insn_var_location): For onepart variables if
	cur_loc is a VOIDmode constant, use DECL_MODE.

	* gcc.dg/pr43058.c: New test.

Added:
    trunk/gcc/testsuite/gcc.dg/pr43058.c
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/testsuite/ChangeLog
    trunk/gcc/var-tracking.c

Comment 15 Jakub Jelinek 2010-03-18 20:30:52 UTC
Fixed.
Comment 16 Jim Wilson 2010-03-23 00:58:03 UTC
The testcase checked into mainline is causing kernel panics on my debian testing ia64-linux machine when I run the gcc testsuite.  The kernel panic comes from the out-of-memory killer when it runs out of processes to kill.  I have 2GB main memory and 2GB swap.  I tried an x86-linux hosted cross compiler to ia64-linux, and I see the cc1 process use 3GB before the kernel kills it.  I suspect a 32-bit x86-linux process can't use more than 3GB.  I don't know how much memory is needed for this testcase, but it is clearly too much.

ia64-linux isn't a primary target, so this is maybe not P1 if only ia64-linux is still broken.
Comment 17 Jakub Jelinek 2010-03-23 06:48:51 UTC
But clearly it is not var-tracking that eats all the memory there; instead it is the scheduler.  It happens also with -g0, and doesn't happen with -fno-schedule-insns -fno-schedule-insns2.  So please open a separate bug report about it; reopening this one for unrelated reasons will just lead to confusion.  I guess I could add { target { ! "ia64-*-*" } } as a quick workaround.