User account creation filtered due to spam.

Bug 41371 - [4.5/4.6 Regression] var-tracking is slow and memory hungry
Summary: [4.5/4.6 Regression] var-tracking is slow and memory hungry
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: debug (show other bugs)
Version: 4.5.0
: P2 normal
Target Milestone: 4.5.1
Assignee: Alexandre Oliva
URL:
Keywords: compile-time-hog, memory-hog
: 41264 42565 42911 43776 (view as bug list)
Depends on:
Blocks: 42911 43058
  Show dependency treegraph
 
Reported: 2009-09-16 09:18 UTC by David Binderman
Modified: 2010-06-09 04:56 UTC (History)
6 users (show)

See Also:
Host:
Target: x86_64-linux-gnu
Build:
Known to work:
Known to fail:
Last reconfirmed: 2010-05-16 10:49:29


Attachments
C++ source code (122.64 KB, text/plain)
2009-09-16 09:19 UTC, David Binderman
Details
qmc2main.ii.bz2 (369.55 KB, application/x-bzip2)
2009-11-04 15:34 UTC, Jakub Jelinek
Details
vta-limit-on-size.patch (2.64 KB, patch)
2010-01-05 22:30 UTC, Jakub Jelinek
Details | Diff
gcc45-pr41371.patch (1.03 KB, patch)
2010-01-12 17:00 UTC, Jakub Jelinek
Details | Diff
testcase from kdesdk/umbrello (90.59 KB, application/x-bzip)
2010-02-16 12:47 UTC, Lubos Lunak
Details
gcc46-pr41371.patch (699 bytes, patch)
2010-05-25 10:39 UTC, Jakub Jelinek
Details | Diff
rh598310.i.bz2 (156.34 KB, application/octet-stream)
2010-06-04 09:02 UTC, Jakub Jelinek
Details
limit-depth.patch (418 bytes, patch)
2010-06-04 09:24 UTC, Jakub Jelinek
Details | Diff
hack (880 bytes, patch)
2010-06-04 09:30 UTC, Jakub Jelinek
Details | Diff
Patch that fixes the bug and verifies recursion is bounded as expected (1.19 KB, patch)
2010-06-04 10:03 UTC, Alexandre Oliva
Details | Diff
bnc611650 (81.92 KB, application/octet-stream)
2010-06-04 11:03 UTC, Richard Biener
Details

Note You need to log in before you can comment on or make changes to this bug.
Description David Binderman 2009-09-16 09:18:18 UTC
I just tried to compile the Suse Linux package krename-3.9.3-1.33
with the gcc 4.5 mainline snapshot 20090910
and the compiler appeared to hang for over half an hour on
an otherwise idle machine.

Preprocessed C++ source attached. Flags -O2 -g -m64 required.
Comment 1 David Binderman 2009-09-16 09:19:42 UTC
Created attachment 18594 [details]
C++ source code
Comment 2 Richard Biener 2009-09-16 09:54:02 UTC
it's var-tracking again.  Slowly eating memory and time.  Working very hard
here:

Run till exit from #0  vt_find_locations ()
    at /space/rguenther/src/svn/trunk/gcc/var-tracking.c:5467
Comment 3 Richard Biener 2009-10-18 18:00:22 UTC
Reconfirmed.
Comment 4 Jakub Jelinek 2009-11-04 15:34:48 UTC
Created attachment 18964 [details]
qmc2main.ii.bz2

Another testcase where var-tracking takes almost forever on setupUi on x86_64-linux with -g -O2.
Comment 5 David Binderman 2009-12-05 12:50:38 UTC
(In reply to comment #3)
> Reconfirmed.

It seems to be still broken in snapshot 20091203

Comment 6 Richard Biener 2009-12-13 22:18:37 UTC
*** Bug 41264 has been marked as a duplicate of this bug. ***
Comment 7 Richard Biener 2009-12-13 22:18:57 UTC
Another testcase is attached to PR41264.
Comment 8 Jakub Jelinek 2010-01-02 15:05:31 UTC
*** Bug 42565 has been marked as a duplicate of this bug. ***
Comment 9 Jakub Jelinek 2010-01-05 22:30:42 UTC
Created attachment 19478 [details]
vta-limit-on-size.patch

While Alex is busy working on speeding up var-tracking on these KDE testcases, here is a temporary workaround, just giving up on -fvar-tracking-assignments
if var-tracking would need too much memory/would take too long, and giving up
on -fvar-tracking altogether if even that didn't help.
5000000 is something I came up from playing with cc1files on x86_64-linux with -O2 -g or -O3 -g.  There is one generated routine in GCC which needs 10000000 (recog_35 in the cc1files I have), otherwise all of SPEC2K, MICO, FF3D, rest of GCC, TRAMP3D etc. need always below 5mil (just tramp3d and one routine in ff3d is approaching it).  Using 10mil is too much for the second testcase in this PR, had to break the build after 27 minutes.  But with the 5mil limit it was quite quick.
Comment 10 Jakub Jelinek 2010-01-11 18:23:06 UTC
I've looked briefly at the #c4 testcase, and the problem seems to be extremely long loc_chains.  var-tracking e.g. stops for huge amount of time (many minutes) on one bb, EH pad with 162 incoming EH edges (and no others).  Each of those 162 incoming EH edges has roughly 1800 vars in its out hash table, with just one problematic one - which has around 650 items in var->var_part[0].loc_chain chain.  There are ~ 2 other vars with loc_chain chain lengths in the 40s and the rest is with chain length below 10.  The CPU eater is intersect_loc_chains.
For each of the 650 loc_chain entries it calls find_loc_in_1pdv, which, as the vast majority of the entries in s2var's loc_chain are VALUEs, looks stuff up in the hash table and recurses.

I wonder whether for large loc_chain lengths we just couldn't use a temporary hash table.  If we see in intersect_loc_chains that chain length is beyond certain threshold, populate a temporary hash table by doing what find_loc_in_1pdv does (except instead of rtx_equal_p and returning early it seeing a match it would record each loc into the hash table and continue walking).  Then intersect_loc_chains could just walk this hash table instead of searching through it again and again, avoiding the O(loc_chain_length^2) complexity.
Comment 11 Richard Biener 2010-01-12 09:09:41 UTC
(In reply to comment #10)
> I've looked briefly at the #c4 testcase, and the problem seems to be extremely
> long loc_chains.  var-tracking e.g. stops for huge amount of time (many
> minutes) on one bb, EH pad with 162 incoming EH edges (and no others).  Each of
> those 162 incoming EH edges has roughly 1800 vars in its out hash table, with
> just one problematic one - which has around 650 items in
> var->var_part[0].loc_chain chain.  There are ~ 2 other vars with loc_chain
> chain lengths in the 40s and the rest is with chain length below 10.  The CPU
> eater is intersect_loc_chains.
> For each of the 650 loc_chain entries it calls find_loc_in_1pdv, which, as the
> vast majority of the entries in s2var's loc_chain are VALUEs, looks stuff up in
> the hash table and recurses.
> 
> I wonder whether for large loc_chain lengths we just couldn't use a temporary
> hash table.  If we see in intersect_loc_chains that chain length is beyond
> certain threshold, populate a temporary hash table by doing what
> find_loc_in_1pdv does (except instead of rtx_equal_p and returning early it
> seeing a match it would record each loc into the hash table and continue
> walking).  Then intersect_loc_chains could just walk this hash table instead of
> searching through it again and again, avoiding the O(loc_chain_length^2)
> complexity.

That sounds like a good idea.
Comment 12 Jakub Jelinek 2010-01-12 12:58:24 UTC
It is actually far worse on this testcase, as with all the recursions find_loc_in_1pdv does we actually must scan many values many times.
I've put a counter and on this bb one outermost find_loc_in_1pdv calls together rtx_equal_p roughly 880000 times, times 650.
So, perhaps even just not resetting VALUE_RECURSED_INTO back immediately, but in a second loop afterwards would be enough to make this manageable.
Comment 13 Jakub Jelinek 2010-01-12 17:00:29 UTC
Created attachment 19564 [details]
gcc45-pr41371.patch

Patch that speeds up the qmc2main.ii testcase to bearable compile time (2m20s
on my box with -g -O2).  With -g0 -O2 it compiles in 45s, so var-tracking still dominates the time, but it is no longer so slow.
I'll bootstrap/regtest it now.
Comment 14 Jakub Jelinek 2010-01-13 13:27:00 UTC
Subject: Bug 41371

Author: jakub
Date: Wed Jan 13 13:26:47 2010
New Revision: 155858

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=155858
Log:
	PR debug/41371
	* var-tracking.c (values_to_unmark): New variable.
	(find_loc_in_1pdv): Clear VALUE_RECURSED_INTO of values in
	values_to_unmark vector.  Moved body to...
	(find_loc_in_1pdv_1): ... this.  Don't clear VALUE_RECURSED_INTO,
	instead queue it into values_to_unmark vector.
	(vt_find_locations): Free values_to_unmark vector.

Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/var-tracking.c

Comment 15 Jakub Jelinek 2010-01-19 09:03:19 UTC
This patch seems to fix issues with (most of) KDE testcases, but there are still some testcases where gcc spends very long time in var-tracking (usually generated huge routines where simply too many VALUEs are tracked through).
E.g. https://bugzilla.redhat.com/show_bug.cgi?id=548826 and
https://bugzilla.redhat.com/show_bug.cgi?id=531218
These can be worked around by the #c9 patch, even with much bigger default (50mil).

Not sure if we should track this all in this bug, or just split into separate bugs.
Comment 16 Lubos Lunak 2010-02-16 12:47:27 UTC
Created attachment 19887 [details]
testcase from kdesdk/umbrello

If it helps, here's another testcase where 2G RAM is not enough. This is "g++ (SUSE Linux) 4.5.0 20100212 (experimental) [trunk revision 156733]" on x86_64 with -g -O2.
Comment 17 Jakub Jelinek 2010-02-16 13:06:55 UTC
You should retry with r156794 or newer.
Comment 18 Alexandre Oliva 2010-03-06 20:26:26 UTC
Subject: Bug 41371

Author: aoliva
Date: Sat Mar  6 20:26:15 2010
New Revision: 157257

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=157257
Log:
* var-tracking.c (dataflow_set_merge): Swap src and src2.
Reverted:
2010-01-13  Jakub Jelinek  <jakub@redhat.com>
PR debug/41371
* var-tracking.c (values_to_unmark): New variable.
(find_loc_in_1pdv): Clear VALUE_RECURSED_INTO of values in
values_to_unmark vector.  Moved body to...
(find_loc_in_1pdv_1): ... this.  Don't clear VALUE_RECURSED_INTO,
instead queue it into values_to_unmark vector.
(vt_find_locations): Free values_to_unmark vector.

Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/var-tracking.c

Comment 19 Jakub Jelinek 2010-03-15 15:12:23 UTC
A variant of the #c9 patch is checked in for many days, do you still have something where VTA is really so big compile time or memory hog (besides PR43058)?
Comment 20 Richard Biener 2010-03-15 15:20:26 UTC
*** Bug 42911 has been marked as a duplicate of this bug. ***
Comment 21 Jakub Jelinek 2010-03-18 20:36:38 UTC
Assuming this is fixed.  If you have a new testcase, please file a new PR.
Comment 22 David Binderman 2010-05-16 10:21:09 UTC
(In reply to comment #21)
> Assuming this is fixed.  If you have a new testcase, please file a new PR.

I am not sure that my original bug report is fixed.

I tried 4.6 snapshot 20100515 and it couldn't compile it in ten minutes
(ulimit -t 600) on a 2.7GHz Athlon.


Comment 23 Richard Biener 2010-05-16 10:49:28 UTC
I see

 variable tracking     : 734.06 (99%) usr   0.33 (35%) sys 735.29 (99%) wall   11548 kB ( 8%) ggc
...

743.18user 1.00system 12:25.12elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+10440outputs (0major+116553minor)pagefaults 0swaps

for bug90.cc at -O2 -g with 4.5 branch r158342.

Which is a 6800% regression over 4.4.3:

10.88user 0.62system 0:12.64elapsed 90%CPU (0avgtext+0avgdata 0maxresident)k
18664inputs+5096outputs (106major+55648minor)pagefaults 0swaps
Comment 24 Pawel Sikora 2010-05-16 12:22:50 UTC
*** Bug 43776 has been marked as a duplicate of this bug. ***
Comment 25 Pawel Sikora 2010-05-16 12:26:24 UTC
PR43776 constains another thestcase:

results for top of 4.5/4.6:

$ time g++45 -Wall -c 1.ii -O1 -g2
1.ii: In member function 'void es::ClockAnalyzer::initPrimitives()':
1.ii:38722:7: note: variable tracking size limit exceeded with -fvar-tracking-assignments, retrying without
g++45 -Wall -c 1.ii -O1 -g2  32,72s user 0,35s system 99% cpu 33,096 total

$ time g++46 -Wall -c 1.ii -O1 -g2
1.ii: In member function 'void es::ClockAnalyzer::initPrimitives()':
1.ii:38722:7: note: variable tracking size limit exceeded with -fvar-tracking-assignments, retrying without
g++46 -Wall -c 1.ii -O1 -g2  23,68s user 0,40s system 99% cpu 24,099 total

btw. the '-O1 -fno-inline -g2' has worse results.
Comment 26 Richard Biener 2010-05-17 14:28:20 UTC
The problem from comment #12 seems to be back.
Comment 27 Jakub Jelinek 2010-05-18 09:36:05 UTC
Subject: Bug 41371

Author: jakub
Date: Tue May 18 09:35:52 2010
New Revision: 159528

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=159528
Log:
	PR debug/41371
	* var-tracking.c (find_loc_in_1pdv): Add a few checks from
	rtx_equal_p inline.

Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/var-tracking.c

Comment 28 Jakub Jelinek 2010-05-25 10:39:53 UTC
Created attachment 20741 [details]
gcc46-pr41371.patch

Another small optimization.  At least on this testcase in 80% the s1node and s2var->var_part[0].loc_chain chains contain the same locations in the same order.  So, if we avoid calling find_loc_in_1pdv in that case and only start calling it when they differ, the testcase can be speeded up slightly.
With --enable-checking=release cc1plus the difference is:
real    4m52.484s
user    4m51.991s
sys     0m0.446s

to:

real    3m38.218s
user    3m37.641s
sys     0m0.383s

I'm going to bootstrap/regtest it now with additional statistics gathering on how many find_loc_in_1pdv calls it can avoid.
Comment 29 Jakub Jelinek 2010-05-25 16:27:25 UTC
Subject: Bug 41371

Author: jakub
Date: Tue May 25 16:27:03 2010
New Revision: 159829

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=159829
Log:
	PR debug/41371
	* var-tracking.c (find_loc_in_1pdv): Guard asserts with
	ENABLE_CHECKING.
	(intersect_loc_chains): Walk the s2var's loc_chain together
	with s1node chain as long as the locations are equal, don't
	call find_loc_in_1pdv in that case.

Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/var-tracking.c

Comment 30 Jakub Jelinek 2010-06-04 09:02:56 UTC
Created attachment 20833 [details]
rh598310.i.bz2

Another testcase from wine,
./cc1 -m32 -fPIC -fno-strict-aliasing -g -O2 rh598310.i -quiet
takes here over 7 minutes, again most of the time spent in find_loc_in_1pdv called from intersect_loc_chains.
Comment 31 Jakub Jelinek 2010-06-04 09:24:49 UTC
Created attachment 20834 [details]
limit-depth.patch

Quick patch that brings the time down to 1 minute 15 sec.

From the callgrind dump on this testcase, intersect_loc_chains in 5159260x
cases calls insert_into_intersection in the first loop (i.e. where the recent patch from this PR helps), then calls 320611x find_loc_in_1pdv.  Out of these
calls, 75537x it returns NULL and the rest of time it returns non-NULL.  When
it returns non-NULL, it is possible to return non-NULL without recursion
244863x, with one level of find_loc_in_1pdv recursion 205x and with two level recursion 6x.  No successful call in this testcase needs deeper recursion.
When not limiting the recursion depth, instead just monitoring it, for all cases where find_loc_in_1pdv returned non-NULL the depth at which return node; is used
is at most 2 (0 is just find_loc_in_1pdv call with no recursion) and maximum depth ever recursed for the successful calls is 3.  I believe the problem of this testcase is not the cases where find_loc_in_1pdv returns non-NULL.
Comment 32 Jakub Jelinek 2010-06-04 09:30:26 UTC
Created attachment 20835 [details]
hack

Hack that shows that the maximum depth is 3 even for the found == NULL cases.
Alex tells me on IRC he has the right fix for this.
Comment 33 Alexandre Oliva 2010-06-04 09:51:02 UTC
Mine
Comment 34 Alexandre Oliva 2010-06-04 10:03:13 UTC
Created attachment 20836 [details]
Patch that fixes the bug and verifies recursion is bounded as expected

This patch (except for comments and ChangeLog) completed stage1 libs, more testing needed.
Comment 35 Richard Biener 2010-06-04 11:03:31 UTC
Created attachment 20837 [details]
bnc611650

Another testcase from open-office this time, on i?86-linux only.

$> time  g++ -c -g -Os -fomit-frame-pointer analysis.ii

real    103m46.642s
user    96m21.093s
sys     0m1.196s

$> time g++ -c -g -Os analysis.ii

real    0m19.345s
user    0m19.177s
sys     0m0.168s

$> time  g++ -c -Os -fomit-frame-pointer analysis.ii

real    0m3.747s
user    0m3.640s
sys     0m0.104s

numbers above are from the 4.5 branch rev. 159866.  trunk is also very slow,
but the compile didn't yet finish.
Comment 36 Jakub Jelinek 2010-06-04 11:15:45 UTC
With Alex' patch with checking guarded with #ifdef ENABLE_CHECKING (currently bootstrapping/regtesting that) I see on x86_64 on the trunk:
time ./cc1plus -m32 -quiet -g -Os -fomit-frame-pointer bug-611650_analysis.ii 

real    2m16.789s
user    2m16.333s
sys     0m0.276s

Of course not ideal, but much better than those 103 minutes.
Comment 37 Richard Biener 2010-06-04 12:45:07 UTC
Subject: Bug 41371

Author: rguenth
Date: Fri Jun  4 12:44:41 2010
New Revision: 160261

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=160261
Log:
2010-06-04  Richard Guenther  <rguenther@suse.de>

	Backport from mainline:
	2010-05-18  Jakub Jelinek  <jakub@redhat.com>

	PR debug/41371
	* var-tracking.c (find_loc_in_1pdv): Add a few checks from
	rtx_equal_p inline.

	2010-05-25  Jakub Jelinek  <jakub@redhat.com>
 
	PR debug/41371
	* var-tracking.c (find_loc_in_1pdv): Guard asserts with
	ENABLE_CHECKING.
	(intersect_loc_chains): Walk the s2var's loc_chain together
	with s1node chain as long as the locations are equal, don't
	call find_loc_in_1pdv in that case.

Modified:
    branches/gcc-4_5-branch/gcc/ChangeLog
    branches/gcc-4_5-branch/gcc/var-tracking.c

Comment 38 Jakub Jelinek 2010-06-04 16:42:57 UTC
Subject: Bug 41371

Author: jakub
Date: Fri Jun  4 16:42:27 2010
New Revision: 160280

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=160280
Log:
	PR debug/41371
	* var-tracking.c (find_loc_in_1pdv): Mark initial value before
	recursing.  Check that recursion is bounded.  Rename inner var
	to avoid hiding incoming argument.

Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/var-tracking.c

Comment 39 Jakub Jelinek 2010-06-04 16:48:02 UTC
Subject: Bug 41371

Author: jakub
Date: Fri Jun  4 16:47:41 2010
New Revision: 160282

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=160282
Log:
	PR debug/41371
	* var-tracking.c (find_loc_in_1pdv): Mark initial value before
	recursing.  Check that recursion is bounded.  Rename inner var
	to avoid hiding incoming argument.

Modified:
    branches/gcc-4_5-branch/gcc/ChangeLog
    branches/gcc-4_5-branch/gcc/var-tracking.c

Comment 40 Alexandre Oliva 2010-06-09 04:56:15 UTC
Fixed
Comment 41 Alexandre Oliva 2010-06-10 16:44:08 UTC
Subject: Bug 41371

Author: aoliva
Date: Thu Jun 10 16:43:46 2010
New Revision: 160559

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=160559
Log:
PR debug/41371
* var-tracking.c (find_loc_in_1pdv): Remove recursion, only
tail-recurse into canonical node.  Fast-forward over
non-canonical VALUEs.

Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/var-tracking.c

Comment 42 Alexandre Oliva 2010-06-10 17:01:59 UTC
Subject: Bug 41371

Author: aoliva
Date: Thu Jun 10 17:01:32 2010
New Revision: 160563

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=160563
Log:
PR debug/41371
* var-tracking.c (find_loc_in_1pdv): Remove recursion, only
tail-recurse into canonical node.  Fast-forward over
non-canonical VALUEs.

Modified:
    branches/gcc-4_5-branch/gcc/ChangeLog
    branches/gcc-4_5-branch/gcc/var-tracking.c