Bug 52640 - [4.8 Regression] performance bottleneck: gcc/tree.c;value_member
Summary: [4.8 Regression] performance bottleneck: gcc/tree.c;value_member
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: middle-end (show other bugs)
Version: 4.6.2
: P1 normal
Target Milestone: 4.8.0
Assignee: Steven Bosscher
URL: http://gcc.gnu.org/ml/gcc-patches/201...
Keywords: compile-time-hog, patch
Depends on:
Blocks:
 
Reported: 2012-03-20 18:20 UTC by ncahill_alt
Modified: 2012-12-12 09:44 UTC (History)
2 users (show)

See Also:
Host:
Target:
Build:
Known to work: 4.5.5, 4.6.4, 4.7.1
Known to fail: 4.8.0
Last reconfirmed: 2012-03-21 00:00:00


Attachments
source, output, profiling data indicating problem (167.88 KB, application/gzip)
2012-03-20 18:20 UTC, ncahill_alt
Details
Needs an ASM_OUTPUT_EXTERNAL platform for testing (1017 bytes, patch)
2012-03-21 13:01 UTC, Steven Bosscher
Details | Diff
Fixed patch (1009 bytes, patch)
2012-03-21 16:17 UTC, Steven Bosscher
Details | Diff
gcc -version && output with -v added (1.09 KB, application/gzip)
2012-03-21 17:47 UTC, ncahill_alt
Details
gcc48-pr52640.patch (1.30 KB, patch)
2012-12-11 17:05 UTC, Jakub Jelinek
Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description ncahill_alt 2012-03-20 18:20:10 UTC
Created attachment 26935 [details]
source, output, profiling data indicating problem

It was made known to me that the code provided takes a long time to compile.  Looking into it, the culprit seems to be the value_member function of gcc/tree.c.  Here is that code (preprocessed):

####
tree value_member (tree elem, tree list)
{
	while (list) 
	{
		if (elem == ((list)->list.value))
			return list;
		list = ((list)->common.chain);
	}
	return (tree) ((void *)0);
}
####

A sample of profiling data, from OProfile and using gcc 4.6.2:

####
samples  %        image name               symbol name
19955    43.1814  cc1                      value_member
1179      2.5513  cc1                      record_reference
1122      2.4279  cc1                      insert_aux

####

With a table of 10000 rows, it takes ~40% of the execution time, and with 200000 rows, ~90%.  The former takes 5 seconds, the later is not finished in 20 minutes.

Provided is a sample source file with output from cc1 and profiling data as above.

Thank you.
Neil.
Comment 1 Jan Smets 2012-03-20 23:45:27 UTC
I'd like to note that this code compiles in a few seconds in GCC 3.4.
Thanks Neil for making this profile.
Comment 2 Steven Bosscher 2012-03-21 00:37:18 UTC
I'll have a stab at this.
Comment 3 Steven Bosscher 2012-03-21 13:00:33 UTC
Can you please add the output of "gcc --version" and add "-v" to your compiler invocation and report the output?
Comment 4 Steven Bosscher 2012-03-21 13:01:07 UTC
Created attachment 26941 [details]
Needs an ASM_OUTPUT_EXTERNAL platform for testing
Comment 5 Steven Bosscher 2012-03-21 13:03:57 UTC
Ah, and obviously there should be a pointer_set_insert before VEC_safe_push...
Comment 6 Jan Smets 2012-03-21 15:51:22 UTC
Comment on attachment 26941 [details]
Needs an ASM_OUTPUT_EXTERNAL platform for testing

Where does  deferred_global_decls come from? Can this be modified to apply to 4.6.3 ?

Target is i686-wrs-vxworks / mips-wrs-vxworks
Comment 7 Steven Bosscher 2012-03-21 16:17:24 UTC
Created attachment 26942 [details]
Fixed patch

Previous patch was obviously not yet tested, this one at least compiles.
Comment 8 ncahill_alt 2012-03-21 17:47:21 UTC
Created attachment 26945 [details]
gcc -version && output with -v added
Comment 9 Steven Bosscher 2012-03-21 18:35:10 UTC
Regression w.r.t. GCC3 on all active release branches.
Comment 10 Jan Smets 2012-03-21 19:46:31 UTC
Works. ~13 sec for 243k entries. Still slower than GCC 3.4 but at least better than 80+ seconds.
Comment 11 Steven Bosscher 2012-03-24 13:46:42 UTC
Author: steven
Date: Sat Mar 24 13:46:33 2012
New Revision: 185757

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=185757
Log:
gcc/
	PR middle-end/52640
	* varasm.c: Include pointer-set.h.
	(pending_assemble_externals_set): New pointer set.
	(process_pending_assemble_externals): Destroy the pointer set.
	(assemble_external): See if decl is in pending_assemble_externals_set,
	and add it to pending_assemble_externals if necessary.
	(init_varasm_once): Allocate pending_assemble_externals_set.

testsuite/
	PR middle-end/52640
	* gcc.c-torture/compile/limits-externdecl.c: New test.

Added:
    branches/gcc-4_5-branch/gcc/testsuite/gcc.c-torture/compile/limits-externdecl.c
Modified:
    branches/gcc-4_5-branch/gcc/varasm.c
Comment 12 Steven Bosscher 2012-03-24 13:47:51 UTC
Author: steven
Date: Sat Mar 24 13:47:46 2012
New Revision: 185758

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=185758
Log:
gcc/
	PR middle-end/52640
	* varasm.c: Include pointer-set.h.
	(pending_assemble_externals_set): New pointer set.
	(process_pending_assemble_externals): Destroy the pointer set.
	(assemble_external): See if decl is in pending_assemble_externals_set,
	and add it to pending_assemble_externals if necessary.
	(init_varasm_once): Allocate pending_assemble_externals_set.

testsuite/
	PR middle-end/52640
	* gcc.c-torture/compile/limits-externdecl.c: New test.

Added:
    branches/gcc-4_6-branch/gcc/testsuite/gcc.c-torture/compile/limits-externdecl.c
Modified:
    branches/gcc-4_6-branch/gcc/varasm.c
Comment 13 Steven Bosscher 2012-03-24 13:48:51 UTC
Author: steven
Date: Sat Mar 24 13:48:35 2012
New Revision: 185759

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=185759
Log:
gcc/
	PR middle-end/52640
	* varasm.c: Include pointer-set.h.
	(pending_assemble_externals_set): New pointer set.
	(process_pending_assemble_externals): Destroy the pointer set.
	(assemble_external): See if decl is in pending_assemble_externals_set,
	and add it to pending_assemble_externals if necessary.
	(init_varasm_once): Allocate pending_assemble_externals_set.

testsuite/
	PR middle-end/52640
	* gcc.c-torture/compile/limits-externdecl.c: New test.

Added:
    branches/gcc-4_7-branch/gcc/testsuite/gcc.c-torture/compile/limits-externdecl.c
Modified:
    branches/gcc-4_7-branch/gcc/varasm.c
Comment 14 Steven Bosscher 2012-03-24 13:52:21 UTC
Fixed on release branches.
Comment 15 Richard Biener 2012-09-07 11:10:57 UTC
Any progress on the "real" solution?  If not, can you install the branch fix on
trunk?  Thx.
Comment 16 stevenb.gcc@gmail.com 2012-09-07 19:07:27 UTC
> Any progress on the "real" solution?  If not, can you install the branch fix on
> trunk?  Thx.

I think I already mentioned before that the branch fix is the only
real fix I can think of. The problem has been aggravated by the
visibility machinery, it depends on having a pending-externs list.
Comment 17 Jakub Jelinek 2012-12-11 08:03:44 UTC
Any progress on this?  Does the branch patch not work on the trunk?
Comment 18 Richard Biener 2012-12-11 09:59:13 UTC
(In reply to comment #17)
> Any progress on this?  Does the branch patch not work on the trunk?

Simply task of forward-porting and testing.
Comment 19 Jan Smets 2012-12-11 16:55:30 UTC
Steven's example only contains 10 000 entries. I need to recompile a symbol table of 270.000 entries a dozen times each day. (and so do a lot of other people)
With the 'fix' it still takes 12+ seconds to compile it on a state of the art machine and then another 4+ seconds to assemble. CLANG2.8 spends 15.5 sec and CLANG3.1 16.5 sec.

I'm already happy with the work around but I'd really like to see a more structural solution, whichever that may be.  Thanks
Comment 20 Jakub Jelinek 2012-12-11 17:05:33 UTC
Created attachment 28927 [details]
gcc48-pr52640.patch

Patch I'm going to bootstrap/regtest on the trunk.  No point to #include pointer-set.h when it is already included (it is included twice in 4.7 varams.c), and the var should be IMHO defined only if ASM_OUTPUT_EXTERNAL is defined, otherwise it is unused local var.
Comment 21 Jakub Jelinek 2012-12-12 09:43:39 UTC
Author: jakub
Date: Wed Dec 12 09:43:33 2012
New Revision: 194441

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=194441
Log:
	PR middle-end/52640
	* varasm.c (pending_assemble_externals_set): New pointer set.
	(process_pending_assemble_externals): Destroy the pointer set.
	(assemble_external): See if decl is in pending_assemble_externals_set,
	and add it to pending_assemble_externals if necessary.
	(init_varasm_once): Allocate pending_assemble_externals_set.

	* gcc.c-torture/compile/limits-externdecl.c: New test.

Added:
    trunk/gcc/testsuite/gcc.c-torture/compile/limits-externdecl.c
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/testsuite/ChangeLog
    trunk/gcc/varasm.c
Comment 22 Jakub Jelinek 2012-12-12 09:44:42 UTC
Fixed.