GCC 3.2.1 -> GCC 3.3 compile speed regression
Ziemowit Laski
zlaski@apple.com
Fri Jan 31 07:49:00 GMT 2003
I will try to make this as succinct as possible. For testing
I used the gcc_3_2_1_release and the gcc-3_3-branch; for the
test case, a preprocessed version of a single source file
found in the open source version of the Qt toolkit. You will
find the two test cases (created under RedHat Linux 8.0) here:
http://homepage.mac.com/zlaski/FileSharing1.html
I've instrumented my local copy of 3.3 to produce -fmem-report
stats comparable to 3.2.1 (productizing this for FSF use is another
matter that I will need to coordinate with Geoff). Also, I've augmented
-fmem-report in both compilers to also tell us how many times
gcc_collect()
is called, and how many times it actually decides to go through with its
mark-n-sweep thang.
Both compilers are using the default 4MB heap size. I re-ran both
command lines several times to warm up the various caches, and then
reported the last 3 timings for each. The -fmem-report and -Q output
has been trimmed to only show the juicy cuts:
/home/zlaski/fsf/obj/gcc/gcc_3_2_1_release/gcc/g++ -B
/home/zlaski/fsf/obj/gcc/gcc_3_2_1_release/gcc/ -c
structureparser.gcc_3_2_1_release.ii -fmem-report -Q
{GC 5327k -> 3237k}
{GC 5328k -> 4109k}
{GC 5343k -> 4661k}
{GC 6065k -> 5406k}
{GC 7029k -> 6218k}
{GC 8120k -> 7203k}
{GC 9386k -> 8260k}
{GC 10741k -> 9470k}
{GC 12395k -> 10397k}
{GC 13584k -> 10727k}
{GC 14039k -> 11681k}
{GC 13922k -> 12090k}
Tree Number Bytes % Total
Total 216513 10M
RTX Number Bytes % Total
Total 1165 12k
Collector calls: 7644
Collections performed: 12
Size Allocated Used Overhead
Total 13M 11M 130k
real 0m2.692s
real 0m2.682s
real 0m2.686s
/home/zlaski/fsf/obj/gcc/gcc-3_3-branch/gcc/g++ -B
/home/zlaski/fsf/obj/gcc/gcc-3_3-branch/gcc/ -c
structureparser.gcc-3_3-branch.ii -fmem-report -Q
{GC 5326k -> 2058k}
{GC 5325k -> 3023k}
{GC 5411k -> 3871k}
{GC 5420k -> 4278k}
{GC 5564k -> 4414k}
{GC 5740k -> 4772k}
{GC 6209k -> 5170k}
{GC 6745k -> 5728k}
{GC 7447k -> 6243k}
{GC 8119k -> 6722k}
{GC 8740k -> 7364k}
{GC 9584k -> 8121k}
{GC 10596k -> 8662k}
{GC 11279k -> 9516k}
{GC 12400k -> 10220k}
{GC 13352k -> 10597k}
{GC 13844k -> 10762k}
{GC 14069k -> 10903k}
{GC 14183k -> 11670k}
{GC 15455k -> 12306k}
{GC 4194303k -> 12411k} /* ignore the 4Gb - needed to trigger the last
collection :-) */
Tree Number Bytes % Total
Total 227136 11M
RTX Number Bytes % Total
Total 1322 12k
Collector calls: 7805
Collections performed: 21
Size Allocated Used Overhead
Total 14M 12M 133k
real 0m3.649s
real 0m3.648s
real 0m3.641s
Immediately, a culprit comes to mind: The 3.3 compiler performs almost
TWICE AS MANY COLLECTIONS as the 3.2.1 compiler, with a roughly
comparable # of gcc_collect() calls. This can't be cheap.
What we see from the GC numbers is that the 3.3 collections are able to
reclaim a greater percentage of allocated memory at each turn. This
could be a combination of two factors:
- Data in the 3.3 compiler have better temporal locality (!!)
- The 3.2.1 collector was buggy/incomplete and could not reliably
reclaim some stuff.
Anyway, this better reclamation done by the 3.3 ggc also explains why
the collections occur more frequently there.
So, I throttled the heap size all the way to 128 MB. For the 3.2.1
compiler, this involved rebuilding it after changing ggc-page.c; the
3.3 compiler is more civilized and accepts a --param:
/home/zlaski/fsf/obj/gcc/gcc_3_2_1_release/gcc/g++ -B
/home/zlaski/fsf/obj/gcc/gcc_3_2_1_release/gcc/ -c
structureparser.gcc_3_2_1_release.ii -fmem-report
Execution times (seconds)
garbage collection : 0.11 ( 4%) usr 0.00 ( 0%) sys 0.11 ( 4%)
wall
cfg construction : 0.01 ( 0%) usr 0.00 ( 0%) sys 0.00 ( 0%)
wall
life analysis : 0.01 ( 0%) usr 0.00 ( 0%) sys 0.00 ( 0%)
wall
preprocessing : 0.19 ( 8%) usr 0.13 (27%) sys 0.28 ( 9%)
wall
lexical analysis : 0.28 (11%) usr 0.21 (43%) sys 0.44 (14%)
wall
parser : 1.73 (71%) usr 0.15 (31%) sys 1.98 (65%)
wall
expand : 0.01 ( 0%) usr 0.00 ( 0%) sys 0.05 ( 2%)
wall
varconst : 0.03 ( 1%) usr 0.00 ( 0%) sys 0.05 ( 2%)
wall
integration : 0.01 ( 0%) usr 0.00 ( 0%) sys 0.00 ( 0%)
wall
flow analysis : 0.00 ( 0%) usr 0.00 ( 0%) sys 0.02 ( 1%)
wall
mode switching : 0.01 ( 0%) usr 0.00 ( 0%) sys 0.02 ( 1%)
wall
local alloc : 0.01 ( 0%) usr 0.00 ( 0%) sys 0.02 ( 1%)
wall
global alloc : 0.01 ( 0%) usr 0.00 ( 0%) sys 0.02 ( 1%)
wall
shorten branches : 0.01 ( 0%) usr 0.00 ( 0%) sys 0.00 ( 0%)
wall
rest of compilation : 0.01 ( 0%) usr 0.00 ( 0%) sys 0.02 ( 1%)
wall
TOTAL : 2.45 0.49 3.05
Tree Number Bytes % Total
Total 216390 10M
RTX Number Bytes % Total
Total 1165 12k
Collector calls: 7644
Collections performed: 1
Size Allocated Used Overhead
Total 23M 11M 242k
real 0m1.997s
real 0m1.999s
real 0m1.994s
/home/zlaski/fsf/obj/gcc/gcc-3_3-branch/gcc/g++ -B
/home/zlaski/fsf/obj/gcc/gcc-3_3-branch/gcc/ -c
structureparser.gcc-3_3-branch.ii -fmem-report --param
ggc-min-heapsize=131072
Execution times (seconds)
garbage collection : 0.10 ( 3%) usr 0.00 ( 0%) sys 0.09 ( 3%)
wall
life info update : 0.01 ( 0%) usr 0.00 ( 0%) sys 0.00 ( 0%)
wall
preprocessing : 0.27 ( 9%) usr 0.13 (22%) sys 0.27 ( 7%)
wall
lexical analysis : 0.29 (10%) usr 0.15 (26%) sys 0.72 (20%)
wall
parser : 2.13 (72%) usr 0.28 (48%) sys 2.41 (66%)
wall
expand : 0.01 ( 0%) usr 0.00 ( 0%) sys 0.00 ( 0%)
wall
varconst : 0.06 ( 2%) usr 0.00 ( 0%) sys 0.06 ( 2%)
wall
jump : 0.01 ( 0%) usr 0.00 ( 0%) sys 0.00 ( 0%)
wall
global alloc : 0.02 ( 1%) usr 0.00 ( 0%) sys 0.00 ( 0%)
wall
reg stack : 0.01 ( 0%) usr 0.00 ( 0%) sys 0.00 ( 0%)
wall
final : 0.01 ( 0%) usr 0.00 ( 0%) sys 0.05 ( 1%)
wall
symout : 0.00 ( 0%) usr 0.01 ( 2%) sys 0.02 ( 0%)
wall
rest of compilation : 0.01 ( 0%) usr 0.00 ( 0%) sys 0.02 ( 0%)
wall
TOTAL : 2.94 0.58 3.67
Tree Number Bytes % Total
Total 227014 11M
RTX Number Bytes % Total
Total 1322 12k
Collector calls: 7805
Collections performed: 1
Size Allocated Used Overhead
Total 27M 12M 274k
real 0m2.580s
real 0m2.585s
real 0m2.580s
The overall _amount_ of extra memory being allocated clearly does not
have an adverse effect on the time it takes the garbage collector to
traverse it, since the 3.3 collector runs at least as fast as the 3.2.1
collector in spite of about 10% more data under its wing.
However, the access _patterns_ for the allocated memory clearly do have
an effect on the _number_ of traversals the garbage collector will
perform on it. What you see from the numbers is that more heap data
becomes orphaned sooner in 3.3 than in 3.2.1, and is therefore
reclaimed. The less remaining (i.e., marked) data means a lower
threshold for firing off the next collection, and ergo we see more of
them.
But, this is not all; even though a lot more stuff is successfully
reclaimed, the live memory size grows back quickly, only to be
reclaimed with equal vigor. What this suggests (correct me if I'm
going astray) is a "sawtooth" time series of memory allocations. Each
tooth tends to group together data with a high degree of temporal
locality, most of it destroyed at the end of some activity (my guess:
compiling a body of a function). Then, as the compilation of the next
function begins, the compiler again allocates a bunch of data that it
will use for that function and that function only.
This leads me to the following half-baked hypothesis: Some kinds of
data/information that used to persist across the entire compilation are
now created and destroyed repeatedly in some local temporal scope, like
the compilation of a single function. This hypothesis would explain
not only the increased number of gc passes, but also the overall
performance degradation we see (even with gc turned off).
So, the question is, has anyone touched code in the C++ front-end that
deals with compilation of functions? Classes? Perhaps someone has
tried to reduce the memory footprint of the running compiler by
destroying all data unless it is absolutely needed? The law of
unintended consequences applies in engineering as well as in economic
policy. :-)
--Zem
--------------------------------------------------------------
Ziemowit Laski 1 Infinite Loop, MS 301-2K
Mac OS X Compiler Group Cupertino, CA USA 95014-2083
Apple Computer, Inc. +1.408.974.6229 Fax .5477
More information about the Gcc
mailing list