This is the mail archive of the
gcc-bugs@gcc.gnu.org
mailing list for the GCC project.
Re: Example program takes 2000 times as long to compile under C++ as C
- To: Kelley Cook <kelleycook at attglobal dot net>
- Subject: Re: Example program takes 2000 times as long to compile under C++ as C
- From: Zack Weinberg <zack at wolery dot cumb dot org>
- Date: Wed, 30 Aug 2000 16:29:23 -0700
- Cc: gcc-bugs at gcc dot gnu dot org
- References: <000901c012cd$19fde840$5b5f6294@kelley>
On Wed, Aug 30, 2000 at 05:56:05PM -0400, Kelley Cook wrote:
> As originally brought up in message
> http://gcc.gnu.org/ml/gcc-bugs/2000-06/msg00385.html, Mr. Robert Melcher
> <robert.melchers@drives.eurotherm.co.uk> discovered that the following
> program compiles nearly instantly in C, but takes 5 minutes with C++.
>
> Mike Stump suggested to profile the compiler, unfortunately this appeared
> hard to do, so I didn't bother.
It happens that I have a profiled cc1plus binary right here. When I
run it on your test case, I get:
Each sample counts as 0.01 seconds.
% cum self self total
time secs secs calls ms/call ms/call name
67.82 58.49 58.49 50 1169.80 1723.80 walk_tree
10.99 67.97 9.48 193710255 0.00 0.00 simplify_aggr_init_exprs_r
9.08 75.80 7.83 258280343 0.00 0.00 calls_setjmp_r
4.80 79.94 4.14 286978156 0.00 0.00 statement_code_p
4.29 83.64 3.70 143489643 0.00 0.00 first_rtl_op
2.40 85.71 2.07 21523360 0.00 0.00 special_function_p
0.56 86.19 0.48 21523360 0.00 0.00 setjmp_call_p
Yow! The immediate question is who called walk_tree, and the answer
is we don't know for sure, because it's being called in a cycle.
[4] 99.9 58.49 27.70 34+83 <cycle 2 as a whole> [4]
58.49 27.70 50+495037350 walk_tree <cycle 2> [5]
0.00 0.00 30 for_each_template_parm <cycle 2> [708]
-----------------------------------------------
495037350 walk_tree <cycle 2> [5]
30 for_each_template_parm <cycle 2> [708]
1.72 0.81 1/34 expand_body [22]
1.72 0.81 1/34 calls_setjmp_p [32]
30.97 14.66 18/34 no_linkage_check [9]
[5] 99.9 58.49 27.70 50+495037350 walk_tree <cycle 2> [5]
7.83 2.55 258280343/258280343 calls_setjmp_r [20]
9.48 0.00 193710255/193710255 simplify_aggr_init_exprs_r [21]
4.14 0.00 286978156/286978156 statement_code_p [25]
3.70 0.00 143489078/143489643 first_rtl_op [26]
0.00 0.00 23/23 no_linkage_helper [722]
37 for_each_template_parm_r <cycle 2> [666]
495037350 walk_tree <cycle 2> [5]
So I ran it under the debugger until it got stuck, hit ^C, and the
lucky winner is... calls_setjmp_p! But it's only called once, I hear
you cry? Yah - but it's called to walk over the saved expression tree
for the entire function, and that expression tree... oy, what a mess.
<modify_expr type <integer_type unsigned int>
side-effects
arg 0 <var_decl output>
arg 1 <nop_expr type <integer_type unsigned int>
side-effects
arg 0 <save_expr type <integer_type int>
side-effects
arg 0 <compound_expr type <integer_type int>
side-effects
arg 0 <convert_expr type <void_type void>
side-effects
arg 0 <save_expr type <integer_type int>
side-effects unsigned
arg 0 <compound_expr>
arg 1 <function_decl mux>
rtl 2 (nil)>>
arg 1 <cond_expr type <integer_type int>
side-effects arg 0 <var_decl in15>
arg 1 <bit_ior_expr type <integer_type int>
side-effects
arg 0 <save_expr>
arg 1 <integer_cst 32768>>
arg 2 <non_lvalue_expr type <integer_type int>
side-effects
arg 0 <save_expr>>>>
arg 1 <function_decl mux>
rtl 2 (nil)>>>
That's an overview. It goes on, and on, and on, at least 60 levels
deep. Look back up at the summary: calls_setjmp_r was called
258,280,343 times. That means the tree walker visited 258 million
tree nodes. (The calls to simplify_aggr
How can that possibly be when GC claims there's only a few thousand
tree nodes in existence? Because there's heavy repetition going on.
Every node in that tree has a type, each time we hit a node we visit
its type. And types are fairly complex beasts. Likewise, we
encounter the VAR_DECLs and FUNCTION_DECLs over and over again. I
suspect there's some shared structure hiding in the SAVE_EXPR nodes,
too.
I instructed gdb to print out the address of every node visited by
calls_setjmp_r, let it grind for awhile, and ran the output through
"sort | uniq -c". There are a small number (ten or so) of nodes that
are visited more than 20,000 times. There are only about 200 nodes
examined overall.
This is not trivial to fix. calls_setjmp_p could be replaced by
noticing during parsing that we're calling setjmp and setting a flag
(kinda like the way calls.c does it at the RTL level, now). But
expand_body does much the same thing with simplify_aggr_init_exprs_r.
And in the current tree (this compiler is a wee out of date)
finish_stmt_tree does it with prune_unused_decls...
I think what we want is two variants of walk_tree. One for things
like no_linkage_check, that really do need to visit each node; one for
things like calls_setjmp_p and the new prune_unused_decls, that only
visits statement nodes. I'm not sure what to do with
simplify_aggr_init_exprs_r.
Or I s'pose we could find a way to notice when we'd already visited a
node and not examine it again. There *are* two unused flag bits in
struct tree_common still...
Comments from the whole-function-mode crowd?
zw