This is the mail archive of the
gcc@gcc.gnu.org
mailing list for the GCC project.
Re: PCH, and more generally C++ parser performance
On Thu, Aug 24, 2000 at 06:40:47PM -0700, Mark Mitchell wrote:
> >>>>> "Zack" == Zack Weinberg <zack@wolery.cumb.org> writes:
>
> Zack> It's been called 285,000 times. It called make_node 90,000
> Zack> times. That means that there were 90,000 unique identifiers
> Zack> in the sample code. get_identifier's hash table has only
> Zack> 1000 slots; even with a perfect hash function, which it does
> Zack> not have, we would get 90-element chains. (An ideal binary
> Zack> search tree would have depth 16.)
>
> Funny -- Alex, Richard, Jason, Benjamin and I we were just talking
> about this at lunch yesterday. I was planning on doing something
> about this -- but it's great if you will instead.
Okay. I have a quick-and-dirty implementation of this (which doesn't
work except in front ends that use cpplib ... minor details, y'know)
and it is a pretty impressive win.
Before:
% cumulative self self total
time seconds seconds calls ms/call ms/call name
17.23 2.75 2.75 285012 0.01 0.01 get_identifier
11.47 4.58 1.83 14360005 0.00 0.00 ggc_set_mark
8.27 5.90 1.32 18 73.33 208.88 ggc_mark_trees
4.57 6.63 0.73 5652 0.13 0.13 lookup_tag
3.57 7.20 0.57 1 570.00 15916.08 yyparse
2.94 7.67 0.47 2011020 0.00 0.00 lang_mark_tree
2.76 8.11 0.44 1404252 0.00 0.00 memset
2.19 8.46 0.35 72526 0.00 0.03 grokdeclarator
1.69 8.73 0.27 1861769 0.00 0.00 ggc_alloc
1.50 8.97 0.24 201127 0.00 0.00 walk_tree
...
0.81 10.60 0.13 179920 0.00 0.00 _cpp_lookup_with_hash
After:
13.67 1.90 1.90 13535029 0.00 0.00 ggc_set_mark
8.71 3.11 1.21 18 67.22 197.11 ggc_mark_trees
5.47 3.87 0.76 1 760.00 13729.41 yyparse
4.25 4.46 0.59 5652 0.10 0.10 lookup_tag
3.45 4.94 0.48 2004597 0.00 0.00 lang_mark_tree
3.45 5.42 0.48 1399863 0.00 0.00 memset
2.52 5.77 0.35 1771787 0.00 0.00 ggc_alloc
2.52 6.12 0.35 72526 0.00 0.02 grokdeclarator
2.23 6.43 0.31 1222221 0.00 0.00 make_node
2.09 6.72 0.29 286353 0.00 0.00 _cpp_lookup_with_hash
Total time as measured by gprof is 15.96sec before, 13.89sec after. I
think we can do even better; we're presently calling
_cpp_lookup_with_hash more than we need to.
0.11 0.03 106475/286353 cpp_lookup [102]
0.18 0.04 179878/286353 parse_name [58]
[57] 2.6 0.29 0.07 286353 _cpp_lookup_with_hash [57]
The calls from parse_name are unavoidable, but I think we can crush
out many of the calls from cpp_lookup. If we trace back a bit more...
0.00 0.00 19/106475 initialize_builtins [578]
0.00 0.00 23/106475 _cpp_init_stacks [700]
0.00 0.00 934/106475 maybe_get_identifier [462]
0.04 0.13 105499/106475 get_identifier [88]
[102] 1.3 0.04 0.13 106475 cpp_lookup [102]
0.11 0.03 106475/286353 _cpp_lookup_with_hash [57]
0.00 0.05 23586/105499 set_mangled_name_for_decl [65]
0.00 0.05 24312/105499 declare_function_name [13]
0.00 0.10 47875/105499 make_decl_rtl [41]
[88] 1.5 0.00 0.21 105499 get_identifier [88]
0.04 0.13 105499/106475 cpp_lookup [102]
I don't follow make_decl_rtl real well, but it looks like we're
calling get_identifier from there to set the DECL_ASSEMBLER_NAME of a
decl - and that name often winds up being the same as the DECL_NAME,
but we go through get_identifier anyway. I Could Be Wrong.
The calls from declare_function_name are completely avoidable; we are
looking up __FUNCTION__, etc. over and over again. Shove 'em into
c_global_trees and forget about it...
And set_mangled_name_for_decl ... does not call get_identifier. This
is an artifact of tailcall optimization. It calls either mangle_decl
or build_decl_overload_real (depending on -fnew-abi), both of which
end by tailcalling get_identifier. This is to set DECL_ASSEMBLER_NAME
and looks unavoidable - at least at first glance.
zw