This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]

Re: PCH, and more generally C++ parser performance


On Thu, Aug 24, 2000 at 06:40:47PM -0700, Mark Mitchell wrote:
> >>>>> "Zack" == Zack Weinberg <zack@wolery.cumb.org> writes:
> 
>     Zack> It's been called 285,000 times.  It called make_node 90,000
>     Zack> times.  That means that there were 90,000 unique identifiers
>     Zack> in the sample code.  get_identifier's hash table has only
>     Zack> 1000 slots; even with a perfect hash function, which it does
>     Zack> not have, we would get 90-element chains.  (An ideal binary
>     Zack> search tree would have depth 16.)
> 
> Funny -- Alex, Richard, Jason, Benjamin and I we were just talking
> about this at lunch yesterday.  I was planning on doing something
> about this -- but it's great if you will instead.

Okay.  I have a quick-and-dirty implementation of this (which doesn't
work except in front ends that use cpplib ... minor details, y'know)
and it is a pretty impressive win.

Before:

  %   cumulative   self              self     total           
 time   seconds   seconds    calls  ms/call  ms/call  name    
 17.23      2.75     2.75   285012     0.01     0.01  get_identifier
 11.47      4.58     1.83 14360005     0.00     0.00  ggc_set_mark
  8.27      5.90     1.32       18    73.33   208.88  ggc_mark_trees
  4.57      6.63     0.73     5652     0.13     0.13  lookup_tag
  3.57      7.20     0.57        1   570.00 15916.08  yyparse
  2.94      7.67     0.47  2011020     0.00     0.00  lang_mark_tree
  2.76      8.11     0.44  1404252     0.00     0.00  memset
  2.19      8.46     0.35    72526     0.00     0.03  grokdeclarator
  1.69      8.73     0.27  1861769     0.00     0.00  ggc_alloc
  1.50      8.97     0.24   201127     0.00     0.00  walk_tree
...
  0.81     10.60     0.13   179920     0.00     0.00  _cpp_lookup_with_hash

After:

 13.67      1.90     1.90 13535029     0.00     0.00  ggc_set_mark
  8.71      3.11     1.21       18    67.22   197.11  ggc_mark_trees
  5.47      3.87     0.76        1   760.00 13729.41  yyparse
  4.25      4.46     0.59     5652     0.10     0.10  lookup_tag
  3.45      4.94     0.48  2004597     0.00     0.00  lang_mark_tree
  3.45      5.42     0.48  1399863     0.00     0.00  memset
  2.52      5.77     0.35  1771787     0.00     0.00  ggc_alloc
  2.52      6.12     0.35    72526     0.00     0.02  grokdeclarator
  2.23      6.43     0.31  1222221     0.00     0.00  make_node
  2.09      6.72     0.29   286353     0.00     0.00  _cpp_lookup_with_hash

Total time as measured by gprof is 15.96sec before, 13.89sec after.  I
think we can do even better; we're presently calling
_cpp_lookup_with_hash more than we need to.

                0.11    0.03  106475/286353      cpp_lookup [102]
                0.18    0.04  179878/286353      parse_name [58]
[57]     2.6    0.29    0.07  286353         _cpp_lookup_with_hash [57]

The calls from parse_name are unavoidable, but I think we can crush
out many of the calls from cpp_lookup.  If we trace back a bit more...

                0.00    0.00      19/106475      initialize_builtins [578]
                0.00    0.00      23/106475      _cpp_init_stacks [700]
                0.00    0.00     934/106475      maybe_get_identifier [462]
                0.04    0.13  105499/106475      get_identifier [88]
[102]    1.3    0.04    0.13  106475         cpp_lookup [102]
                0.11    0.03  106475/286353      _cpp_lookup_with_hash [57]

                0.00    0.05   23586/105499      set_mangled_name_for_decl [65]
                0.00    0.05   24312/105499      declare_function_name [13]
                0.00    0.10   47875/105499      make_decl_rtl [41]
[88]     1.5    0.00    0.21  105499         get_identifier [88]
                0.04    0.13  105499/106475      cpp_lookup [102]

I don't follow make_decl_rtl real well, but it looks like we're
calling get_identifier from there to set the DECL_ASSEMBLER_NAME of a
decl - and that name often winds up being the same as the DECL_NAME,
but we go through get_identifier anyway.  I Could Be Wrong.

The calls from declare_function_name are completely avoidable; we are
looking up __FUNCTION__, etc. over and over again.  Shove 'em into
c_global_trees and forget about it...

And set_mangled_name_for_decl ... does not call get_identifier.  This
is an artifact of tailcall optimization.  It calls either mangle_decl
or build_decl_overload_real (depending on -fnew-abi), both of which
end by tailcalling get_identifier.  This is to set DECL_ASSEMBLER_NAME
and looks unavoidable - at least at first glance.

zw

Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]