This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.



Re: gcc 3.5 integration branch proposal


Nick Burrett <nick@dsvr.net> writes:
> There's no harm in that.  I have a port of GCC 3.3.3 running on a
> 200MHz StrongARM that takes over 6 minutes to compile the following:
>
>     #include <iostream>
>
>     int main (void)
>     {
>       std::cout << "Hello World" << std::endl;
>       return 0;
>     }
>
> GCC 2.95.4 compiled the same application on the same hardware in
> around 20-30 seconds.

I compiled GCC 3.4-to-be with profiling instrumentation and ran it
against this test case.  The test takes 2.7 seconds on my roughly
two-year-old Athlon, which I think is far too slow; it should be
<0.01s on this hardware, even without benefit of PCH or other such
cleverness.

Profiling results are interesting.  First, the times are nearly
identical at -O2 and -fsyntax-only.  This is natural, since there
really isn't anything here to optimize, but it's worth noting.  Almost all
the time is in the C++ front end.  Here's the top of the flat profile
(-fsyntax-only):

Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total           
 time   seconds   seconds    calls   s/call   s/call  name    
  4.40      0.67     0.67      661     0.00     0.00  store_bindings
  2.95      1.12     0.45   372916     0.00     0.00  ggc_alloc
  2.95      1.57     0.45   129400     0.00     0.00  make_node
  2.50      1.95     0.38   193075     0.00     0.00  _cpp_lex_direct
  2.36      2.31     0.36    14689     0.00     0.00  grokdeclarator
  2.30      2.66     0.35   103752     0.00     0.00  memset
  2.23      3.00     0.34    96832     0.00     0.00  ht_lookup
  1.90      3.29     0.29   365671     0.00     0.00  htab_find_slot_with_hash
  1.90      3.58     0.29    89353     0.00     0.00  walk_tree
  1.90      3.87     0.29    81277     0.00     0.00  _int_malloc

store_bindings potentially does a tremendous amount of work: it (in
conjunction with its sole caller, maybe_push_to_top_level) temporarily
unwinds the current scope stack, which entails modifying the data
structure for every identifier declared in the program.  Since the
program declares some 8,000 identifiers, you can see why a function
that's called only 661 times ends up at the top of the profile.
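
The cost pattern described above can be sketched roughly as follows.
This is a simplified model, not GCC's actual code: the Identifier and
SavedBinding types and the push_to_top_level, pop_from_top_level and
simulate functions are hypothetical stand-ins, and the only assumption
taken from the text is that each call walks every declared identifier.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for an identifier node; not GCC's real data structure.
struct Identifier {
    std::string name;
    int binding;  // current binding; 0 means "no binding"
};

// Records what has to be restored when the scope stack is rewound.
struct SavedBinding {
    std::size_t index;
    int binding;
};

// Simplified model of unwinding the scope stack: every identifier is
// visited, and each live binding is saved and then cleared.
std::vector<SavedBinding> push_to_top_level(std::vector<Identifier>& ids,
                                            std::size_t& touched) {
    std::vector<SavedBinding> saved;
    saved.reserve(ids.size());
    for (std::size_t i = 0; i < ids.size(); ++i) {
        ++touched;  // count the visit whether or not a binding is live
        if (ids[i].binding != 0) {
            saved.push_back({i, ids[i].binding});
            ids[i].binding = 0;
        }
    }
    return saved;
}

// Restore the bindings that push_to_top_level cleared.
void pop_from_top_level(std::vector<Identifier>& ids,
                        const std::vector<SavedBinding>& saved) {
    for (const SavedBinding& s : saved) {
        ids[s.index].binding = s.binding;
    }
}

// Run n_calls push/pop round trips over n_ids identifiers and return the
// total number of identifier visits made while unwinding.
std::size_t simulate(std::size_t n_ids, std::size_t n_calls) {
    std::vector<Identifier> ids;
    ids.reserve(n_ids);
    for (std::size_t i = 0; i < n_ids; ++i) {
        ids.push_back({"id" + std::to_string(i), static_cast<int>(i) + 1});
    }
    std::size_t touched = 0;
    for (std::size_t c = 0; c < n_calls; ++c) {
        std::vector<SavedBinding> saved = push_to_top_level(ids, touched);
        pop_from_top_level(ids, saved);
    }
    return touched;
}
```

With the figures quoted above, simulate(8000, 661) makes 5,288,000
identifier visits, so in this model the work grows as calls times
identifiers even though the call count itself is small.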

I am not sure why ggc_alloc comes in second; checking is disabled so
it isn't doing tons and tons of memset() operations or anything.  The
time spent in make_node is, I suspect, largely due to the inlined
memset in there.  Memset itself is being called mostly by [x]calloc,
and most of *those* calls trace back to walk_tree_without_duplicates
and/or for_each_template_parm, via htab_create_alloc.  Those hash
tables are a kludge to prevent exponential time consumption in certain
algorithms; it would be nice if they weren't necessary.
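
To illustrate why a "visited" table is needed at all, here is a minimal
sketch of walking a structure with shared subtrees, with and without
duplicate suppression.  It is not GCC's walk_tree: the Node type and
the walk_naive, walk_once and make_shared_chain functions are
hypothetical, and std::unordered_set stands in for the hash table that
GCC creates via htab_create_alloc.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_set>
#include <vector>

// Hypothetical node with two child pointers; subtrees may be shared.
struct Node {
    const Node* left = nullptr;
    const Node* right = nullptr;
};

// Naive walk: a shared subtree is re-walked once per path that reaches it.
std::size_t walk_naive(const Node* n) {
    if (n == nullptr) return 0;
    return 1 + walk_naive(n->left) + walk_naive(n->right);
}

// Walk that records visited nodes so each node is processed only once.
std::size_t walk_once(const Node* n, std::unordered_set<const Node*>& seen) {
    if (n == nullptr || !seen.insert(n).second) return 0;
    return 1 + walk_once(n->left, seen) + walk_once(n->right, seen);
}

// Build depth + 1 nodes in which both children of node i point at node
// i + 1, so every level below the root is shared.
std::vector<Node> make_shared_chain(std::size_t depth) {
    std::vector<Node> nodes(depth + 1);
    for (std::size_t i = 0; i < depth; ++i) {
        nodes[i].left = &nodes[i + 1];
        nodes[i].right = &nodes[i + 1];
    }
    return nodes;
}
```

For a chain built with make_shared_chain(10), walk_naive makes 2047
visits while walk_once makes 11.  The price is that every walk has to
set up and zero its own table, which is consistent with where the
calloc and memset time in the profile comes from.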

I think there's room for some easy speedups here, and I think they
could get into 3.4 if developed promptly.  Let's see some patches.

zw

