This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.


Re: gcc 3.5 integration branch proposal


> Nick Burrett <nick@dsvr.net> writes:
> > There's no harm in that.  I have a port of GCC 3.3.3 running on a
> > 200MHz StrongARM that takes over 6 minutes to compile the following:
> >
> >     #include <iostream>
> >
> >     int main (void)
> >     {
> >       std::cout << "Hello World" << std::endl;
> >       return 0;
> >     }
> >
> > GCC 2.95.4 compiled the same application on the same hardware in
> > around 20-30 seconds.
> 
> I compiled GCC 3.4-to-be with profiling instrumentation and ran it
> against this test case.  The test takes 2.7 seconds on my roughly
> two-year-old Athlon, which I think is far too slow; it should be
> <0.01s on this hardware.  Without benefit of PCH or other such
> cleverness.
> 
> Profiling results are interesting.  First, the times are nearly
> identical at -O2 and -fsyntax-only.  This is natural, there really
> isn't anything here to optimize, but it's worth noting.  Almost all
> the time is in the C++ front end.  Here's the top of the flat profile
> (-fsyntax-only):
> 
> Each sample counts as 0.01 seconds.
>   %   cumulative   self              self     total           
>  time   seconds   seconds    calls   s/call   s/call  name    
>   4.40      0.67     0.67      661     0.00     0.00  store_bindings
>   2.95      1.12     0.45   372916     0.00     0.00  ggc_alloc
>   2.95      1.57     0.45   129400     0.00     0.00  make_node
>   2.50      1.95     0.38   193075     0.00     0.00  _cpp_lex_direct
>   2.36      2.31     0.36    14689     0.00     0.00  grokdeclarator
>   2.30      2.66     0.35   103752     0.00     0.00  memset
>   2.23      3.00     0.34    96832     0.00     0.00  ht_lookup
>   1.90      3.29     0.29   365671     0.00     0.00  htab_find_slot_with_hash
>   1.90      3.58     0.29    89353     0.00     0.00  walk_tree
>   1.90      3.87     0.29    81277     0.00     0.00  _int_malloc
> 
> store_bindings potentially does a tremendous amount of work: it (in
> conjunction with its sole caller, maybe_push_to_top_level) temporarily
> unwinds the current scope stack, which entails modifying the data
> structure for every identifier declared in the program.  Since the
> program declares some 8,000 identifiers, you can see why a function
> that's called only 661 times ends up at the top of the profile.
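
The cost described above can be illustrated with a toy model.  This is
only a sketch, not GCC's code: Identifier, SavedBinding,
push_to_top_level, pop_from_top_level and simulate are hypothetical
names, and the model assumes every call visits every identifier.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for an identifier node; GCC's real structures differ.
struct Identifier {
    std::string name;
    int binding = 0;  // 0 means "no current binding"
};

struct SavedBinding {
    Identifier* id;
    int old_binding;
};

// Save and clear the binding of every identifier that currently has one.
// The cost grows with the number of identifiers, not with the amount of
// work the caller wants to do at top level.
std::vector<SavedBinding> push_to_top_level(std::vector<Identifier>& ids,
                                            std::size_t& slots_touched) {
    std::vector<SavedBinding> saved;
    for (Identifier& id : ids) {
        ++slots_touched;
        if (id.binding != 0) {
            saved.push_back({&id, id.binding});
            id.binding = 0;
        }
    }
    return saved;
}

// Restore everything that push_to_top_level cleared.
void pop_from_top_level(const std::vector<SavedBinding>& saved) {
    for (const SavedBinding& s : saved) s.id->binding = s.old_binding;
}

// Count how many identifier slots are visited for a given workload.
std::size_t simulate(std::size_t n_identifiers, std::size_t n_calls) {
    std::vector<Identifier> ids(n_identifiers);
    for (std::size_t i = 0; i < ids.size(); ++i) {
        ids[i].name = "id" + std::to_string(i);
        ids[i].binding = static_cast<int>(i) + 1;
    }
    std::size_t slots_touched = 0;
    for (std::size_t c = 0; c < n_calls; ++c) {
        std::vector<SavedBinding> saved = push_to_top_level(ids, slots_touched);
        pop_from_top_level(saved);
    }
    // Every binding must be back in place after the last pop.
    assert(ids.empty() || ids[0].binding == 1);
    return slots_touched;
}
```

With the figures quoted above, simulate(8000, 661) returns 5288000.
That is an upper bound for this model, since in a real compile the
early calls would see fewer declared identifiers.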
> 
> I am not sure why ggc_alloc comes in second; checking is disabled so

My experience from profiling with oprofile is that ggc_alloc, the
garbage collector, and memset are where all our cache misses go, so
they end up high in profiles even when the amount of work looks small.
It is not really ggc_alloc's fault; rather, we simply use it too much.

> it isn't doing tons and tons of memset() operations or anything.  The
> time spent in make_node is, I suspect, largely due to the inlined
> memset in there.  Memset itself is being called mostly by [x]calloc,
> and most of *those* calls trace back to walk_tree_without_duplicates
> and/or for_each_template_parm, via htab_create_alloc.  Those hash
> tables are a kludge to prevent exponential time consumption in certain
> algorithms; it would be nice if they weren't necessary.
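
To show why such a table is needed at all, here is a minimal sketch of
a walk over a structure with shared subtrees, with and without a
"seen" set.  It is not GCC's walk_tree: Node, make_shared_chain,
walk_naive and walk_once are hypothetical names, and std::unordered_set
stands in for the libiberty hash table.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <unordered_set>
#include <vector>

struct Node {
    Node* left = nullptr;
    Node* right = nullptr;
};

// Build a chain in which both children of each node are the same next
// node, so every level below the root is shared.
std::vector<std::unique_ptr<Node>> make_shared_chain(std::size_t depth) {
    assert(depth > 0);  // callers take nodes[0] as the root
    std::vector<std::unique_ptr<Node>> nodes;
    for (std::size_t i = 0; i < depth; ++i)
        nodes.push_back(std::make_unique<Node>());
    for (std::size_t i = 0; i + 1 < depth; ++i) {
        nodes[i]->left = nodes[i + 1].get();
        nodes[i]->right = nodes[i + 1].get();
    }
    return nodes;
}

// Visits a shared node once per path that reaches it.
std::size_t walk_naive(const Node* n) {
    if (!n) return 0;
    return 1 + walk_naive(n->left) + walk_naive(n->right);
}

// Visits each distinct node once, at the price of maintaining a set.
std::size_t walk_once(const Node* n, std::unordered_set<const Node*>& seen) {
    if (!n || !seen.insert(n).second) return 0;
    return 1 + walk_once(n->left, seen) + walk_once(n->right, seen);
}
```

For a chain of depth 10, walk_naive makes 1023 visits (2^10 - 1) while
walk_once makes 10.  The trade-off is the one noted above: the set has
to be created and cleared for each walk, which is where the calloc and
memset traffic comes from.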
> 
> I think there's room for some easy speedups here, and I think they
> could get into 3.4 if developed promptly.  Let's see some patches.

I am working on some of these :)
One other major bottleneck that does not show up in your profile is
for_each_template_parm, which can eat about 10% when compiling Gerald's
application with optimization enabled.

Honza
> 
> zw

