This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.


Re: gcc 3.5 integration branch proposal

From: Jan Hubicka <hubicka at ucw dot cz>
To: Zack Weinberg <zack at codesourcery dot com>
Cc: Nick Burrett <nick at dsvr dot net>, Gabriel Dos Reis <gdr at integrable-solutions dot net>, Marc Espie <espie at quatramaran dot ens dot fr>, geoffk at apple dot com, gcc at gcc dot gnu dot org
Date: Tue, 20 Jan 2004 11:08:39 +0100
Subject: Re: gcc 3.5 integration branch proposal
References: <82D6F34E-4306-11D8-BDBD-000A95B1F520@apple.com> <20040110154129.GA28152@disaster.jaj.com> <C97CAA9B-452F-11D8-94EA-000A95B1F520@apple.com> <1073935323.3458.42.camel@minax.codesourcery.com> <F9B61B7C-4558-11D8-939E-0030657EA24A@apple.com> <1073951351.3458.162.camel@minax.codesourcery.com> <20040119013113.044D74895@quatramaran.ens.fr> <m3ad4kzp8o.fsf@uniton.integrable-solutions.net> <400BB40B.4070101@dsvr.net> <871xpvp9d7.fsf@egil.codesourcery.com>

> Nick Burrett <nick@dsvr.net> writes:
> > There's no harm in that.  I have a port of GCC 3.3.3 running on a
> > 200MHz StrongARM that takes over 6 minutes to compile the following:
> >
> >     #include <iostream>
> >
> >     int main (void)
> >     {
> >       std::cout << "Hello World" << std::endl;
> >       return 0;
> >     }
> >
> > GCC 2.95.4 compiled the same application on the same hardware in
> > around 20-30 seconds.
> 
> I compiled GCC 3.4-to-be with profiling instrumentation and ran it
> against this test case.  The test takes 2.7 seconds on my roughly
> two-year-old Athlon, which I think is far too slow; it should be
> <0.01s on this hardware.  Without benefit of PCH or other such
> cleverness.
> 
> Profiling results are interesting.  First, the times are nearly
> identical at -O2 and -fsyntax-only.  This is natural, there really
> isn't anything here to optimize, but it's worth noting.  Almost all
> the time is in the C++ front end.  Here's the top of the flat profile
> (-fsyntax-only):
> 
> Each sample counts as 0.01 seconds.
>   %   cumulative   self              self     total           
>  time   seconds   seconds    calls   s/call   s/call  name    
>   4.40      0.67     0.67      661     0.00     0.00  store_bindings
>   2.95      1.12     0.45   372916     0.00     0.00  ggc_alloc
>   2.95      1.57     0.45   129400     0.00     0.00  make_node
>   2.50      1.95     0.38   193075     0.00     0.00  _cpp_lex_direct
>   2.36      2.31     0.36    14689     0.00     0.00  grokdeclarator
>   2.30      2.66     0.35   103752     0.00     0.00  memset
>   2.23      3.00     0.34    96832     0.00     0.00  ht_lookup
>   1.90      3.29     0.29   365671     0.00     0.00  htab_find_slot_with_hash
>   1.90      3.58     0.29    89353     0.00     0.00  walk_tree
>   1.90      3.87     0.29    81277     0.00     0.00  _int_malloc
> 
> store_bindings potentially does a tremendous amount of work: it (in
> conjunction with its sole caller, maybe_push_to_top_level) temporarily
> unwinds the current scope stack, which entails modifying the data
> structure for every identifier declared in the program.  Since the
> program declares some 8,000 identifiers, you can see why a function
> that's called only 661 times ends up at the top of the profile.
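
To make the arithmetic in the paragraph above concrete, here is a toy
model of that cost. It is only a sketch, not the actual front-end code:
Identifier, save_and_clear, restore and total_visits are hypothetical
names, and the model assumes every identifier has a live binding on
every call.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for an identifier node; not GCC's real tree type.
struct Identifier {
    std::string name;
    const void* binding = nullptr;  // current meaning, null if unbound
};

// One saved (identifier, old binding) pair per touched identifier.
using SavedBindings = std::vector<std::pair<Identifier*, const void*>>;

// Save and clear every live binding; cost is linear in the identifier count.
SavedBindings save_and_clear(std::vector<Identifier>& ids) {
    SavedBindings saved;
    for (Identifier& id : ids) {
        if (id.binding != nullptr) {
            saved.emplace_back(&id, id.binding);
            id.binding = nullptr;
        }
    }
    return saved;
}

// Undo save_and_clear by putting each old binding back.
void restore(const SavedBindings& saved) {
    for (const auto& entry : saved) {
        assert(entry.first->binding == nullptr);  // still cleared
        entry.first->binding = entry.second;
    }
}

// Count identifier visits for `calls` unwinds over `n` bound identifiers.
std::size_t total_visits(std::size_t n, std::size_t calls) {
    static const int dummy = 0;  // placeholder binding target
    std::vector<Identifier> ids(n);
    for (Identifier& id : ids) id.binding = &dummy;
    std::size_t visits = 0;
    for (std::size_t i = 0; i < calls; ++i) {
        SavedBindings saved = save_and_clear(ids);
        visits += saved.size();
        restore(saved);
    }
    return visits;
}
```

With the figures quoted above, total_visits(8000, 661) counts 5,288,000
identifier visits on the save side alone, which is how a function with
so few calls can still lead the profile.
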
> 
> I am not sure why ggc_alloc comes in second; checking is disabled so

My experience from profiling with oprofile is that ggc_alloc, the
garbage collector, and memset are where all our cache misses go, so they
end up high in profiles even when the amount of work looks small.
It is not really ggc_alloc's fault; rather, the problem is that we use
it too much.

> it isn't doing tons and tons of memset() operations or anything.  The
> time spent in make_node is, I suspect, largely due to the inlined
> memset in there.  Memset itself is being called mostly by [x]calloc,
> and most of *those* calls trace back to walk_tree_without_duplicates
> and/or for_each_template_parm, via htab_create_alloc.  Those hash
> tables are a kludge to prevent exponential time consumption in certain
> algorithms; it would be nice if they weren't necessary.
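
The reason such a table helps can be shown with a toy example. This is a
sketch under assumptions, not the walk_tree code: Node, walk_naive,
walk_once and make_shared_chain are hypothetical names, and
std::unordered_set stands in for the hash table. When subtrees are
shared, a plain recursive walk visits a node once per path that reaches
it; remembering visited nodes brings that back to once per node, at the
price of building a table for each walk.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_set>
#include <vector>

// Hypothetical tree/DAG node; children may be shared between parents.
struct Node {
    std::vector<const Node*> kids;
};

// Naive walk: a shared subtree is revisited once per path that reaches it.
std::size_t walk_naive(const Node* n) {
    assert(n != nullptr);
    std::size_t visits = 1;
    for (const Node* k : n->kids) visits += walk_naive(k);
    return visits;
}

// Walk with a visited set: each distinct node is processed at most once.
std::size_t walk_once(const Node* n, std::unordered_set<const Node*>& seen) {
    if (!seen.insert(n).second) return 0;  // already visited
    std::size_t visits = 1;
    for (const Node* k : n->kids) visits += walk_once(k, seen);
    return visits;
}

// Build depth+1 nodes where each node points at its successor twice,
// so the number of paths to the last node doubles at every level.
std::vector<Node> make_shared_chain(std::size_t depth) {
    std::vector<Node> nodes(depth + 1);
    for (std::size_t i = 0; i < depth; ++i)
        nodes[i].kids = {&nodes[i + 1], &nodes[i + 1]};
    return nodes;
}
```

For make_shared_chain(10), walk_naive makes 2047 visits while walk_once
makes 11. The set itself has to be created and cleared for every walk,
which is the allocation and memset cost showing up in the profile.
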
> 
> I think there's room for some easy speedups here, and I think they
> could get into 3.4 if developed promptly.  Let's see some patches.

I am working on some of these :)
One other major bottleneck you don't see in your profile is
for_each_template_parm, which can eat about 10% when compiling Gerald's
application with optimization enabled.

Honza
> 
> zw

Follow-Ups:
- Re: gcc 3.5 integration branch proposal
  - From: Daniel Jacobowitz
- Re: gcc 3.5 integration branch proposal
  - From: Zack Weinberg

References:
- Re: gcc 3.5 integration branch proposal
  - From: Geoffrey Keating
- Re: gcc 3.5 integration branch proposal
  - From: Phil Edwards
- Re: gcc 3.5 integration branch proposal
  - From: Geoffrey Keating
- Re: gcc 3.5 integration branch proposal
  - From: Mark Mitchell
- Re: gcc 3.5 integration branch proposal
  - From: Geoff Keating
- Re: gcc 3.5 integration branch proposal
  - From: Mark Mitchell
- Re: gcc 3.5 integration branch proposal
  - From: Marc Espie
- Re: gcc 3.5 integration branch proposal
  - From: Gabriel Dos Reis
- Re: gcc 3.5 integration branch proposal
  - From: Nick Burrett
- Re: gcc 3.5 integration branch proposal
  - From: Zack Weinberg
