This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.
Re: gcc 3.5 integration branch proposal
> Nick Burrett <nick@dsvr.net> writes:
> > There's no harm in that. I have a port of GCC 3.3.3 running on a
> > 200MHz StrongARM that takes over 6 minutes to compile the following:
> >
> > #include <iostream>
> >
> > int main (void)
> > {
> >   std::cout << "Hello World" << std::endl;
> >   return 0;
> > }
> >
> > GCC 2.95.4 compiled the same application on the same hardware in
> > around 20-30 seconds.
>
> I compiled GCC 3.4-to-be with profiling instrumentation and ran it
> against this test case. The test takes 2.7 seconds on my roughly
> two-year-old Athlon, which I think is far too slow; it should be
> <0.01s on this hardware. Without benefit of PCH or other such
> cleverness.
>
> Profiling results are interesting. First, the times are nearly
> identical at -O2 and -fsyntax-only. This is natural, since there really
> isn't anything here to optimize, but it's worth noting. Almost all
> the time is in the C++ front end. Here's the top of the flat profile
> (-fsyntax-only):
>
> Each sample counts as 0.01 seconds.
>   %   cumulative   self              self     total
>  time   seconds   seconds    calls  s/call   s/call  name
>  4.40      0.67     0.67      661    0.00     0.00  store_bindings
>  2.95      1.12     0.45   372916    0.00     0.00  ggc_alloc
>  2.95      1.57     0.45   129400    0.00     0.00  make_node
>  2.50      1.95     0.38   193075    0.00     0.00  _cpp_lex_direct
>  2.36      2.31     0.36    14689    0.00     0.00  grokdeclarator
>  2.30      2.66     0.35   103752    0.00     0.00  memset
>  2.23      3.00     0.34    96832    0.00     0.00  ht_lookup
>  1.90      3.29     0.29   365671    0.00     0.00  htab_find_slot_with_hash
>  1.90      3.58     0.29    89353    0.00     0.00  walk_tree
>  1.90      3.87     0.29    81277    0.00     0.00  _int_malloc
>
> store_bindings potentially does a tremendous amount of work: it (in
> conjunction with its sole caller, maybe_push_to_top_level) temporarily
> unwinds the current scope stack, which entails modifying the data
> structure for every identifier declared in the program. Since the
> program declares some 8,000 identifiers, you can see why a function
> that's called only 661 times ends up at the top of the profile.
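
To make the cost described above concrete, here is a minimal sketch of a
"save and clear every binding, then restore" unwind. It is not GCC's code:
the Identifier and SavedBinding types and the push_to_top_level,
pop_from_top_level, and simulate functions are hypothetical stand-ins, and
the only figures taken from the message are the 661 calls and the roughly
8,000 identifiers.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for an identifier node; not GCC's real data structure.
struct Identifier {
    int binding = 0;  // current innermost binding; 0 means "unbound"
};

// Remembers one binding so it can be put back later.
struct SavedBinding {
    Identifier* id;
    int old_binding;
};

// Save and clear every live binding, as a full scope-stack unwind must.
// The cost is proportional to the number of declared identifiers, not to
// the amount of work the caller wants to do at top level.
std::vector<SavedBinding> push_to_top_level(std::vector<Identifier>& ids,
                                            std::size_t& touched) {
    std::vector<SavedBinding> saved;
    for (Identifier& id : ids) {
        ++touched;
        if (id.binding != 0) {
            saved.push_back({&id, id.binding});
            id.binding = 0;
        }
    }
    return saved;
}

// Restore the bindings saved by push_to_top_level.
void pop_from_top_level(const std::vector<SavedBinding>& saved) {
    for (const SavedBinding& s : saved) s.id->binding = s.old_binding;
}

// Run `calls` unwind/restore pairs over `n_ids` bound identifiers and
// return how many identifier visits the unwinds performed in total.
std::size_t simulate(std::size_t n_ids, std::size_t calls) {
    std::vector<Identifier> ids(n_ids);
    for (std::size_t i = 0; i < n_ids; ++i)
        ids[i].binding = static_cast<int>(i) + 1;
    std::size_t touched = 0;
    for (std::size_t c = 0; c < calls; ++c) {
        std::vector<SavedBinding> saved = push_to_top_level(ids, touched);
        pop_from_top_level(saved);
    }
    return touched;
}
```

Under these assumptions, simulate(8000, 661) performs 661 * 8000 identifier
visits, which is how a function with so few calls can still dominate a flat
profile.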
>
> I am not sure why ggc_alloc comes in second; checking is disabled so

My experience from oprofiling is that ggc_alloc, the garbage collector,
and memset are where all our cache misses go, so they end up high in
profiles even when the amount of work looks small.
It is not really ggc_alloc's fault; rather, we simply use it too much.

> it isn't doing tons and tons of memset() operations or anything. The
> time spent in make_node is, I suspect, largely due to the inlined
> memset in there. Memset itself is being called mostly by [x]calloc,
> and most of *those* calls trace back to walk_tree_without_duplicates
> and/or for_each_template_parm, via htab_create_alloc. Those hash
> tables are a kludge to prevent exponential time consumption in certain
> algorithms; it would be nice if they weren't necessary.
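
As an illustration of why such a visited-set is needed, here is a minimal
sketch of a tree walk with and without duplicate suppression. It is not
GCC's walk_tree: the Node type and the walk_naive, walk_once, and
make_shared_chain functions are hypothetical, and std::unordered_set stands
in for the libiberty hash table.

```cpp
#include <cstddef>
#include <unordered_set>
#include <vector>

// Hypothetical tree node; shared subtrees make the structure a DAG.
struct Node {
    Node* left = nullptr;
    Node* right = nullptr;
};

// Plain recursive walk: a shared subtree is re-walked once per path to it.
std::size_t walk_naive(const Node* n) {
    if (!n) return 0;
    return 1 + walk_naive(n->left) + walk_naive(n->right);
}

// Walk that records visited nodes in a hash set and skips repeats.
std::size_t walk_once(const Node* n, std::unordered_set<const Node*>& seen) {
    if (!n || !seen.insert(n).second) return 0;
    return 1 + walk_once(n->left, seen) + walk_once(n->right, seen);
}

// Build `depth` nodes where both children of node i point at node i + 1,
// so the number of distinct paths doubles at every level.
std::vector<Node> make_shared_chain(std::size_t depth) {
    std::vector<Node> nodes(depth);
    for (std::size_t i = 0; i + 1 < depth; ++i) {
        nodes[i].left = &nodes[i + 1];
        nodes[i].right = &nodes[i + 1];
    }
    return nodes;
}
```

On a 20-node chain built this way, the naive walk makes 2^20 - 1 visits
while the deduplicating walk makes 20. The price is a fresh table per walk,
which matches the calloc and memset traffic described above.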
>
> I think there's room for some easy speedups here, and I think they
> could get into 3.4 if developed promptly. Let's see some patches.

I am working on some of these :)

One other major bottleneck you don't see in your profile is
for_each_template_parm, which can eat about 10% when compiling Gerald's
application with optimization enabled.

Honza
>
> zw