This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.
Re: Speed/profile of gcc3.4
> Hi!
>
> To give some more data to the speed of g++ discussion I built a profiling
> compiler and ran it over the tramp3d.cpp testcase
> (http://www.tat.physik.uni-tuebingen.de/~rguenth/gcc/tramp3d.cpp.gz). Top
> on the (flat) profile are
>
>   %   cumulative    self                self    total
>  time   seconds    seconds      calls  s/call  s/call  name
>  7.85     24.64     24.64   71506294    0.00    0.00  ggc_alloc
>  3.37     35.23     10.59   75978109    0.00    0.00  htab_find_slot_with_hash
>  3.17     45.20      9.97   15526171    0.00    0.00  walk_tree
>  3.06     54.80      9.60    3895596    0.00    0.00  gt_ggc_mx_lang_tree_node
>  2.10     61.40      6.60      43652    0.00    0.00  fixup_var_refs_insns
>  2.07     67.91      6.51  116225166    0.00    0.00  ggc_set_mark
>  2.05     74.35      6.44      17854    0.00    0.00  init_alias_analysis
>  1.51     79.10      4.75     221721    0.00    0.00  htab_expand
>  1.47     83.71      4.61       1044    0.00    0.02  store_motion
>  1.17     87.37      3.66       8618    0.00    0.00  loop_regs_scan
>  1.14     90.96      3.59   13512961    0.00    0.00  fixup_var_refs_1
>  1.03     94.19      3.23     238610    0.00    0.00  compute_transp
>  1.03     97.41      3.22    2216398    0.00    0.00  emit_insn
>  0.94    100.37      2.96   27841267    0.00    0.00  note_stores
>  0.94    103.31      2.94   20724190    0.00    0.00  splay_tree_splay_helper
>  0.89    106.09      2.78   11630295    0.00    0.00  for_each_rtx
>  0.88    108.85      2.76    2243042    0.00    0.00  cse_insn
>  0.88    111.60      2.75    7037958    0.00    0.00  reg_scan_mark_refs
>  0.79    114.08      2.48   16796134    0.00    0.00  find_loads
>  0.71    116.31      2.23  129249560    0.00    0.00  bitmap_set_bit
>  0.68    118.45      2.14    6755763    0.00    0.00  count_reg_usage
>  0.67    120.55      2.10   58964152    0.00    0.00  find_reg_note
>  0.61    122.47      1.92    3829519    0.00    0.00  constrain_operands
>  0.60    124.35      1.88   13513006    0.00    0.00  fixup_var_refs_insn
>  0.59    126.19      1.84   22372827    0.00    0.00  mark_set_1
>
> Ugh.
>
> The htab_find_slot_with_hash time should maybe be split up, because it
> seems heavily overloaded. Also, do we use power-of-two hashtab sizes
> only? In that case we could save the costly division/modulo calculations.
Actually I did some profiling of this too, and at least from Gerald's
testcase I concluded that the vast majority of the hashtable uses come
from for_each_template_parm. Jason mentioned that Mark plans to trim
down the use of these. That should make it possible to knock this
function out of the profiles completely.
Mark, have you made some progress on this? If not, I can try to do
something myself if you give me some guidelines.
>
> ggc_alloc - err - which collector do we use by default? The page or the zone collector?
> How do I select a different collector?
>
> For the page collector, inside ggc_alloc we should use __builtin_expect()
> for the entry == NULL || entry->num_free_objects == 0 check; also, a wrapper
> around ggc_alloc() using __builtin_constant_p() could speed up the order
> calculation. push_depth/push_by_depth could likewise use __builtin_expect()
> and move the realloc out of line. Was the use of prefetch in
> ggc_pop_context benchmarked?
The __builtin_expect tricks can be made obsolete by using profiledbootstrap
for production builds of the compiler. Perhaps we can do some work to make
it cheaper by producing a train-run testcase that is smaller than a full
libjava/libstdc++ build?
Honza
>
> Richard.
>
> --
> Richard Guenther <richard dot guenther at uni-tuebingen dot de>
> WWW: http://www.tat.physik.uni-tuebingen.de/~rguenth/