This is the mail archive of the
gcc@gcc.gnu.org
mailing list for the GCC project.
Re: C++ optimization: compile time + memory consumption regression on gcc3.3 branch
- From: Karel Gardas <kgardas at objectsecurity dot com>
- To: GCC Mailing List <gcc at gcc dot gnu dot org>
- Date: Mon, 3 Mar 2003 11:30:29 +0100 (CET)
- Subject: Re: C++ optimization: compile time + memory consumption regression on gcc3.3 branch
On Fri, 28 Feb 2003, Michael Matz wrote:
> Hi,
>
> On Fri, 28 Feb 2003, Karel Gardas wrote:
>
> > > If not, just include the .ii in the bug report, and hope and pray
> > > someone else will do the work.
> >
> > Pray will certainly help, but I'll try to get some numbers for you.
> > Anyway, I hope someone already described how to compile gcc with profiling
> > information so I'll be able to find it in the archive...
>
> When I want to do something like this I do the following:
> - create a preprocessed version of the source in question, note the
> option with which it exhibits the behaviour.
> - checkout GCC of the interesting version somewhere (/src/gcc)
>
> % cd /src/
> % mkdir devel inst; cd devel
> % CFLAGS="-g -pg" ../gcc/configure --prefix=/src/inst \
> --enable-languages=c,c++
> % make -j 8
>
> (note: _not_ bootstrapping; often I also forget the setting of CFLAGS
> before configure. In that case I usually just edit the top-level Makefile
> (search for "O2"))
>
> Now there is a profilable /src/devel/gcc/cc1plus (and cc1), ergo:
> % cd /src; cp <sourcecode>.ii .
> % ./devel/gcc/cc1plus [all-the-options] <sourcecode>.ii
> % gprof ./devel/gcc/cc1plus
>
Thanks to these instructions, I've been able to obtain some numbers for
you. The command line was:
~/cvs/gcc/obj/gcc/cc1plus -O2 -Wall -fpermissive -DPIC -fPIC
security/csiv2_impl.ii -o security/csiv2_impl.pic.o
and the top of the gprof output looks like this:
Flat profile:

Each sample counts as 0.01 seconds.
  %   cumulative    self               self    total
 time   seconds   seconds     calls Ks/call  Ks/call  name
 32.39    370.58   370.58 737172489     0.00     0.00  fixup_var_refs_1
 30.24    716.59   346.01    177958     0.00     0.00  fixup_var_refs_insns
 15.41    892.87   176.28 737172497     0.00     0.00  fixup_var_refs_insn
  4.54    944.81    51.94 242350577     0.00     0.00  reg_mentioned_p
  2.63    974.88    30.07     16240     0.00     0.00  fixup_var_refs
  1.45    991.47    16.59 245604291     0.00     0.00  rtx_equal_p
  0.79   1000.48     9.01     32243     0.00     0.00  clear_table
  0.44   1005.48     5.00   1380529     0.00     0.00  gt_ggc_mx_lang_tree_node
  0.44   1010.47     4.99   2354362     0.00     0.00  walk_tree
  0.36   1014.57     4.10  65189450     0.00     0.00  walk_fixup_memory_subreg
  0.35   1018.63     4.06  25001432     0.00     0.00  ggc_alloc
  0.28   1021.83     3.20  39785726     0.00     0.00  ggc_set_mark
  0.27   1024.88     3.05   1531041     0.00     0.00  emit_insn
  0.25   1027.71     2.83      2416     0.00     0.00  init_alias_analysis
  0.24   1030.46     2.75     61705     0.00     0.00  htab_traverse
  0.23   1033.09     2.63       588     0.00     0.00  loop_regs_scan
  0.23   1035.67     2.58  17785018     0.00     0.00  comptypes
  0.20   1037.91     2.24  18300030     0.00     0.00  single_set_2
  0.18   1039.99     2.08    260571     0.00     0.00  sbitmap_union_of_diff_cg
  0.16   1041.86     1.87      2302     0.00     0.00  scan_loop
  0.15   1043.59     1.73    122478     0.00     0.00  alloc_page
  0.15   1045.25     1.66  40305705     0.00     0.00  lookup_page_table_entry
  0.15   1046.91     1.66    421165     0.00     0.00  flow_delete_block_noexpunge
  0.14   1048.48     1.57  21379508     0.00     0.00  mark_local_for_remap_r
  0.13   1049.98     1.50     79976     0.00     0.00  record_reg_classes
  0.13   1051.44     1.46   8225894     0.00     0.00  htab_find_slot_with_hash
  0.12   1052.86     1.42   1372240     0.00     0.00  expand_expr
  0.11   1054.14     1.28  29537527     0.00     0.00  cp_type_quals
  0.11   1055.41     1.27    496400     0.00     0.00  dfs_walk_real
  0.10   1056.53     1.12     24058     0.00     0.00  compute_transp
  0.10   1057.64     1.11   2884904     0.00     0.00  splay_tree_splay_helper
  0.09   1058.71     1.07   9389407     0.00     0.00  copy_node
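
To put a single number on the profile above, here is a minimal Python sketch that sums the share of time spent in one family of functions. The sample rows are copied from the listing above; the helper names (parse_flat_profile, family_share) are made up for illustration and are not part of gprof or GCC. It assumes every data row has all seven columns, so rows without a call count would be skipped.

```python
# Sketch only: summarise a gprof flat profile by function-name prefix.
# SAMPLE holds the first six rows of the profile quoted in this mail.
SAMPLE = """\
 32.39    370.58   370.58 737172489     0.00     0.00  fixup_var_refs_1
 30.24    716.59   346.01    177958     0.00     0.00  fixup_var_refs_insns
 15.41    892.87   176.28 737172497     0.00     0.00  fixup_var_refs_insn
  4.54    944.81    51.94 242350577     0.00     0.00  reg_mentioned_p
  2.63    974.88    30.07     16240     0.00     0.00  fixup_var_refs
  1.45    991.47    16.59 245604291     0.00     0.00  rtx_equal_p
"""


def parse_flat_profile(text):
    """Return one dict per data row; header and short rows are skipped."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 7:
            continue  # not a full seven-column data row
        try:
            pct = float(fields[0])
            self_seconds = float(fields[2])
            calls = int(fields[3])
        except ValueError:
            continue  # column-title line such as "time seconds ..."
        rows.append({"name": fields[6], "pct": pct,
                     "self_seconds": self_seconds, "calls": calls})
    return rows


def family_share(rows, prefix):
    """Sum '% time' and self seconds over functions whose name starts with prefix."""
    members = [r for r in rows if r["name"].startswith(prefix)]
    return (sum(r["pct"] for r in members),
            sum(r["self_seconds"] for r in members))


if __name__ == "__main__":
    pct, seconds = family_share(parse_flat_profile(SAMPLE), "fixup_var_refs")
    print("fixup_var_refs*: %.2f%% of samples, %.2f s self time" % (pct, seconds))
```

For the rows shown, the four fixup_var_refs* functions add up to about 80.67% of the samples (922.94 seconds of self time), so this family dominates the run.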
> Before and after means simply once with a checkout of the fast version,
> and once with the slow one. So you can identify the bottleneck. In your
> case expand needs excessively long, so I guess simply looking at the
> profile of that one is enough to see the bottleneck.
Yes, I hope profiling gcc 3.2.2 will be unnecessary. I have the whole gprof
output compressed on my disk, so if anyone is interested I can provide it on
request (~300kB bzip2-compressed file). Next I'll try a binary search of
gcc-3_3-branch to find the problematic patch. If you can spot the
problematic patch just by looking at the gprof output above, please let me
know and save me the time.
Is there anything else I should try?
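
The binary search over gcc-3_3-branch could look roughly like the Python sketch below. It is only an outline under some assumptions: the candidates are dated snapshots of the branch, the first one is known to be fast, the last one is known to be slow, and once the regression appears it stays. The function name first_bad, the dates, and the stand-in predicate are hypothetical. In practice the predicate would check out the branch as of that date (for example with cvs update -D), rebuild cc1plus, and time the compile of the .ii file.

```python
# Sketch only: bisect a list of dated snapshots to find the first slow one.
from datetime import date, timedelta


def first_bad(candidates, is_bad):
    """Return the index of the first candidate for which is_bad() is true.

    Assumes candidates[0] is good, candidates[-1] is bad, and that every
    candidate after the first bad one is also bad.
    """
    lo, hi = 0, len(candidates) - 1  # lo: known good, hi: known bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(candidates[mid]):
            hi = mid  # regression is at mid or earlier
        else:
            lo = mid  # regression is after mid
    return hi


if __name__ == "__main__":
    # Hypothetical date range and regression date, for illustration only.
    days = [date(2003, 1, 1) + timedelta(days=i) for i in range(60)]
    pretend_regression = date(2003, 1, 20)

    def is_slow(day):
        # Stand-in for "check out, rebuild cc1plus, time the compile".
        return day >= pretend_regression

    print("first slow snapshot:", days[first_bad(days, is_slow)])
```

Each step halves the remaining range, so the number of rebuilds grows only with the logarithm of the number of snapshots. That matters here because a single slow compile already takes many minutes.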
Thanks,
Karel
--
Karel Gardas kgardas at objectsecurity dot com
ObjectSecurity Ltd. http://www.objectsecurity.com