This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.


Re: C++ optimization: compile time + memory consumption regression on gcc3.3 branch


On Fri, 28 Feb 2003, Michael Matz wrote:

> Hi,
>
> On Fri, 28 Feb 2003, Karel Gardas wrote:
>
> > > If not, just include the .ii in the bug report, and hope and pray
> > > someone else will do the work.
> >
> > Pray will certainly help, but I'll try to get some numbers for you.
> > Anyway, I hope someone already described how to compile gcc with profiling
> > information so I'll be able to find it in the archive...
>
> When I want to do something like this I do the following:
> - create a preprocessed version of the source in question, note the
>   option with which it exhibits the behaviour.
> - checkout GCC of the interesting version somewhere (/src/gcc)
>
> % cd /src/
> % mkdir devel inst; cd devel
> % CFLAGS="-g -pg" ../gcc/configure --prefix=/src/inst \
>   --enable-languages=c,c++
> % make -j 8
>
> (note: _not_ bootstrapping;  often I also forget the setting of CFLAGS
> before configure.  In that case I usually just edit the top-level Makefile
> (search for "O2"))
>
> Now there is a profilable /src/devel/gcc/cc1plus (and cc1), ergo:
> % cd /src; cp <sourcecode>.ii .
> % ./devel/gcc/cc1plus [all-the-options] <sourcecode>.ii
> % gprof ./devel/gcc/cc1plus
>

Thanks to these instructions, I've been able to obtain some numbers for
you.  The command line was:

~/cvs/gcc/obj/gcc/cc1plus -O2  -Wall -fpermissive   -DPIC -fPIC
security/csiv2_impl.ii -o security/csiv2_impl.pic.o

and the top of the gprof output looks like this:

Flat profile:

Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total
 time   seconds   seconds    calls  Ks/call  Ks/call  name
 32.39    370.58   370.58 737172489     0.00     0.00  fixup_var_refs_1
 30.24    716.59   346.01   177958     0.00     0.00  fixup_var_refs_insns
 15.41    892.87   176.28 737172497     0.00     0.00  fixup_var_refs_insn
  4.54    944.81    51.94 242350577     0.00     0.00  reg_mentioned_p
  2.63    974.88    30.07    16240     0.00     0.00  fixup_var_refs
  1.45    991.47    16.59 245604291     0.00     0.00  rtx_equal_p
  0.79   1000.48     9.01    32243     0.00     0.00  clear_table
  0.44   1005.48     5.00  1380529     0.00     0.00  gt_ggc_mx_lang_tree_node
  0.44   1010.47     4.99  2354362     0.00     0.00  walk_tree
  0.36   1014.57     4.10 65189450     0.00     0.00  walk_fixup_memory_subreg
  0.35   1018.63     4.06 25001432     0.00     0.00  ggc_alloc
  0.28   1021.83     3.20 39785726     0.00     0.00  ggc_set_mark
  0.27   1024.88     3.05  1531041     0.00     0.00  emit_insn
  0.25   1027.71     2.83     2416     0.00     0.00  init_alias_analysis
  0.24   1030.46     2.75    61705     0.00     0.00  htab_traverse
  0.23   1033.09     2.63      588     0.00     0.00  loop_regs_scan
  0.23   1035.67     2.58 17785018     0.00     0.00  comptypes
  0.20   1037.91     2.24 18300030     0.00     0.00  single_set_2
  0.18   1039.99     2.08   260571     0.00     0.00  sbitmap_union_of_diff_cg
  0.16   1041.86     1.87     2302     0.00     0.00  scan_loop
  0.15   1043.59     1.73   122478     0.00     0.00  alloc_page
  0.15   1045.25     1.66 40305705     0.00     0.00  lookup_page_table_entry
  0.15   1046.91     1.66   421165     0.00     0.00  flow_delete_block_noexpunge
  0.14   1048.48     1.57 21379508     0.00     0.00  mark_local_for_remap_r
  0.13   1049.98     1.50    79976     0.00     0.00  record_reg_classes
  0.13   1051.44     1.46  8225894     0.00     0.00  htab_find_slot_with_hash
  0.12   1052.86     1.42  1372240     0.00     0.00  expand_expr
  0.11   1054.14     1.28 29537527     0.00     0.00  cp_type_quals
  0.11   1055.41     1.27   496400     0.00     0.00  dfs_walk_real
  0.10   1056.53     1.12    24058     0.00     0.00  compute_transp
  0.10   1057.64     1.11  2884904     0.00     0.00  splay_tree_splay_helper
  0.09   1058.71     1.07  9389407     0.00     0.00  copy_node
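
A minimal sketch, assuming the flat profile has been saved as plain text,
of how the combined share of the fixup_var_refs* functions could be
totalled.  The helper names (parse_flat_profile, share) are made up for
this illustration and are not part of gprof or GCC; the sample rows are
copied from the listing above.

```python
import re

# One flat-profile row: %time, cumulative s, self s, calls,
# self per call, total per call, function name.
# Rows without a "calls" column are deliberately not matched.
ROW = re.compile(
    r"^\s*(\d+\.\d+)\s+(\d+\.\d+)\s+(\d+\.\d+)\s+(\d+)\s+\S+\s+\S+\s+(\S+)\s*$"
)

def parse_flat_profile(text):
    """Return a list of dicts, one per recognised flat-profile row."""
    rows = []
    for line in text.splitlines():
        m = ROW.match(line)
        if m:
            rows.append({
                "pct": float(m.group(1)),
                "self_s": float(m.group(3)),
                "calls": int(m.group(4)),
                "name": m.group(5),
            })
    return rows

def share(rows, prefix):
    """Sum the %time of every function whose name starts with prefix."""
    return sum(r["pct"] for r in rows if r["name"].startswith(prefix))

SAMPLE = """\
 32.39    370.58   370.58 737172489     0.00     0.00  fixup_var_refs_1
 30.24    716.59   346.01   177958     0.00     0.00  fixup_var_refs_insns
 15.41    892.87   176.28 737172497     0.00     0.00  fixup_var_refs_insn
  4.54    944.81    51.94 242350577     0.00     0.00  reg_mentioned_p
  2.63    974.88    30.07    16240     0.00     0.00  fixup_var_refs
"""

rows = parse_flat_profile(SAMPLE)
print("fixup_var_refs* share: %.2f%%" % share(rows, "fixup_var_refs"))
```

On the five rows shown, the four fixup_var_refs* entries add up to roughly
80% of the sampled time, which is why that family stands out.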

> Before and after means simply once with a checkout of the fast version,
> and once with the slow one.  So you can identify the bottleneck.  In your
> case expand needs excessively long, so I guess simply looking at the
> profile of that one is enough to see the bottleneck.

Yes, I hope profiling gcc 3.2.2 as well will turn out to be unnecessary.
I have the whole gprof output compressed on my disk, so if anyone is
interested I can provide it on request (~300kB bzip2-compressed file).
Now I'll try a binary search on gcc-3_3-branch to find the problematic
patch. If you can spot the problematic patch just by looking at the gprof
output above, please let me know; it would save me some time.
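
The binary search mentioned above can be sketched roughly as follows.
This is only an illustration under stated assumptions: the candidate list
is ordered in time, the first entry is known to be fast, the last is known
to be slow, and the slowdown never disappears again once introduced.  The
function first_slow and the predicate passed to it are hypothetical; in
practice the predicate would check out a snapshot of the branch, build
cc1plus, and time the compile of the .ii file.

```python
def first_slow(candidates, is_slow):
    """Return the index of the first candidate for which is_slow() is true.

    Assumes candidates[0] is fast, candidates[-1] is slow, and that every
    candidate after the first slow one is slow as well.
    """
    lo, hi = 0, len(candidates) - 1   # lo: known fast, hi: known slow
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_slow(candidates[mid]):
            hi = mid                  # regression is at mid or earlier
        else:
            lo = mid                  # regression is after mid
    return hi

# Stand-in data: ten hypothetical snapshots, with the slowdown
# (arbitrarily) introduced at snapshot number 6.
snapshots = list(range(10))
culprit = first_slow(snapshots, lambda s: s >= 6)
print("first slow snapshot:", snapshots[culprit])
```

Each step halves the remaining range, so even a few hundred candidate
check-ins need only around eight or nine builds.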

Is there anything else I should try?

Thanks,

Karel
--
Karel Gardas                  kgardas at objectsecurity dot com
ObjectSecurity Ltd.           http://www.objectsecurity.com

