This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.
Re: Inlining and estimate_num_insns
- From: Jan Hubicka <jh at suse dot cz>
- To: Steven Bosscher <stevenb at suse dot de>
- Cc: Richard Guenther <richard dot guenther at gmail dot com>,Giovanni Bajo <giovannibajo at libero dot it>,Mark Mitchell <mark at codesourcery dot com>, gcc at gcc dot gnu dot org,Jan Hubicka <hubicka at ucw dot cz>
- Date: Tue, 1 Mar 2005 01:33:07 +0100
- Subject: Re: Inlining and estimate_num_insns
- References: <Pine.LNX.4.44.0502241350470.2297-100000@alwazn.tat.physik.uni-tuebingen.de> <200502280219.00001.stevenb@suse.de> <84fc9c0005022801255fc61758@mail.gmail.com> <200502282156.44751.stevenb@suse.de>
> On Monday 28 February 2005 10:25, Richard Guenther wrote:
> > > I can only wonder why we are having this discussion just after GCC 4.0
> > > was branched, while it was obvious already two years ago that inlining
> > > heuristics were going to be a difficult item with tree-ssa.
> >
> > There were of course complaints and discussions about this, and I even
> > tried to tweak inlining parameters once. See the audit trails of PR7863
> > and PR8704. There were people telling me "well in branch XYZ we do so much
> > better", as always, so I was not encouraged to pursue this further.
> >
> > Anyway, I think we should try the patch on mainline and I'll plan to
> > re-submit it together with a 10% lowering of the inlining parameters
> > compared to 3.4 (this is conservative for the mean size change for C code,
> > for C++ we're still too high). I personally cannot afford to do so much
> > testing to please everyone.
>
> I tested your -fobey-inline patch a bit using the test case from PR8361.
> The run was still going after 3 minutes (without the flag it takes 20s)
> so I terminated it and took the following oprofile:
>
> CPU: Hammer, speed 1394.98 MHz (estimated)
> Counted CPU_CLK_UNHALTED events (Cycles outside of halt state) with a unit mask of 0x00 (No unit mask) count 4000
> Counted DATA_CACHE_MISSES events (Data cache misses) with a unit mask of 0x00 (No unit mask) count 1000
> samples   %        samples  %        image name  symbol name
> 4607300   78.7190  98784    79.4179  cc1plus     cgraph_remove_edge
> 861258    14.7152  15308    12.3070  cc1plus     cgraph_remove_node
> 60871     1.0400   999      0.8032   cc1plus     ggc_set_mark
> 56907     0.9723   2054     1.6513   cc1plus     cgraph_optimize
> 36513     0.6239   1132     0.9101   cc1plus     cgraph_clone_inlined_nodes
> 29570     0.5052   843      0.6777   cc1plus     cgraph_postorder
> 16187     0.2766   367      0.2951   cc1plus     ggc_alloc_stat
> 7787      0.1330   97       0.0780   cc1plus     gt_ggc_mx_cgraph_node
> 6851      0.1171   138      0.1109   cc1plus     cgraph_edge
> 6671      0.1140   305      0.2452   cc1plus     comptypes
> 5776      0.0987   95       0.0764   cc1plus     gt_ggc_mx_cgraph_edge
> 5243      0.0896   93       0.0748   cc1plus     gt_ggc_mx_lang_tree_node
>
> Honza, it seems the cgraph code needs whipping here.
I think I can shoot down the cgraph_remove_node laziness with simple
reference counting, but for the removal of edges the only alternative I
see is going for vectors or doubly linked lists. I would still expect this
time to be dominated by the later inlining/compilation explosion, so I would
not take that too seriously (unless proved otherwise by
cgraph_remove_edge being at the top of the overall profile ;)
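
To illustrate the doubly linked list option, here is a minimal sketch. It assumes the cost in cgraph_remove_edge comes from walking a singly linked edge list to find the predecessor of the edge being removed. The struct layouts and the names edge_add and edge_remove are hypothetical simplifications, not GCC's actual cgraph definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for call-graph nodes and edges;
   these are not GCC's real data structures. */
struct node;

struct edge {
    struct node *caller;
    struct node *callee;
    struct edge *prev_callee, *next_callee; /* links in caller->callees */
    struct edge *prev_caller, *next_caller; /* links in callee->callers */
};

struct node {
    struct edge *callees; /* head of the list of outgoing edges */
    struct edge *callers; /* head of the list of incoming edges */
};

/* Link E at the head of CALLER's callee list and CALLEE's caller list. */
void edge_add(struct edge *e, struct node *caller, struct node *callee)
{
    e->caller = caller;
    e->callee = callee;

    e->prev_callee = NULL;
    e->next_callee = caller->callees;
    if (caller->callees)
        caller->callees->prev_callee = e;
    caller->callees = e;

    e->prev_caller = NULL;
    e->next_caller = callee->callers;
    if (callee->callers)
        callee->callers->prev_caller = e;
    callee->callers = e;
}

/* Unlink E from both lists in constant time. With back pointers there
   is no need to walk either list looking for E's predecessor. */
void edge_remove(struct edge *e)
{
    assert(e != NULL);

    /* Unlink from the caller's list of outgoing edges. */
    if (e->prev_callee)
        e->prev_callee->next_callee = e->next_callee;
    else
        e->caller->callees = e->next_callee;
    if (e->next_callee)
        e->next_callee->prev_callee = e->prev_callee;

    /* Unlink from the callee's list of incoming edges. */
    if (e->prev_caller)
        e->prev_caller->next_caller = e->next_caller;
    else
        e->callee->callers = e->next_caller;
    if (e->next_caller)
        e->next_caller->prev_caller = e->prev_caller;

    e->prev_callee = e->next_callee = NULL;
    e->prev_caller = e->next_caller = NULL;
}
```

The trade-off is two extra pointers per edge in exchange for removal that no longer depends on how many edges a node has, which is what matters when inlining creates nodes with very many callers or callees.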
Honza
>
> Gr.
> Steven