This is the mail archive of the mailing list for the GCC project.
Re: LTO IPA inline decisions in GCC trunk.
- From: Jan Hubicka <hubicka at ucw dot cz>
- To: Venkataramanan Kumar <venkataramanan dot kumar at linaro dot org>
- Cc: Jan Hubicka <hubicka at ucw dot cz>, gcc at gcc dot gnu dot org, Richard Biener <richard dot guenther at gmail dot com>, Christophe Lyon <christophe dot lyon at linaro dot org>
- Date: Thu, 6 Nov 2014 11:09:40 +0100
- Subject: Re: LTO IPA inline decisions in GCC trunk.
- References: <CAJK_mQ0Fad18r2VDYx3LC=+uuYh99anaqMQnsXUDR3tDsXrZ8w at mail dot gmail dot com>
> Hi Honza,
> I experimented with building Coremark with both PGO and LTO at -O3 on an
> AArch64 machine. First I generated profiles using the recommended seeds in
> Coremark's readme.txt. Then I compiled again with -O3 -flto and -fprofile-use.
> I tried the GCC Linaro compiler (September), which is based on FSF 4.9, and
> GCC trunk from 30-Sep-2014.
> With the Linaro compiler, perf events show a 5% lower instruction count
> than with the GCC trunk version I used.
> Looking at the generated code, I see that the IPA inlining decisions have
> changed between Linaro and trunk.
> The Linaro compiler does not seem to inline a function called "crcu32",
> whereas trunk inlines it but does not inline "crcu16". Trunk also does not
> detect an IPA indirect inlining opportunity for a function called "cmp_complex".
Can you please compile with -fdump-ipa-inline-details -fdump-tree-release_ssa and send
me the dumps from both compilers? It should not be that hard to debug this.
> The number of ltrans partitions is 3 with the Linaro compiler and drops to
> 2 in trunk.
> Eyeballing the dump, it seems the --param max-inline-insns-auto limit is
> reached, so some functions are not inlined. I tried increasing this limit
> from 40 to 45, 50 and 100. This does not help get "crcu32" inlined in trunk,
> although "cmp_complex" is inlined once the limit is set to 45. But this does
> not reduce the instruction count.
> With the Linaro compiler, I tried manually preventing crcu16 from being
> inlined. The Linaro compiler then behaves the same way as trunk: it inlines
> crcu32, crcu16 is not inlined, and the instruction count increases.
> So not inlining "crcu16" seems to be what increases the instruction count in
> trunk. I tried the latest trunk only on an x86_64 machine, and its inlining
> behavior is the same as that of the trunk version I used on AArch64.
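One way to run this kind of "do not inline crcu16" experiment is GCC's noinline function attribute. The sketch below is hypothetical: crc_step8, crc_step16 and crc_step16_noinline are illustrative stand-ins, not Coremark's functions, and the CRC variant (reflected, polynomial 0xA001) is an arbitrary choice. The two 16-bit versions have identical bodies and differ only in the attribute, so comparing builds isolates the inlining decision rather than the computation.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 8-bit CRC step (reflected form, polynomial 0xA001).
   Illustrative only; not taken from Coremark. */
static uint16_t crc_step8(uint8_t data, uint16_t crc)
{
    crc ^= data;
    for (int i = 0; i < 8; i++)
        crc = (crc & 1) ? (uint16_t)((crc >> 1) ^ 0xA001)
                        : (uint16_t)(crc >> 1);
    return crc;
}

/* 16-bit update left to the inliner's normal heuristics. */
static uint16_t crc_step16(uint16_t v, uint16_t crc)
{
    crc = crc_step8((uint8_t)v, crc);
    return crc_step8((uint8_t)(v >> 8), crc);
}

/* Same body, but GCC's noinline attribute keeps this function out of
   line at every call site. It only stops inlining; GCC may still apply
   other optimizations to the function. */
__attribute__((noinline)) static uint16_t crc_step16_noinline(uint16_t v,
                                                              uint16_t crc)
{
    crc = crc_step8((uint8_t)v, crc);
    return crc_step8((uint8_t)(v >> 8), crc);
}
```

Both versions must return the same value for every input. Only code layout and instruction count should differ between builds.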
> LTO may not be the best thing to try on Coremark, but I just wanted to check
> whether trunk (5.0) does better than GCC 4.9.
> Can you suggest where I should look in GCC to see why these inline
> decisions changed in trunk? Also, has the inline size calculation in IPA
> changed in trunk compared to FSF 4.9?
One important change in mainline compared to 4.9 is that, with profile feedback,
it can now bypass the max-inline-insns-single/auto limits.
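To make the bypass concrete, here is a heavily simplified, hypothetical model of the size-limit check. The struct and function names are invented for illustration and are not GCC internals. The real inliner uses its own size and growth estimates and applies many more checks. The model encodes only three points. max-inline-insns-single governs functions declared inline. max-inline-insns-auto governs the others. Per the change described above, a hot call with profile feedback skips both limits.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the two --param limits. */
struct inline_limits {
    int max_insns_single; /* like max-inline-insns-single (declared inline) */
    int max_insns_auto;   /* like max-inline-insns-auto (not declared inline) */
};

/* Hypothetical summary of one call site. */
struct call_edge {
    int callee_size;             /* estimated callee size, in insns */
    bool callee_declared_inline; /* selects which limit applies */
    bool edge_is_hot;            /* e.g. derived from profile feedback */
};

/* Returns true if the call passes the (modeled) size limit. */
static bool within_size_limit(const struct call_edge *e,
                              const struct inline_limits *lim,
                              bool have_profile_feedback)
{
    /* Modeled bypass: with profile feedback, hot calls ignore both limits. */
    if (have_profile_feedback && e->edge_is_hot)
        return true;

    int limit = e->callee_declared_inline ? lim->max_insns_single
                                          : lim->max_insns_auto;
    return e->callee_size <= limit;
}
```

With max_insns_auto at 40, a non-inline 45-insn callee on a cold edge is rejected until the limit is raised to 45. The same callee on a hot edge passes once profile feedback is available. The single-limit value used in the checks is an arbitrary placeholder.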
This is a change I made in early stage1
(https://gcc.gnu.org/ml/gcc-patches/2014-04/msg01110.html), and I wanted to see
whether there are any testcases. I think we could be more selective about which
calls are considered hot in this case (our current cgraph_maybe_hot_edge_p is
very conservative).
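One hypothetical way to be more selective is to treat an edge as hot only when its profile count reaches a chosen fraction of the hottest call count. This is not how cgraph_maybe_hot_edge_p is implemented. The function name and the permille threshold are illustrative assumptions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical predicate: an edge is hot if its execution count is at
   least permille/1000 of max_count (the hottest call count seen).
   Assumes counts stay below UINT64_MAX / 1000 so the products fit. */
static bool edge_hot_by_fraction(uint64_t edge_count, uint64_t max_count,
                                 unsigned permille)
{
    assert(permille <= 1000);
    if (edge_count == 0)
        return false; /* never-executed calls are never hot */
    /* edge_count / max_count >= permille / 1000, without division. */
    return edge_count * 1000 >= max_count * permille;
}
```

Raising permille makes fewer edges qualify. That would limit how many calls could use the profile-feedback bypass of the size limits.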
So in your case the main problem seems to be not inlining crcu16? Of course the
change above does not directly explain it, but perhaps some expensive inlining
early in the decision stage prevents useful inlining later.
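The idea that an early, expensive inline can block a useful one later can be shown with a toy greedy inliner. It visits call sites in priority order and rejects any site whose growth no longer fits a shared budget. This is similar in spirit to a unit-growth limit such as --param inline-unit-growth, but far simpler. The names, sizes and priorities are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical call-site candidate for a toy greedy inliner. */
struct candidate {
    const char *callee;
    int growth;   /* estimated size increase if inlined */
    int priority; /* higher = visited earlier */
};

/* Visits candidates in the given order (caller sorts them by descending
   priority) and inlines each one whose growth still fits the remaining
   budget. Stores 1/0 in inlined[i]; returns the number inlined. */
static size_t greedy_inline(const struct candidate *c, size_t n,
                            int budget, int *inlined)
{
    size_t count = 0;
    for (size_t i = 0; i < n; i++) {
        assert(i == 0 || c[i - 1].priority >= c[i].priority);
        if (c[i].growth <= budget) {
            budget -= c[i].growth; /* this inline consumes budget */
            inlined[i] = 1;
            count++;
        } else {
            inlined[i] = 0; /* does not fit what is left */
        }
    }
    return count;
}
```

With a budget of 100, visiting "expensive" (growth 80) first leaves only 20, so "small_helper" (growth 30, playing the role of crcu16) is rejected. With the priorities swapped, "small_helper" is inlined and "expensive" is rejected instead.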
> Please advise.