This is the mail archive of the
mailing list for the GCC project.
LTO IPA inline decisions in GCC trunk.
- From: Venkataramanan Kumar <venkataramanan dot kumar at linaro dot org>
- To: Jan Hubicka <hubicka at ucw dot cz>, gcc at gcc dot gnu dot org, Richard Biener <richard dot guenther at gmail dot com>, Christophe Lyon <christophe dot lyon at linaro dot org>
- Date: Thu, 6 Nov 2014 13:43:19 +0530
- Subject: LTO IPA inline decisions in GCC trunk.
I experimented with building Coremark with both PGO and LTO at -O3 on
an AArch64 machine. First I generated profiles using the recommended
seeds in Coremark's readme.txt. Then I compiled again with -O3 -flto,
using the generated profiles.
I tried the GCC Linaro compiler (September release), which is based on
FSF 4.9, and GCC trunk from 30-Sep-2014.
With the Linaro compiler, perf events show a 5% lower instruction
count compared to the GCC trunk version I used.
I looked at the generated code and see that the IPA inlining decisions
have changed between Linaro and trunk.
The Linaro compiler does not seem to inline a function called "crcu32",
whereas trunk inlines it but does not inline "crcu16". Also, trunk does
not detect an IPA indirect inlining opportunity for a function called
"cmp_complex".
The number of ltrans partitions is 3 with the Linaro compiler and is
reduced to 2 in trunk.
Eyeballing the dump, it seems the --param max-inline-insns-auto limit
is reached, and hence some functions are not inlined. I tried
increasing this limit from 40 to 45, 50 and 100. This does not help
with inlining "crcu32" in trunk, although trunk does inline
"cmp_complex" once the limit is set to 45. But this does not reduce
the instruction count.
With the Linaro compiler, I tried manually preventing "crcu16" from
being inlined. The Linaro compiler then behaves the same way as trunk:
it inlines "crcu32", does not inline "crcu16", and the instruction
count increases.
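The message does not say how inlining of "crcu16" was suppressed. One
common way, shown here only as a minimal sketch, is GCC's noinline
function attribute. The functions below (crc_u8, crc_u16, crc_u32) are
hypothetical stand-ins with a generic CRC-16 loop, not Coremark's
actual source.

```c
#include <stdint.h>

/* Hypothetical stand-in for a byte-wise CRC helper (generic reflected
   CRC-16, polynomial 0xA001); not Coremark's implementation. */
uint16_t crc_u8(uint8_t data, uint16_t crc)
{
    for (int i = 0; i < 8; i++) {
        uint8_t mix = (uint8_t)((data ^ crc) & 1u);
        data >>= 1;
        crc >>= 1;
        if (mix)
            crc ^= 0xA001u;
    }
    return crc;
}

/* noinline asks GCC to keep this function out of line at every call
   site, regardless of what the inliner's heuristics would choose. */
__attribute__((noinline))
uint16_t crc_u16(uint16_t value, uint16_t crc)
{
    crc = crc_u8((uint8_t)(value & 0xFFu), crc);
    crc = crc_u8((uint8_t)(value >> 8), crc);
    return crc;
}

/* The caller is left to the inliner's normal decisions. */
uint16_t crc_u32(uint32_t value, uint16_t crc)
{
    crc = crc_u16((uint16_t)(value & 0xFFFFu), crc);
    crc = crc_u16((uint16_t)(value >> 16), crc);
    return crc;
}
```

Marking only the one function keeps the rest of the build unchanged, so
any difference in instruction count can be attributed to that single
inlining decision.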
So the inlining decision for "crcu16" seems to be what increases the
instruction count in trunk. On x86_64 I tried only the latest trunk,
and the inlining behavior is the same as with the trunk version I used
on AArch64.
LTO may not be the best thing to try on Coremark, but I just wanted to
check whether trunk (5.0) is better than GCC 4.9.
Can you suggest where I should look in GCC to see why these inline
decisions changed in trunk? Also, compared to FSF 4.9, has the inline
size calculation in IPA changed in trunk?