LTO IPA inline decisions in GCC trunk.
Thu Nov 6 08:13:00 GMT 2014
I experimented with building Coremark with both PGO and LTO at -O3 on
an AArch64 machine. First I generated profiles using the recommended
seeds in Coremark's readme.txt, then compiled again with -O3 -flto.
I tried the GCC Linaro compiler (September release), which is based on
FSF 4.9, and GCC trunk from 30-Sep-2014.
With the Linaro compiler, perf events show a 5% lower instruction
count compared to the GCC trunk version I used.
I looked at the generated code and saw that the IPA inlining decisions
have changed between Linaro and trunk.
The Linaro compiler does not seem to inline a function called
"crcu32", whereas trunk inlines it but does not inline "crcu16". Also,
trunk does not detect an IPA indirect inlining opportunity for a
function called "cmp_complex".
The number of ltrans partitions is 3 with the Linaro compiler and is
reduced to 2 with trunk.
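
To illustrate what is meant by indirect inlining here, a minimal sketch
follows. It assumes the comparator is reached through a function
pointer argument; all names (demo_cmp, demo_sort, demo_sort_ascending)
are hypothetical and this is not Coremark's code. When the only caller
passes a known function, the IPA passes can in principle turn the
indirect call into a direct one and then inline it.

```c
#include <stddef.h>

/* Hypothetical comparator type and comparator, for illustration only. */
typedef int (*demo_cmp_fn)(const int *a, const int *b);

static int demo_cmp(const int *a, const int *b)
{
    return (*a > *b) - (*a < *b);
}

/* Insertion sort that calls the comparator through a pointer.
   Seen in isolation, the call to cmp() is indirect. */
static void demo_sort(int *v, size_t n, demo_cmp_fn cmp)
{
    for (size_t i = 1; i < n; i++) {
        int key = v[i];
        size_t j = i;
        while (j > 0 && cmp(&v[j - 1], &key) > 0) {
            v[j] = v[j - 1];
            j--;
        }
        v[j] = key;
    }
}

/* The only caller passes a known function. If demo_sort is inlined
   here, cmp becomes a constant and demo_cmp is a candidate for
   inlining as well. */
void demo_sort_ascending(int *v, size_t n)
{
    demo_sort(v, n, demo_cmp);
}
```

Whether the compiler actually performs this transformation depends on
the same size limits and heuristics discussed in this mail.
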
Eyeballing the dump, it seems the --param max-inline-insns-auto limit
is reached, and hence the compiler decides not to inline some
functions. I tried increasing this limit from 40 to 45, 50 and 100.
This does not help in inlining "crcu32" in trunk, but "cmp_complex" is
inlined once the limit is set to 45. However, this does not reduce the
instruction count.
With the Linaro compiler I tried manually preventing crcu16 from being
inlined. The Linaro compiler then behaves in the same way as trunk: it
inlines crcu32, crcu16 is not inlined, and the instruction count
increases.
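
One way to prevent inlining manually, assuming GCC's noinline function
attribute is used, is sketched below. The functions are hypothetical
stand-ins (demo_crcu8, demo_crcu16, demo_crcu32) and not Coremark's
sources; only the placement of the attribute matters.

```c
#include <stdint.h>

/* Hypothetical byte-wise checksum step; the exact polynomial is not
   important for the inlining experiment. */
static uint16_t demo_crcu8(uint8_t data, uint16_t crc)
{
    for (int i = 0; i < 8; i++) {
        unsigned bit = (data ^ crc) & 1u;
        data >>= 1;
        crc >>= 1;
        if (bit)
            crc ^= 0xA001u;
    }
    return crc;
}

/* noinline forces every use of this function to remain a real call,
   regardless of what the inliner's heuristics would choose. */
__attribute__((noinline))
uint16_t demo_crcu16(uint16_t data, uint16_t crc)
{
    crc = demo_crcu8((uint8_t)(data & 0xffu), crc);
    crc = demo_crcu8((uint8_t)(data >> 8), crc);
    return crc;
}

/* No attribute here, so the inliner is free to inline this function
   into its callers, while the calls to demo_crcu16 stay as calls. */
uint16_t demo_crcu32(uint32_t data, uint16_t crc)
{
    crc = demo_crcu16((uint16_t)(data & 0xffffu), crc);
    crc = demo_crcu16((uint16_t)(data >> 16), crc);
    return crc;
}
```
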
So the inlining decision for "crcu16" seems to be what increases the
instruction count in trunk. I tried the latest trunk on an x86_64
machine only, and the inlining behavior is the same as with the trunk
version I used on AArch64.
LTO may not be the best thing to try on Coremark, but I just wanted to
check whether trunk (5.0) is better than GCC 4.9.
Can you suggest where I should look in GCC to see why these inline
decisions changed in trunk? Also, compared to FSF 4.9, has the inline
size calculation in IPA changed in trunk?