This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.
Hi Honza,

Thank you for the reply.

On 6 November 2014 15:39, Jan Hubicka <hubicka@ucw.cz> wrote:
>> Hi Honza,
> Hello,
>>
>> I experimented building Coremark with both PGO and LTO at -O3 level on
>> Aarch64 machine. First I generated profiles using the recommended seeds in
>> Coremark's readme.txt. Then compiled again with -O3 -flto and -fprofile-use.
>>
>> I tried using GCC Linaro compiler (september) which is based on FSF 4.9 and
>> GCC trunk 30-sep-2014.
>>
>> With linaro compiler perf events show 5% less instruction counts compared
>> to the GCC trunk version I used.
>>
>> I looked at the generated code and seeing that IPA inlining have changed
>> between linaro and trunk.
>>
>> Linaro compiler does not seem to inline a function called "crcu32", but
>> trunk inlines it but does not inline "crcu16. Also trunk does not detect an
>> IPA indirect inlining on a function called "cmp_complex".
>
> Can you please compile with -fdump-ipa-inline-details -fdump-tree-release_ssa and send
> me the dumps from both compilers? It should not be that hard to debug this.
>>

I have attached the -fdump-ipa-inline-details and -fdump-tree-release dumps
from both the compilers. I earlier used -fdump-ipa-all-all for my analysis,

>> The number of partitions for ltrans is 3 in Linaro compiler and reduced to
>> 2 in trunk.
>>
>> Eyeballing the dump it seems --param max-inline-insns-auto limit reached
>> and hence deciding not to inline some functions. I tried increasing this
>> limit from 40 to 45, 50 and 100. But is not helping in inlining "crcu32" in
>> trunk, but inlines "cmp_complex" when set to limit set 45. But this is not
>> reducing the instruction count.
>>
>> With Linaro compiler I tried to manually not to inline crcu16. Now Linaro
>> compiler behaves in same way as trunk. It inlines crcu32, crcu16 is not
>> inlined and instruction count increases.
>>
>> So inlining "crcu16", seem to increasing the instruction counts in trunk. I
>> tried to latest trunk on X86_64 machine only and inlining behavior is same
>> to the trunk version I used in Aarch64.
>>
>> LTO may not be best thing to try on Coremark, but just wanted to check if
>> trunk (5.0) is better compared to GCC 4.9.
>>
>> Can you suggest where should I look in GCC to see why these inline
>> decisions changes in trunk? Also compared to FSF 4.9, inline size
>> calculation in IPA have changed now in trunk?
>
> One important change for mainline compared to 4.9 is that with profile feedback
> it can now bypass max-inline-insns-single/auto limits.
>
> This is change I did in early stage1
> https://gcc.gnu.org/ml/gcc-patches/2014-04/msg01110.html and I wanted to see if
> there are any testcases. I think we may make more selective decisions about
> what call is considered hot in this case (our current cgraph_maybe_hot_edge_p
> is very conservative).

Yes these changes are not in Linaro compiler source code.

>
> So in your case the main problem seems to be not inlining crcu16? Of course the
> change above does not directly explain it, but perhaps some expensive inlining
> early in the decision stage prevents useful inlining later..

Okay I will explore on this.

>
> Honza
>>
>> Please advise.
>>
>> regards,
>> Venkat.

regards,
Venkat.
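
For anyone reproducing the setup discussed in this message, the build sequence might look roughly like the sketch below. It only assembles the command lines and does not run them. The flags -O3, -flto, -fprofile-use, -fdump-ipa-inline-details, -fdump-tree-release_ssa and --param max-inline-insns-auto are the ones named in the thread. Everything else is an assumption: -fprofile-generate as the instrumentation step, reusing -O3 -flto for that step, and the hypothetical names pgo_lto_commands, coremark.exe and core_main.c.

```python
# Sketch only: assemble GCC command lines for a PGO + LTO build with inliner dumps.
# Nothing is executed here; the commands are returned as argument lists.

def pgo_lto_commands(sources, cc="gcc", exe="coremark.exe",
                     max_inline_insns_auto=None):
    """Return [instrumented build, training run, profile-use build]."""
    common = ["-O3", "-flto"]
    # Dump flags requested in the thread.
    dumps = ["-fdump-ipa-inline-details", "-fdump-tree-release_ssa"]

    params = []
    if max_inline_insns_auto is not None:
        # The thread tried values 45, 50 and 100 (default reported as 40).
        params = ["--param", f"max-inline-insns-auto={max_inline_insns_auto}"]

    # Assumption: the instrumented build uses the same -O3 -flto options.
    instrument = [cc, *common, "-fprofile-generate", *sources, "-o", exe]
    # Training run; the seeds from Coremark's readme.txt are not reproduced here.
    train = [f"./{exe}"]
    optimize = [cc, *common, "-fprofile-use", *dumps, *params, *sources, "-o", exe]
    return [instrument, train, optimize]


if __name__ == "__main__":
    for cmd in pgo_lto_commands(["core_main.c"], max_inline_insns_auto=45):
        print(" ".join(cmd))
```

Running the same sequence with each compiler and diffing the resulting inline dumps is one way to compare the decisions for crcu16, crcu32 and cmp_complex.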
Attachment:
lto-dumps.tar.xz
Description: Binary data