This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.



Re: LTO IPA inline decisions in GCC trunk.


Hi Honza,

Thank you for the reply.

On 6 November 2014 15:39, Jan Hubicka <hubicka@ucw.cz> wrote:
>> Hi Honza,
> Hello,
>>
>> I experimented with building Coremark with both PGO and LTO at -O3 on an
>> AArch64 machine. First I generated profiles using the recommended seeds in
>> Coremark's readme.txt. Then I compiled again with -O3 -flto and -fprofile-use.
>>
>> I tried the Linaro GCC compiler (September release), which is based on FSF
>> 4.9, and GCC trunk from 30-sep-2014.
>>
>> With the Linaro compiler, perf events show a 5% lower instruction count
>> than with the GCC trunk version I used.
>>
>> I looked at the generated code and see that the IPA inlining decisions have
>> changed between Linaro and trunk.
>>
>> The Linaro compiler does not seem to inline a function called "crcu32", but
>> trunk inlines it and does not inline "crcu16". Trunk also does not detect
>> the IPA indirect inlining opportunity for a function called "cmp_complex".
>
> Can you please compile with -fdump-ipa-inline-details -fdump-tree-release_ssa and send
> me the dumps from both compilers? It should not be that hard to debug this.
>>

I have attached the -fdump-ipa-inline-details and -fdump-tree-release_ssa
dumps from both compilers. I had earlier used -fdump-ipa-all-all for
my analysis.

>> The number of LTRANS partitions is 3 with the Linaro compiler and drops to
>> 2 with trunk.
>>
>> Eyeballing the dump, it seems the --param max-inline-insns-auto limit is
>> reached, so the compiler decides not to inline some functions. I tried
>> increasing this limit from 40 to 45, 50 and 100. This does not get "crcu32"
>> inlined in trunk, but with the limit set to 45 "cmp_complex" is inlined.
>> However, this does not reduce the instruction count.
>>
>> With the Linaro compiler, I tried manually preventing crcu16 from being
>> inlined. The Linaro compiler then behaves the same way as trunk: it inlines
>> crcu32, does not inline crcu16, and the instruction count increases.
>>
>> So not inlining "crcu16" seems to be what increases the instruction count in
>> trunk. I also tried the latest trunk on an x86_64 machine, and the inlining
>> behavior is the same as with the trunk version I used on AArch64.
>>
>> LTO may not be the best thing to try on Coremark, but I just wanted to check
>> whether trunk (5.0) does better than GCC 4.9.
>>
>> Can you suggest where in GCC I should look to see why these inlining
>> decisions changed in trunk? Also, has the IPA inline size calculation
>> changed in trunk compared to FSF 4.9?
>
> One important change in mainline compared to 4.9 is that, with profile
> feedback, the inliner can now bypass the max-inline-insns-single/auto limits.
>
> This is a change I made early in stage 1
> (https://gcc.gnu.org/ml/gcc-patches/2014-04/msg01110.html), and I wanted to
> see whether there are any testcases. I think we may make more selective
> decisions about which calls are considered hot in this case (our current
> cgraph_maybe_hot_edge_p is very conservative).

Yes, these changes are not in the Linaro compiler source code.

>
> So in your case the main problem seems to be not inlining crcu16? Of course,
> the change above does not directly explain it, but perhaps some expensive
> inlining early in the decision stage prevents useful inlining later.

Okay, I will explore this.
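For anyone wanting to reproduce the "keep crcu16 out of line" experiment on a small scale, here is a minimal sketch. It assumes GCC's noinline function attribute was the mechanism used (the thread does not say how crcu16 was kept from being inlined), and crc_byte, crc_u16 and crc_u32 are hypothetical stand-ins shaped like a CRC call chain, not Coremark's actual source. The byte update is a plain bitwise CRC-16/ARC (reflected polynomial 0xA001, initial value 0).

```c
#include <stdint.h>

/* Hypothetical helper, not Coremark code: bitwise CRC-16/ARC update of one
 * byte using the reflected polynomial 0xA001. */
static uint16_t crc_byte(uint8_t data, uint16_t crc)
{
    crc ^= data;
    for (int i = 0; i < 8; i++)
        crc = (crc & 1u) ? (uint16_t)((crc >> 1) ^ 0xA001u)
                         : (uint16_t)(crc >> 1);
    return crc;
}

/* GCC's noinline attribute keeps this function out of line, mimicking the
 * manual "do not inline crcu16" experiment (assumed mechanism). */
__attribute__((noinline))
uint16_t crc_u16(uint16_t val, uint16_t crc)
{
    crc = crc_byte((uint8_t)val, crc);        /* low byte first */
    crc = crc_byte((uint8_t)(val >> 8), crc); /* then high byte */
    return crc;
}

/* crc_u32 itself may still be inlined into its callers; only the two
 * crc_u16 calls are forced to remain real calls. */
uint16_t crc_u32(uint32_t val, uint16_t crc)
{
    crc = crc_u16((uint16_t)val, crc);
    crc = crc_u16((uint16_t)(val >> 16), crc);
    return crc;
}
```

Compiling such a file with -O3 -fdump-ipa-inline-details should let one confirm in the inline dump that crc_u16 stays a separate call, which makes it easier to compare instruction counts with and without the attribute.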

>
> Honza
>>
>> Please advise.
>>
>> regards,
>> Venkat.

regards,
Venkat.

Attachment: lto-dumps.tar.xz
Description: Binary data

