This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.
Re: [PATCH, AArch64] atomics: prefetch the destination for write prior to ldxr/stxr loops
- From: Andrew Pinski <pinskia at gmail dot com>
- To: "Yangfei (Felix)" <felix dot yang at huawei dot com>
- Cc: "gcc-patches at gcc dot gnu dot org" <gcc-patches at gcc dot gnu dot org>
- Date: Mon, 7 Mar 2016 22:54:25 -0800
- Subject: Re: [PATCH, AArch64] atomics: prefetch the destination for write prior to ldxr/stxr loops
- References: <DA41BE1DDCA941489001C7FBD7A8820E83858294 at szxema507-mbx dot china dot huawei dot com> <CA+=Sn1nED6fpgwCRt+rA8Rzy_zHnkb6+UuCEtxRT+qu8413qSQ at mail dot gmail dot com> <DA41BE1DDCA941489001C7FBD7A8820E838582C3 at szxema507-mbx dot china dot huawei dot com>
On Mon, Mar 7, 2016 at 8:12 PM, Yangfei (Felix) <felix.yang@huawei.com> wrote:
>> On Mon, Mar 7, 2016 at 7:27 PM, Yangfei (Felix) <felix.yang@huawei.com> wrote:
>> > Hi,
>> >
>> > As discussed in LKML:
>> > http://lists.infradead.org/pipermail/linux-arm-kernel/2015-July/355996.html,
>> > the cost of changing a cache line from shared to exclusive state can be
>> > significant on aarch64 cores, especially when this is triggered by an
>> > exclusive store, since it may result in having to retry the transaction.
>> > This patch makes use of the "prfm PSTL1STRM" instruction to prefetch
>> > cache lines for write prior to ldxr/stxr loops generated by the ll/sc
>> > atomic routines.
>> > Bootstrapped on AArch64 server, is it OK?
>>
>>
>> I don't think this is a good thing in general. For example, on ThunderX the
>> prefetch just adds a cycle for no benefit. This really depends on the
>> micro-architecture of the core and on how LDXR/STXR are implemented, so
>> after this patch ThunderX will be slower.
>>
>> Thanks,
>> Andrew Pinski
>>
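For illustration only, here is a source-level sketch of the idea in the quoted patch, written in C with GCC builtins. It is not the patch itself: the patch changes how GCC expands its ll/sc atomic routines, whereas this sketch issues the hint by hand with `__builtin_prefetch(p, 1, 0)` (rw = 1 for write, locality 0 for streaming), which GCC on AArch64 is expected to lower to `prfm PSTL1STRM`, just before an `__atomic_fetch_add` that, without the LSE extension, GCC expands into an ldxr/stxr retry loop. The helper name `fetch_add_prefetchw` is hypothetical.

```c
#include <stdint.h>

/* Hypothetical helper: atomically add v to *p and return the old value,
   prefetching the target cache line for write first. */
static inline uint64_t fetch_add_prefetchw(uint64_t *p, uint64_t v)
{
    /* rw = 1: prefetch for write; locality = 0: streaming (non-temporal).
       On AArch64, GCC is expected to emit "prfm PSTL1STRM" for these
       arguments; on other targets it is only a hint and may be dropped. */
    __builtin_prefetch(p, 1, 0);

    /* Without LSE, GCC expands this into an exclusive load/store retry
       loop (ldaxr/stlxr for SEQ_CST); the prefetch above runs once,
       before the loop is entered. */
    return __atomic_fetch_add(p, v, __ATOMIC_SEQ_CST);
}
```

Because the prefetch is only a hint, the sketch stays correct on any target; whether the extra instruction pays off is exactly the micro-architecture question debated in this thread.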
>
> Hi Andrew,
>
> I am not familiar with the ThunderX micro-architecture, but yes, I agree that it depends on the micro-architecture of the core.
> Since the kernel patch mentioned above has been merged upstream, I think the added prefetch instruction in the atomic routines is beneficial for most AArch64 cores on the market.
> If it does not help ThunderX, how about adding a check here, i.e. disabling generation of the prfm when tuning for ThunderX?
No, it is not just that it does no good; it actually makes performance
worse on ThunderX. How about only doing it for the micro-architectures
where it helps, and not doing it for generic, since it hurts ThunderX
so much?
Thanks,
Andrew
>
> Thanks,
> Felix
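Andrew's proposal amounts to a per-core tuning switch that is off by default. A minimal sketch in C, assuming a hypothetical tuning table: the struct, field, function, and the core name `example-core` are illustrative placeholders, not GCC's actual AArch64 tuning structures. The prefetch is disabled for `generic` and `thunderx`, as proposed above, and enabled only for a placeholder core where it is assumed to help.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical per-core tuning entry (illustrative only). */
struct core_tuning {
    const char *name;
    bool prefetch_before_llsc; /* emit a write prefetch before ldxr/stxr loops */
};

static const struct core_tuning tunings[] = {
    { "generic",      false }, /* off for generic, as proposed in the reply */
    { "thunderx",     false }, /* reported to be slower with the prefetch */
    { "example-core", true  }, /* hypothetical core where it is assumed to help */
};

/* Decide whether the atomic expander should emit the prefetch when
   tuning for the named core. */
static bool want_llsc_prefetch(const char *core)
{
    for (size_t i = 0; i < sizeof tunings / sizeof tunings[0]; i++)
        if (strcmp(tunings[i].name, core) == 0)
            return tunings[i].prefetch_before_llsc;
    return false; /* unknown cores get the same answer as "generic" */
}
```

Keeping the default off means a core only pays for the extra instruction once someone has shown it helps there, which matches the concern raised for ThunderX.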