This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
Re: [PATCH, AArch64] atomics: prefetch the destination for write prior to ldxr/stxr loops
- From: James Greenhalgh <james dot greenhalgh at arm dot com>
- To: Andrew Pinski <pinskia at gmail dot com>
- Cc: "Yangfei (Felix)" <felix dot yang at huawei dot com>, "gcc-patches at gcc dot gnu dot org" <gcc-patches at gcc dot gnu dot org>
- Date: Tue, 15 Mar 2016 15:31:30 +0000
- Subject: Re: [PATCH, AArch64] atomics: prefetch the destination for write prior to ldxr/stxr loops
- Authentication-results: sourceware.org; auth=none
- References: <DA41BE1DDCA941489001C7FBD7A8820E83858294 at szxema507-mbx dot china dot huawei dot com> <CA+=Sn1nED6fpgwCRt+rA8Rzy_zHnkb6+UuCEtxRT+qu8413qSQ at mail dot gmail dot com> <DA41BE1DDCA941489001C7FBD7A8820E838582C3 at szxema507-mbx dot china dot huawei dot com> <CA+=Sn1=+MoUFXuTX-b8N967W6Gr-Qufuv2bGQy-cjjqve7BxYA at mail dot gmail dot com>
On Mon, Mar 07, 2016 at 10:54:25PM -0800, Andrew Pinski wrote:
> On Mon, Mar 7, 2016 at 8:12 PM, Yangfei (Felix) <felix.yang@huawei.com> wrote:
> >> On Mon, Mar 7, 2016 at 7:27 PM, Yangfei (Felix) <felix.yang@huawei.com> wrote:
> >> > Hi,
> >> >
> >> > As discussed in LKML:
> >> http://lists.infradead.org/pipermail/linux-arm-kernel/2015-July/355996.html, the
> >> cost of changing a cache line
> >> > from shared to exclusive state can be significant on aarch64 cores,
> >> especially when this is triggered by an exclusive store, since it may
> >> > result in having to retry the transaction.
> >> > This patch makes use of the "prfm PSTL1STRM" instruction to prefetch
> >> cache lines for write prior to ldxr/stxr loops generated by the ll/sc atomic
> >> routines.
> >> > Bootstrapped on AArch64 server, is it OK?
> >>
> >>
> >> I don't think this is a good thing in general. For an example on ThunderX, the
> >> prefetch just adds a cycle for no benefit. This really depends on the
> >> micro-architecture of the core and how LDXR/STXR are
> >> implemented. So after this patch, it will slow down ThunderX.
> >>
> >> Thanks,
> >> Andrew Pinski
> >>
> >
> > Hi Andrew,
> >
> > I am not quite clear about the ThunderX micro-arch. But, Yes, I agree
> > it depends on the micro-architecture of the core. As the mentioned
> > kernel patch is merged upstream, I think the added prefetch instruction
> > in atomic routines is good for most of AArch64 cores in the market. If
> > it does nothing good for ThunderX, then how about adding some checking
> > here? I mean disabling the the generation of the prfm if we are tuning
> > for ThunderX.
>
> No it is not just not do any good, it actually causes worse
> performance for ThunderX. How about only doing it for the
> micro-architecture where it helps and also not do it for generic since
> it hurts ThunderX so much.
This should be a GCC 7 patch at this point, which should give us some time
to talk through whether we want this patch or not.
How bad is this for ThunderX - upthread you said one cycle penalty, but here
you suggest it hurts ThunderX more? Note that the prefetch is outside of
the LDXR/STXR loop.
Thanks,
James