This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.

Re: [PATCH, AArch64] atomics: prefetch the destination for write prior to ldxr/stxr loops

From: James Greenhalgh <james dot greenhalgh at arm dot com>
To: Andrew Pinski <pinskia at gmail dot com>
Cc: "Yangfei (Felix)" <felix dot yang at huawei dot com>, "gcc-patches at gcc dot gnu dot org" <gcc-patches at gcc dot gnu dot org>
Date: Tue, 15 Mar 2016 15:31:30 +0000
Subject: Re: [PATCH, AArch64] atomics: prefetch the destination for write prior to ldxr/stxr loops
References: <DA41BE1DDCA941489001C7FBD7A8820E83858294 at szxema507-mbx dot china dot huawei dot com> <CA+=Sn1nED6fpgwCRt+rA8Rzy_zHnkb6+UuCEtxRT+qu8413qSQ at mail dot gmail dot com> <DA41BE1DDCA941489001C7FBD7A8820E838582C3 at szxema507-mbx dot china dot huawei dot com> <CA+=Sn1=+MoUFXuTX-b8N967W6Gr-Qufuv2bGQy-cjjqve7BxYA at mail dot gmail dot com>

On Mon, Mar 07, 2016 at 10:54:25PM -0800, Andrew Pinski wrote:
> On Mon, Mar 7, 2016 at 8:12 PM, Yangfei (Felix) <felix.yang@huawei.com> wrote:
> >> On Mon, Mar 7, 2016 at 7:27 PM, Yangfei (Felix) <felix.yang@huawei.com> wrote:
> >> > Hi,
> >> >
> >> >     As discussed in LKML:
> >> >     http://lists.infradead.org/pipermail/linux-arm-kernel/2015-July/355996.html,
> >> >     the cost of changing a cache line from shared to exclusive state can be
> >> >     significant on aarch64 cores, especially when this is triggered by an
> >> >     exclusive store, since it may result in having to retry the transaction.
> >> >     This patch makes use of the "prfm PSTL1STRM" instruction to prefetch
> >> >     cache lines for write prior to ldxr/stxr loops generated by the ll/sc
> >> >     atomic routines.
> >> >     Bootstrapped on an AArch64 server.  Is it OK?
> >> >     Bootstrapped on AArch64 server, is it OK?
> >>
> >>
> >> I don't think this is a good thing in general.  For example, on ThunderX the
> >> prefetch just adds a cycle for no benefit.  This really depends on the
> >> micro-architecture of the core and how LDXR/STXR are implemented.  So after
> >> this patch, it will slow down ThunderX.
> >>
> >> Thanks,
> >> Andrew Pinski
> >>
> >
> > Hi Andrew,
> >
> >    I am not very familiar with the ThunderX micro-architecture, but yes, I
> >    agree it depends on the micro-architecture of the core.  Since the
> >    kernel patch mentioned above has been merged upstream, I think the added
> >    prefetch instruction in the atomic routines is good for most AArch64
> >    cores on the market.  If it brings no benefit on ThunderX, how about
> >    adding a check here?  I mean disabling the generation of the prfm when
> >    tuning for ThunderX.
> 
> No, it is not just that it does no good; it actually causes worse
> performance on ThunderX.  How about only doing it for the
> micro-architectures where it helps, and not doing it for generic tuning,
> since it hurts ThunderX so much?
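
The opt-in approach suggested above (emit the prefetch only when tuning for cores where it is known to help, and leave it off for generic tuning) can be sketched as a per-core table lookup. This is a minimal illustration under stated assumptions: the struct, its field, the helper `want_llsc_prefetch`, and the core name "example-core" are all hypothetical and do not correspond to GCC's actual AArch64 tuning structures.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical per-core tuning entry; NOT GCC's real tune_params layout. */
struct core_tuning {
  const char *name;
  bool prefetch_before_llsc; /* emit "prfm pstl1strm" before LDXR/STXR loops */
};

static const struct core_tuning tunings[] = {
  { "generic",      false }, /* off by default, as proposed in the thread */
  { "thunderx",     false }, /* reported to be slowed down by the prefetch */
  { "example-core", true  }, /* hypothetical core where the prefetch helps */
};

/* Return whether the backend should emit the prefetch for the given
   tuning name; unknown names fall back to the conservative choice. */
static bool
want_llsc_prefetch (const char *tune)
{
  for (size_t i = 0; i < sizeof tunings / sizeof tunings[0]; i++)
    if (strcmp (tunings[i].name, tune) == 0)
      return tunings[i].prefetch_before_llsc;
  return false;
}
```

Defaulting to "off" means a core only pays for the extra instruction once someone has shown it is a win there, which is the trade-off being argued over for generic tuning.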

This should be a GCC 7 patch at this point, which should give us some time
to talk through whether we want this patch or not.

How bad is this for ThunderX? Upthread you said it costs one cycle, but here
you suggest it hurts ThunderX more. Note that the prefetch is outside of
the LDXR/STXR loop.
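
To make the placement concrete, here is a source-level approximation of the sequence under discussion, assuming ARMv8.0 code generation without LSE atomics. The helper name is hypothetical; the patch itself emits the prefetch inside GCC's AArch64 backend, so user code would not write this by hand. The assembly in the comments is approximate and the register numbers are illustrative only.

```c
#include <stdint.h>

/* Hypothetical helper approximating the patched code generation for an
   atomic fetch-and-add. */
static inline uint64_t
fetch_add_prefetch_for_write (uint64_t *p, uint64_t val)
{
  /* rw = 1 (write), locality = 0 (streaming): on AArch64, GCC typically
     emits "prfm pstl1strm, [x0]" here.  It executes once, before the loop. */
  __builtin_prefetch (p, 1, 0);

  /* Without LSE, this expands to an exclusive-access retry loop, roughly:
         1: ldaxr  x2, [x0]
            add    x3, x2, x1
            stlxr  w4, x3, [x0]
            cbnz   w4, 1b
     The prefetch above is not repeated when the STLXR fails and retries. */
  return __atomic_fetch_add (p, val, __ATOMIC_SEQ_CST);
}
```

Because the prefetch sits outside the retry loop, its cost is paid once per atomic operation rather than once per retry; whether that single instruction pays for itself is the micro-architectural question this thread is debating.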

Thanks,
James
