This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
Re: [PATCH, AArch64] atomics: prefetch the destination for write prior to ldxr/stxr loops
- From: James Greenhalgh <james dot greenhalgh at arm dot com>
- To: Andrew Pinski <pinskia at gmail dot com>
- Cc: "Yangfei (Felix)" <felix dot yang at huawei dot com>, "gcc-patches at gcc dot gnu dot org" <gcc-patches at gcc dot gnu dot org>, <nd at arm dot com>
- Date: Fri, 27 May 2016 12:29:06 +0100
- Subject: Re: [PATCH, AArch64] atomics: prefetch the destination for write prior to ldxr/stxr loops
- Authentication-results: sourceware.org; auth=none
- Nodisclaimer: True
- References: <DA41BE1DDCA941489001C7FBD7A8820E83858294 at szxema507-mbx dot china dot huawei dot com> <CA+=Sn1nED6fpgwCRt+rA8Rzy_zHnkb6+UuCEtxRT+qu8413qSQ at mail dot gmail dot com> <DA41BE1DDCA941489001C7FBD7A8820E838582C3 at szxema507-mbx dot china dot huawei dot com> <CA+=Sn1=+MoUFXuTX-b8N967W6Gr-Qufuv2bGQy-cjjqve7BxYA at mail dot gmail dot com> <20160315153130 dot GA21272 at arm dot com>
- Spamdiagnosticmetadata: NSPM
- Spamdiagnosticoutput: 1:23
On Tue, Mar 15, 2016 at 03:31:30PM +0000, James Greenhalgh wrote:
> On Mon, Mar 07, 2016 at 10:54:25PM -0800, Andrew Pinski wrote:
> > On Mon, Mar 7, 2016 at 8:12 PM, Yangfei (Felix) <felix.yang@huawei.com> wrote:
> > >> On Mon, Mar 7, 2016 at 7:27 PM, Yangfei (Felix) <felix.yang@huawei.com> wrote:
> > >> > Hi,
> > >> >
> > >> > As discussed in LKML:
> > >> http://lists.infradead.org/pipermail/linux-arm-kernel/2015-July/355996.html, the
> > >> cost of changing a cache line
> > >> > from shared to exclusive state can be significant on aarch64 cores,
> > >> especially when this is triggered by an exclusive store, since it may
> > >> > result in having to retry the transaction.
> > >> > This patch makes use of the "prfm PSTL1STRM" instruction to prefetch
> > >> cache lines for write prior to ldxr/stxr loops generated by the ll/sc atomic
> > >> routines.
> > >> > Bootstrapped on AArch64 server, is it OK?
> > >>
> > >>
> > >> I don't think this is a good thing in general. For an example on ThunderX, the
> > >> prefetch just adds a cycle for no benefit. This really depends on the
> > >> micro-architecture of the core and how LDXR/STXR are
> > >> implemented. So after this patch, it will slow down ThunderX.
> > >>
> > >> Thanks,
> > >> Andrew Pinski
> > >>
> > >
> > > Hi Andrew,
> > >
> > > I am not quite clear about the ThunderX micro-arch. But, Yes, I agree
> > > it depends on the micro-architecture of the core. As the mentioned
> > > kernel patch is merged upstream, I think the added prefetch instruction
> > > in atomic routines is good for most of AArch64 cores in the market. If
> > > it does nothing good for ThunderX, then how about adding some checking
> > > here? I mean disabling the the generation of the prfm if we are tuning
> > > for ThunderX.
> >
> > No it is not just not do any good, it actually causes worse
> > performance for ThunderX. How about only doing it for the
> > micro-architecture where it helps and also not do it for generic since
> > it hurts ThunderX so much.
>
> This should be a GCC 7 patch at this point, which should give us some time
> to talk through whether we want this patch or not.
>
> How bad is this for ThunderX - upthread you said one cycle penalty, but here
> you suggest it hurts ThunderX more? Note that the prefetch is outside of
> the LDXR/STXR loop.
Hi Andrew,
Did you have any further thoughts on the magnitude of the penalty you
would face on ThunderX?
Felix, I think if this is going to be expensive for some AArch64 platforms,
then the best way to progress this patch would be to add a new flag to
tune_flags. Something like AARCH64_EXTRA_TUNE_PREFETCH_LDREX. This would
allow targets which want the explicit prefetch to enable it.
Thanks,
James