[PATCH][AArch64] Use LDP/STP in shrinkwrapping

Segher Boessenkool segher@kernel.crashing.org
Wed Jan 10 18:42:00 GMT 2018


On Tue, Jan 09, 2018 at 09:13:23PM -0800, Andrew Pinski wrote:
> On Tue, Jan 9, 2018 at 6:54 AM, Segher Boessenkool
> <segher@kernel.crashing.org> wrote:
> > On Tue, Jan 09, 2018 at 12:23:42PM +0000, Wilco Dijkstra wrote:
> >> Segher Boessenkool wrote:
> >> > On Mon, Jan 08, 2018 at 0:25:47PM +0000, Wilco Dijkstra wrote:
> >> >> > Always pairing two registers together *also* degrades code quality.
> >> >>
> >> >> No, while it's not optimal, it means smaller code and fewer memory accesses.
> >> >
> >> > It means you execute *more* memory accesses.  Always.  This may be
> >> > sometimes hidden, sure.  I'm not saying you do not want more ldp's;
> >> > I'm saying this particular strategy is very far from ideal.
> >>
> >> No it means less since the number of memory accesses reduces (memory
> >> bandwidth may increase but that's not an issue).
> >
> > The problem is *more* memory accesses are executed at runtime.  Which is
> > why separate shrink-wrapping does what it does: to have *fewer* executed.
> > (It's not just the direct execution cost why that helps: more important
> > are latencies to dependent ops, microarchitectural traps, etc.).
> 
> On most micro-arch of AARCH64, having one LDP/STP will take just as
> long as one LDR/STR as long as it is on the same cache line.
> So having one LDP/STP compared to two LDR/STR is much better.  LDP/STP
> is considered one memory access really and that is where the confusion
> is coming from.  We are reducing the overall number of memory accesses
> or keeping it the same on that path.
> Hope this explanation allows you to understand why pairing does not
> degrade the code quality but improves it overall.

Of course I see that ldp is useful.  I don't think that this particular
way of forcing more pairs is a good idea.  It needs testing, benchmarking,
and instrumentation, and we haven't seen any of that.

Forcing pairs before separate shrink-wrapping reduces the effectiveness
of the latter by a lot.


Segher
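
The two ways of counting used in this thread (one LDP/STP as a single
access, versus the number of registers actually stored) can be put side
by side in a toy model.  This is only a sketch and not GCC's
shrink-wrapping algorithm: the fixed pairing in PAIRS, the example paths,
and the function names are all assumptions made for illustration, and no
numbers here come from a benchmark.

```python
# Toy model only: not GCC code and not measured data.
# Each path through a hypothetical function lists the callee-saved
# registers that have to be saved on that path.

PAIRS = [("x19", "x20"), ("x21", "x22")]  # assumed fixed pairing


def separate_cost(needed):
    """Every needed register is saved on its own with one STR."""
    n = len(needed)
    return {"instructions": n, "registers_stored": n}


def forced_pair_cost(needed):
    """A pair is saved with one STP whenever either member is needed."""
    touched = [pair for pair in PAIRS if needed & set(pair)]
    return {"instructions": len(touched),
            "registers_stored": 2 * len(touched)}


# Hypothetical paths and the registers each one needs saved.
PATHS = {
    "uses x19 only": {"x19"},
    "uses x19 and x20": {"x19", "x20"},
    "uses x19 and x21": {"x19", "x21"},
    "uses nothing": set(),
}


def compare(paths=PATHS):
    """Return (separate, forced-pair) costs for each path."""
    return {name: (separate_cost(regs), forced_pair_cost(regs))
            for name, regs in paths.items()}


if __name__ == "__main__":
    for name, (sep, pair) in compare().items():
        print(f"{name}: separate={sep} forced_pair={pair}")
```

In this simplified per-path accounting, forced pairing never executes more
save instructions than separate saves, and never stores fewer registers.
What the model leaves out is placement: once two registers form one
component, its save has to cover every place where either register is
needed, which can move it onto paths that would otherwise save nothing.
That effect would need real measurement to settle.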


