[PATCH 4/5] shrink-wrap: Shrink-wrapping for separate components
Segher Boessenkool
segher@kernel.crashing.org
Mon Oct 10 22:23:00 GMT 2016
On Mon, Oct 10, 2016 at 03:21:31PM -0600, Jeff Law wrote:
> On 09/30/2016 04:34 AM, Segher Boessenkool wrote:
> >[ whoops, message too big, resending with the attachment compressed ]
> >
> >On Tue, Sep 27, 2016 at 03:14:51PM -0600, Jeff Law wrote:
> >>With transposition issue addressed, the only blocker I see are some
> >>simple testcases we can add to the suite. They don't have to be real
> >>extensive. And one motivating example for the list archives, ideally
> >>the glibc malloc case.
> >
> >And here is the malloc testcase.
> >
> >A very important (for performance) function is _int_malloc, which starts
> >with
> [ ... ]
> Thanks. What I think is important to note with this example is the bits
> that were pushed into the path with the sysmalloc/alloc_perturb calls.
> That's an unlikely path.
alloc_perturb is a no-op, and inlined as such: as nothing :-)
> We have to extrapolate a bit from the assembly provided. In the not
> separately shrink-wrapped version, we have a full prologue of stores and
> two instances of a full epilogue (though only one ever executes) provided.
>
> With separate shrink wrapping the (presumably) very cold path where we
> error has virtually no prologue/epilogue. That's probably a nop from a
> performance standpoint.
>
> More interesting is the path where we call sysmalloc/alloc_perturb, it's
> a cold path, but not as cold as the error path. We save/restore 4 regs
> in that case, rather than a full prologue/epilogue. So there's clearly
> a savings there, though again, via the expect it's a cold path.
>
> Where we have to extrapolate is the hot path. Presumably on the hot
> path we're saving/restoring ~4 fewer registers. I haven't verified
> that, but that is kind of the whole point here.
We save/restore just four registers total on the hot path. And yes,
that is the point :-)
The hot exit is
.L683:
	ld 14,144(1)
	ld 15,152(1)
	ld 25,232(1)
	ld 30,272(1)
	addi 3,4,16
.L673:
	addi 1,1,288
	blr
so four GPR restores and no LR restore. Without separate shrink-wrapping
this was
.L641:
	addi 3,21,16
	b .L631
[ ... ]
.L631:
	addi 1,1,288
	ld 29,16(1)
	ld 14,-144(1)
	ld 15,-136(1)
	ld 16,-128(1)
	ld 17,-120(1)
	ld 18,-112(1)
	ld 19,-104(1)
	ld 20,-96(1)
	ld 21,-88(1)
	ld 22,-80(1)
	ld 23,-72(1)
	ld 24,-64(1)
	mtlr 29
	ld 25,-56(1)
	ld 26,-48(1)
	ld 27,-40(1)
	ld 28,-32(1)
	ld 29,-24(1)
	ld 30,-16(1)
	ld 31,-8(1)
	blr
(18 GPRs as well as LR).
I didn't show this path because there is a whole bunch of branches with
inline asm in the way.
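For anyone who wants to check those counts without squinting at the
listings, here is a minimal sketch in Python (not part of the patch, and
not how GCC counts anything). It assumes r14..r31 are the callee-saved
GPRs, as in the 64-bit PowerPC ELF ABI, and treats any mtlr as an LR
restore; the name summarize_restores and the two string constants are
made up for illustration.

```python
# Hot-exit sequences copied from the listings above.
WITH_SEPARATE_SHRINK_WRAP = """
.L683:
ld 14,144(1)
ld 15,152(1)
ld 25,232(1)
ld 30,272(1)
addi 3,4,16
.L673:
addi 1,1,288
blr
"""

WITHOUT_SEPARATE_SHRINK_WRAP = """
.L631:
addi 1,1,288
ld 29,16(1)
ld 14,-144(1)
ld 15,-136(1)
ld 16,-128(1)
ld 17,-120(1)
ld 18,-112(1)
ld 19,-104(1)
ld 20,-96(1)
ld 21,-88(1)
ld 22,-80(1)
ld 23,-72(1)
ld 24,-64(1)
mtlr 29
ld 25,-56(1)
ld 26,-48(1)
ld 27,-40(1)
ld 28,-32(1)
ld 29,-24(1)
ld 30,-16(1)
ld 31,-8(1)
blr
"""


def summarize_restores(asm):
    """Return (number of distinct callee-saved GPRs reloaded, LR restored?).

    Assumption: r14..r31 are the callee-saved GPRs, and every "ld" into
    one of them in an exit sequence is part of the epilogue.
    """
    restored = set()
    restores_lr = False
    for line in asm.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "ld":
            # Operands look like "14,144(1)"; the target register comes first.
            reg = int(fields[1].split(",")[0])
            if 14 <= reg <= 31:
                restored.add(reg)
        elif fields[0] == "mtlr":
            restores_lr = True
    return len(restored), restores_lr


print(summarize_restores(WITH_SEPARATE_SHRINK_WRAP))
print(summarize_restores(WITHOUT_SEPARATE_SHRINK_WRAP))
```

It counts distinct registers rather than ld instructions because the old
epilogue first loads the saved LR value into r29 (for the mtlr) and only
later restores r29 itself.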
The sysmalloc path was
.L635:
	li 4,0
.L761:
	addi 1,1,288
	mr 3,14
	ld 14,16(1)
	ld 15,-136(1)
	ld 16,-128(1)
	ld 17,-120(1)
	ld 18,-112(1)
	ld 19,-104(1)
	ld 20,-96(1)
	ld 21,-88(1)
	ld 22,-80(1)
	ld 23,-72(1)
	ld 24,-64(1)
	ld 25,-56(1)
	mtlr 14
	ld 26,-48(1)
	ld 14,-144(1)
	ld 27,-40(1)
	ld 28,-32(1)
	ld 29,-24(1)
	ld 30,-16(1)
	ld 31,-8(1)
	b sysmalloc
and now is
.L677:
	mr 3,14
	ld 15,152(1)
	ld 14,144(1)
	ld 25,232(1)
	ld 30,272(1)
	li 4,0
	addi 1,1,288
	b sysmalloc
I attach malloc.s.{no,yes}; I hope you can stomach that. Well, you
can read HP-PA, heh.
Segher
-------------- next part --------------
A non-text attachment was scrubbed...
Name: malloc.s.no.gz
Type: application/x-gzip
Size: 40507 bytes
Desc: not available
URL: <http://gcc.gnu.org/pipermail/gcc-patches/attachments/20161010/81337273/attachment.bin>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: malloc.s.yes.gz
Type: application/x-gzip
Size: 42479 bytes
Desc: not available
URL: <http://gcc.gnu.org/pipermail/gcc-patches/attachments/20161010/81337273/attachment-0001.bin>