[PATCH 4/5] shrink-wrap: Shrink-wrapping for separate components

Segher Boessenkool segher@kernel.crashing.org
Mon Oct 10 22:23:00 GMT 2016


On Mon, Oct 10, 2016 at 03:21:31PM -0600, Jeff Law wrote:
> On 09/30/2016 04:34 AM, Segher Boessenkool wrote:
> >[ whoops, message too big, resending with the attachment compressed ]
> >
> >On Tue, Sep 27, 2016 at 03:14:51PM -0600, Jeff Law wrote:
> >>With transposition issue addressed, the only blocker I see are some
> >>simple testcases we can add to the suite.  They don't have to be real
> >>extensive.  And one motivating example for the list archives, ideally
> >>the glibc malloc case.
> >
> >And here is the malloc testcase.
> >
> >A very important (for performance) function is _int_malloc, which starts
> >with
> [ ... ]
> Thanks.  What I think is important to note with this example is the bits
> that were pushed into the path with the sysmalloc/alloc_perturb calls. 
> That's an unlikely path.

alloc_perturb is a no-op, and inlined as such: as nothing :-)

> We have to extrapolate a bit from the assembly provided.  In the not 
> separately shrink-wrapped version, we have a full prologue of stores and 
> two instances of a full epilogue (though only one ever executes) provided.
> 
> With separate shrink wrapping the (presumably) very cold path where we 
> error has virtually no prologue/epilogue.  That's probably a nop from a 
> performance standpoint.
> 
> More interesting is the path where we call sysmalloc/alloc_perturb, it's 
> a cold path, but not as cold as the error path.  We save/restore 4 regs 
> in that case.  Rather than a full prologue/epilogue.  So there's clearly 
> a savings there, though again, via the expect it's a cold path.
> 
> Where we have to extrapolate is the hot path.  Presumably on the hot 
> path we're saving/restoring ~4 fewer registers.   I haven't verified 
> that, but that is kind of the whole point here.

We save/restore just four registers total on the hot path.  And yes,
that is the point :-)

The hot exit is

.L683:
	ld 14,144(1)
	ld 15,152(1)
	ld 25,232(1)
	ld 30,272(1)
	addi 3,4,16
.L673:
	addi 1,1,288
	blr

so four GPR restores and no LR restore.  Without separate shrink-wrapping
this was

.L641:
	addi 3,21,16
	b .L631

[ ... ]

.L631:
	addi 1,1,288
	ld 29,16(1)
	ld 14,-144(1)
	ld 15,-136(1)
	ld 16,-128(1)
	ld 17,-120(1)
	ld 18,-112(1)
	ld 19,-104(1)
	ld 20,-96(1)
	ld 21,-88(1)
	ld 22,-80(1)
	ld 23,-72(1)
	ld 24,-64(1)
	mtlr 29
	ld 25,-56(1)
	ld 26,-48(1)
	ld 27,-40(1)
	ld 28,-32(1)
	ld 29,-24(1)
	ld 30,-16(1)
	ld 31,-8(1)
	blr

(18 GPRs as well as LR).
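
The two counts can be checked mechanically against the listings.  The
following is only a rough sketch in Python, not part of the patch: the
helper name count_restores is made up for illustration, the .L673 label
is dropped from the new listing, and the counting rule assumes that each
mtlr consumes exactly one of the loads from the stack (the reload of the
saved LR into a scratch register), which holds for these two listings
but not for arbitrary code.

```python
import re

# Hot-exit epilogues copied from the listings in this message.
OLD_HOT_EXIT = """
    addi 1,1,288
    ld 29,16(1)
    ld 14,-144(1)
    ld 15,-136(1)
    ld 16,-128(1)
    ld 17,-120(1)
    ld 18,-112(1)
    ld 19,-104(1)
    ld 20,-96(1)
    ld 21,-88(1)
    ld 22,-80(1)
    ld 23,-72(1)
    ld 24,-64(1)
    mtlr 29
    ld 25,-56(1)
    ld 26,-48(1)
    ld 27,-40(1)
    ld 28,-32(1)
    ld 29,-24(1)
    ld 30,-16(1)
    ld 31,-8(1)
    blr
"""

NEW_HOT_EXIT = """
    ld 14,144(1)
    ld 15,152(1)
    ld 25,232(1)
    ld 30,272(1)
    addi 3,4,16
    addi 1,1,288
    blr
"""

# "ld rD,offset(1)": a doubleword load relative to the stack pointer r1.
LOAD_FROM_STACK = re.compile(r"^\s*ld\s+\d+,-?\d+\(1\)\s*$")
MOVE_TO_LR = re.compile(r"^\s*mtlr\s+\d+\s*$")


def count_restores(asm):
    """Return (gpr_restores, lr_restores) for an epilogue listing.

    Assumption: every mtlr is fed by exactly one of the stack loads,
    so that load is counted as the LR restore and not as a GPR restore.
    """
    lines = asm.splitlines()
    loads = sum(1 for line in lines if LOAD_FROM_STACK.match(line))
    lr_restores = sum(1 for line in lines if MOVE_TO_LR.match(line))
    return loads - lr_restores, lr_restores


print("without separate shrink-wrapping:", count_restores(OLD_HOT_EXIT))
print("with separate shrink-wrapping:   ", count_restores(NEW_HOT_EXIT))
```

For the listings above this gives (18, 1) without and (4, 0) with
separate shrink-wrapping, matching the figures quoted in the text.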

I didn't show this path because there are a whole bunch of branches with
inline asm in the way.

The sysmalloc path was

.L635:
	li 4,0
.L761:
	addi 1,1,288
	mr 3,14
	ld 14,16(1)
	ld 15,-136(1)
	ld 16,-128(1)
	ld 17,-120(1)
	ld 18,-112(1)
	ld 19,-104(1)
	ld 20,-96(1)
	ld 21,-88(1)
	ld 22,-80(1)
	ld 23,-72(1)
	ld 24,-64(1)
	ld 25,-56(1)
	mtlr 14
	ld 26,-48(1)
	ld 14,-144(1)
	ld 27,-40(1)
	ld 28,-32(1)
	ld 29,-24(1)
	ld 30,-16(1)
	ld 31,-8(1)
	b sysmalloc

and now is

.L677:
	mr 3,14
	ld 15,152(1)
	ld 14,144(1)
	ld 25,232(1)
	ld 30,272(1)
	li 4,0
	addi 1,1,288
	b sysmalloc
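
One thing that can be read off the two sysmalloc listings (an
observation from the assembly, not something the patch description
states): the four registers still restored use the same stack slots in
both versions.  The old code pops the 288-byte frame first and then
loads at negative offsets; the new code loads first at positive offsets
and pops the frame afterwards.  A small Python sketch of that
arithmetic, with the offsets copied from the listings and the helper
name same_slots invented for illustration:

```python
FRAME_SIZE = 288  # from "addi 1,1,288" in both listings

# register -> offset from r1 at the time of the load
OLD_OFFSETS = {14: -144, 15: -136, 25: -56, 30: -16}   # loads after the pop
NEW_OFFSETS = {14: 144, 15: 152, 25: 232, 30: 272}     # loads before the pop


def same_slots(old, new, frame_size):
    """True if every register in `new` reads the slot it read in `old`.

    An offset taken after r1 was increased by frame_size names the same
    address as that offset plus frame_size taken before the increase.
    """
    return all(old[reg] + frame_size == new[reg] for reg in new)


for reg in sorted(NEW_OFFSETS):
    print(f"r{reg}: {OLD_OFFSETS[reg]} + {FRAME_SIZE} = {NEW_OFFSETS[reg]}")
print("same slots:", same_slots(OLD_OFFSETS, NEW_OFFSETS, FRAME_SIZE))
```

So, as far as these four registers show, the frame layout is unchanged;
what differs is which saves and restores sit on this path.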

I attach malloc.s.{no,yes}; I hope you can stomach that.  Well, you
can read HP-PA, heh.


Segher
-------------- next part --------------
A non-text attachment was scrubbed...
Name: malloc.s.no.gz
Type: application/x-gzip
Size: 40507 bytes
Desc: not available
URL: <http://gcc.gnu.org/pipermail/gcc-patches/attachments/20161010/81337273/attachment.bin>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: malloc.s.yes.gz
Type: application/x-gzip
Size: 42479 bytes
Desc: not available
URL: <http://gcc.gnu.org/pipermail/gcc-patches/attachments/20161010/81337273/attachment-0001.bin>