This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.



Re: ARM: Imply frame pointer for arm-linux profiling


On Wed, 2005-05-11 at 14:10, Daniel Jacobowitz wrote:
> What would you recommend?  The problem with the ip-based version is
> that it means you can't go through a PLT on the way to mcount - for
> this reason I suspect the netbsd-elf implementation is a little quirky. 
> I suppose we could require the runtime library implementation to
> provide an entry point for mcount which will not pass through the PLT,
> which does appropriate register saving.  Alternatively, force GCC to
> save/restore r3 if it is live on function entry.

Hmm, none of the existing solutions are really ABI compatible, since all
of them use IP in some way (generally to cache the caller's return
address).  With interworking (on V4T) that's unsafe even in a statically
linked environment.

It seems to me that we should just accept that IP & LR will get
clobbered and work from there.  The obvious solution is then

	.data
LP:
	.word 0
	.text
foo:
	push	{lr}
	bl	__gnu_mcount
	.4byte	LP - .
	// Normal code for foo (including normal prologue)

We can use the same sequence in both ARM and Thumb code.

There are a few things to note here:
1) We use .4byte because we can't assume that the entry will be aligned
in Thumb state (it will always be on a half-word boundary, but we can't
guarantee full word alignment).
2) The address of the count word is always stored in position-independent
form (as the offset "LP - ." from the entry itself), even in non-PIC
code.  That means we can profile both normal and PIC code in the same
manner.
3) On entry into __gnu_mcount, LR' (LR & ~1) will point to the following
entry (the .4byte word).  The address of the datum to update is
then simply "LR' + *(packed int *)LR'".
4) On return we pop the lr value pushed by the caller and return to LR+4
(even if Thumb).  That means we can stick the sequence above in front of
any ARM or Thumb function without altering the way we compile the rest
of it in any way.
5) __gnu_mcount has special ABI privileges in that it cannot assume an
8-byte aligned stack on entry.
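
The address arithmetic in notes 3 and 4 can be sketched in C.  This is
only an illustration of the calculation, not the real stub (which has to
be assembly); the function names count_word_address and resume_address
are made up for this sketch, and memcpy stands in for the "packed int"
load so that a half-word-aligned entry is read safely.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: given the LR value seen on entry to the stub,
   return the address of the count word (note 3). */
uintptr_t count_word_address(uintptr_t lr)
{
    uintptr_t lr_clean = lr & ~(uintptr_t)1;   /* LR' = LR & ~1 */
    int32_t offset;

    assert(lr_clean != 0);
    /* The .4byte entry may only be half-word aligned, so copy it out
       byte-wise rather than dereferencing an int pointer. */
    memcpy(&offset, (const void *)lr_clean, sizeof offset);

    /* The entry holds "LP - .", so adding it to its own address gives LP. */
    return lr_clean + (uintptr_t)(intptr_t)offset;
}

/* Hypothetical helper: where the stub returns to (note 4).  The Thumb
   bit is left untouched so that a bx to this value keeps the state. */
uintptr_t resume_address(uintptr_t lr)
{
    return lr + 4;   /* skip the 4-byte offset entry */
}
```

Both helpers work on plain integers, so they can be tried on a host
machine against a byte buffer that mimics the code layout.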

So on ARMv4T, the __gnu_mcount code will look something like:

	push	{r0-r3, lr}
	tst	lr, #1
	bic	lr, lr, #1	// Clear thumb bit (if set)
	ldreq	r0, [lr]	// Caller was ARM state (aligned)
	ldrhne	r0, [lr]
	ldrhne	ip, [lr, #2]
	add	r0, r0, lr
	addne	r0, r0, ip, asl #16
	ldr	r1, [sp, #20]	// Load caller's address (above the 5 words just pushed)
	bl	__gnu_mcount_1	//args: r0 = &count, r1  = caller
	pop	{r0-r3, ip, lr}	// Pops caller's address
	add	ip, ip, #4
	bx	ip

__gnu_mcount_1 can be written in C with substantially full ABI
privileges (though it may only touch core registers -- ie no floating
point).
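
For illustration, a C back end taking the two arguments set up above
might look roughly like the following.  Everything here is an
assumption made for the sketch: the name mcount_1_sketch, the fixed-size
caller_table and the policy of dropping callers once it is full are not
part of the proposal, which only fixes the arguments (address of the
count word, caller's address) and the no-floating-point restriction.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_CALLERS 8   /* arbitrary size chosen for the sketch */

/* Hypothetical per-caller tally; integer-only, so no FP registers. */
struct caller_slot {
    uintptr_t caller;
    uint32_t  hits;
};

struct caller_slot caller_table[MAX_CALLERS];

/* Hypothetical stand-in for __gnu_mcount_1:
   count  = address of the function's count word (r0)
   caller = return address pushed by the profiled function (r1) */
void mcount_1_sketch(uint32_t *count, uintptr_t caller)
{
    size_t i;

    assert(count != NULL);
    ++*count;   /* one more call of the profiled function */

    /* Linear search: reuse a matching slot or claim the first free one. */
    for (i = 0; i < MAX_CALLERS; ++i) {
        if (caller_table[i].hits == 0 || caller_table[i].caller == caller) {
            caller_table[i].caller = caller;
            ++caller_table[i].hits;
            return;
        }
    }
    /* Table full: this caller is simply not recorded. */
}
```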

The above sequence should work on all cores (even those that are pre-v4)
because those cores have no Thumb state, so the caller is always in ARM
state and the ldrh instructions will never execute (and the only cores
that didn't have these instructions wouldn't fault them if they didn't
execute).

Of course, for big-endian, the order of the two ldrh instructions above
would need adjusting.
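
To make the endianness point concrete, here is a small C model of how
the two half-words combine into the 32-bit offset.  The function name
combine_halfwords and the big_endian flag are invented for this sketch;
halfwords[0] models the value loaded from [lr] and halfwords[1] the
value loaded from [lr, #2].

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of the two ldrh loads followed by the shift-and-add.
   Little-endian: the half-word at [lr] is the low half of the offset.
   Big-endian:    the half-word at [lr] is the high half of the offset. */
int32_t combine_halfwords(const uint16_t halfwords[2], int big_endian)
{
    uint32_t word;

    assert(halfwords != NULL);
    if (big_endian)
        word = ((uint32_t)halfwords[0] << 16) | halfwords[1];
    else
        word = ((uint32_t)halfwords[1] << 16) | halfwords[0];

    /* Reinterpret as signed: the offset "LP - ." may be negative. */
    return (int32_t)word;
}
```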

On a v6 core (with unaligned loads) we could simplify much of the above
code to get

	push	{r0-r3, lr}
	bic	lr, lr, #1	// Clear thumb bit (if set)
	ldr	r0, [lr]	// maybe unaligned.
	ldr	r1, [sp, #20]	// Load caller's address (above the 5 words just pushed)
	add	r0, r0, lr
	bl	__gnu_mcount_1	//args: r0 = &count, r1  = caller
	pop	{r0-r3, ip, lr}	// Pops caller's address
	add	ip, ip, #4
	bx	ip

R.

