This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.
Re: ARM: Imply frame pointer for arm-linux profiling
On Wed, 2005-05-11 at 14:10, Daniel Jacobowitz wrote:
> What would you recommend? The problem with the ip-based version is
> that it means you can't go through a PLT on the way to mcount - for
> this reason I suspect the netbsd-elf implementation is a little quirky.
> I suppose we could require the runtime library implementation to
> provide an entry point for mcount which will not pass through the PLT,
> which does appropriate register saving. Alternatively, force GCC to
> save/restore r3 if it is live on function entry.
Hmm, none of the existing solutions are really ABI compatible, since all
of them use IP in some way (generally to cache the caller's return
address). With interworking (on V4T) that's unsafe even in a statically
linked environment.
It seems to me that we should just accept that IP & LR will get
clobbered and work from there. The obvious solution is then:

        .data
LP:
        .word 0
        .text
foo:
        push {lr}
        bl __gnu_mcount
        .4byte LP - .
        // Normal code for foo (including normal prologue)

We can use the same sequence in both ARM and Thumb code.
There are a few things to note here:
1) We use .4byte because we can't assume that the entry will be
word-aligned in Thumb state (it will always be on a half-word boundary,
but we can't guarantee full word alignment).
2) The address of the count word is always stored in PIC form (as an
offset from the entry itself), even in non-PIC code. That means we can
profile both normal and PIC code in the same manner.
3) On entry into __gnu_mcount LR' (LR & ~1) will point to the following
instruction (the .4byte entry). The address of the datum to update is
then simply "LR' + *(packed int *)LR'".
4) On return we pop the lr value pushed by the caller and return to LR+4
(even if Thumb). That means we can stick the sequence above in front of
any ARM or Thumb function without altering the way we compile the rest
of it in any way.
5) __gnu_mcount has special ABI privileges in that it does not require
an 8-byte aligned stack on entry.
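
Notes 3 and 4 can be modelled in C. This is only a sketch of the address arithmetic, not the proposed implementation: the function names count_word_address and resume_address are hypothetical, and the sketch assumes a flat address space in which a pointer round-trips through uintptr_t. The offset is read with memcpy so that the read is safe even when the entry is only half-word aligned.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper modelling note 3.  'lr' is the raw link register
   value on entry to __gnu_mcount; bit 0 is set if the caller is Thumb. */
uintptr_t count_word_address(uintptr_t lr)
{
    uintptr_t lr_clean = lr & ~(uintptr_t)1;   /* LR' = LR & ~1 */
    int32_t offset;

    assert(lr_clean != 0);                     /* must have arrived via bl */

    /* The .4byte entry may be only half-word aligned, so copy it out
       bytewise instead of dereferencing an int pointer. */
    memcpy(&offset, (const void *)lr_clean, sizeof offset);

    /* The entry holds "LP - .", so adding its own address yields LP. */
    return lr_clean + (uintptr_t)(intptr_t)offset;
}

/* Hypothetical helper modelling note 4: skip the 4-byte entry.  The Thumb
   bit, if set, is left in place so an interworking return still works. */
uintptr_t resume_address(uintptr_t lr)
{
    return lr + 4;
}
```

Keeping the Thumb bit in the resume address is what lets the same return path serve both ARM and Thumb callers.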
So on ARMv4T, the __gnu_mcount code will look something like:

        push {r0-r3, lr}
        tst lr, #1                // Z set if caller was in ARM state
        bic lr, lr, #1            // Clear thumb bit (if set)
        ldreq r0, [lr]            // Caller was ARM state (aligned)
        ldrhne r0, [lr]           // Caller was Thumb: first half-word
        ldrhne ip, [lr, #2]       // ... and second half-word
        add r0, r0, lr
        addne r0, r0, ip, asl #16
        ldr r1, [sp, #20]         // Load caller's address (pushed by foo)
        bl __gnu_mcount_1         //args: r0 = &count, r1 = caller
        pop {r0-r3, ip, lr}       // Pops caller's address
        add ip, ip, #4
        bx ip

__gnu_mcount_1 can be written in C with substantially full ABI
privileges (though it may only touch core registers -- ie no floating
point).
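
As an illustration of the C side, here is a minimal sketch of what such a helper might look like. What the helper actually records is not specified in this mail, so everything beyond the two arguments (r0 = &count, r1 = caller) is an assumption: the name gnu_mcount_1_sketch, the arc_record structure and the last_arc variable are all hypothetical. The sketch uses integer operations only, in line with the core-registers restriction.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bookkeeping: remembers the most recent call arc. */
struct arc_record {
    uintptr_t caller;   /* address in the caller of the profiled function */
    uint32_t *count;    /* the profiled function's count word (LP above) */
};

struct arc_record last_arc;

/* Sketch of the C helper: count = r0, caller = r1.  Integer-only. */
void gnu_mcount_1_sketch(uint32_t *count, uintptr_t caller)
{
    assert(count != NULL);

    *count += 1;                 /* bump the per-function counter */
    last_arc.caller = caller;    /* assumed: record who called us */
    last_arc.count = count;
}
```

A real profiler would presumably feed the caller address into a call-graph table rather than a single record; that part is left out here.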
The ARMv4T sequence above should work on all cores, even pre-v4 ones,
because the ldrh instructions never execute there (and the only cores
that lack these instructions do not fault on them when they are not
executed).
Of course, for big-endian, the order of the two ldrh instructions above
would need adjusting.
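
The endianness point can be illustrated in C. This is a sketch with a hypothetical function name (combine_halfwords), showing how the two half-words read at [lr] and [lr, #2] would be assembled into the 32-bit offset in each byte order.

```c
#include <assert.h>
#include <stdint.h>

/* 'first' is the half-word at [lr], 'second' the one at [lr, #2].
   big_endian selects which of them supplies the upper 16 bits. */
uint32_t combine_halfwords(uint16_t first, uint16_t second, int big_endian)
{
    assert(big_endian == 0 || big_endian == 1);

    if (big_endian)
        return ((uint32_t)first << 16) | second;   /* first is the high half */

    return ((uint32_t)second << 16) | first;       /* first is the low half */
}
```

For the word 0x12345678, a little-endian layout yields first = 0x5678 and second = 0x1234, while a big-endian layout yields first = 0x1234 and second = 0x5678; both combine to the same value once the matching order is used.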
On a v6 core (with unaligned loads) we could simplify much of the above
code to get

        push {r0-r3, lr}
        bic lr, lr, #1            // Clear thumb bit (if set)
        ldr r0, [lr]              // maybe unaligned.
        ldr r1, [sp, #20]         // Load caller's address (pushed by foo)
        add r0, r0, lr
        bl __gnu_mcount_1         //args: r0 = &count, r1 = caller
        pop {r0-r3, ip, lr}       // Pops caller's address
        add ip, ip, #4
        bx ip

R.