[PATCH] Better Function Prologue and Epilogue for sh
Nitin Gupta--SSW, Noida
nitingup@noida.hcltech.com
Wed Jun 25 13:48:00 GMT 2003
The following patch generates register saves and restores using register +
offset addressing instead of pre-decrement and post-increment addressing, to
facilitate better scheduling of instructions.
The current implementation saves the callee-saved registers in the function
prologue in this fashion:
mov.l r8,@-r15 /* CALLEE-SAVED REGISTERS ARE R8-R14 */
mov.l r9,@-r15
mov.l r10,@-r15
mov.l r11,@-r15
mov.l r12,@-r15
mov.l r13,@-r15
mov.l r14,@-r15
sts.l pr,@-r15 /* SAVING THE PR REGISTER */
Similarly, the epilogue contains the code that restores the registers:
lds.l @r15+,pr
mov.l @r15+,r14
mov.l @r15+,r13
mov.l @r15+,r12
mov.l @r15+,r11
mov.l @r15+,r10
mov.l @r15+,r9
rts
mov.l @r15+,r8
The problem is with instruction scheduling. The scheduler cannot move
instructions in the epilogue because there is a barrier at the very start of
the epilogue code. The barrier is there to avoid moving the stack pointer
adjustment past code that reads from the local frame; otherwise an interrupt
could occur after the SP adjustment and clobber data in the local frame. The
limitation of this scheme is that each instruction in the epilogue depends on
the previous instruction because of the changing value of r15, and as a
result the pipeline is disturbed.
As a workaround, if the registers are saved at fixed offsets from r15, the
register restores in the epilogue no longer need a barrier mechanism.
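The dependency argument above can be illustrated with a small model. This is only a sketch and not part of the patch: the `struct insn` representation and the helper names are hypothetical, and the model counts read-after-write dependencies only, ignoring PR, latencies and issue rules.

```c
#include <stddef.h>

/* One instruction, reduced to the registers it reads and writes.
   Bit n of each mask stands for register rn (r15 is the stack pointer).
   Hypothetical model for illustration; not GCC's representation. */
struct insn {
    unsigned reads;
    unsigned writes;
};

#define REG(n) (1u << (n))

/* Build n restores of r8, r9, ... (n <= 7).
   post_increment != 0 models "mov.l @r15+,rK", which also writes r15.
   post_increment == 0 models "mov.l @(disp,r15),rK", which only reads r15. */
static void make_restores(struct insn *seq, size_t n, int post_increment)
{
    for (size_t i = 0; i < n; i++) {
        seq[i].reads = REG(15);
        seq[i].writes = REG(8 + (int)i);
        if (post_increment)
            seq[i].writes |= REG(15);
    }
}

/* Length of the longest chain of read-after-write dependencies in a
   straight-line sequence of at most 32 instructions. */
static int longest_dep_chain(const struct insn *seq, size_t n)
{
    int depth[32];
    int best = 0;

    for (size_t i = 0; i < n && i < 32; i++) {
        depth[i] = 1;
        for (size_t j = 0; j < i; j++)
            if ((seq[j].writes & seq[i].reads) && depth[j] + 1 > depth[i])
                depth[i] = depth[j] + 1;
        if (depth[i] > best)
            best = depth[i];
    }
    return best;
}
```

For seven restores, the post-increment form yields a chain of length 7 in this model, because every load needs the r15 value written by the previous one. The offset form yields a chain of length 1, so the loads are free to be reordered.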
The prologue code will be something like
(barrier here)
add #-60,r15
mov.l r14,@(56,r15)
mov r15,r14
(barrier here)
sts pr,r1
mov.l r1,@(52,r15)
mov.l r13,@(48,r15)
mov.l r12,@(44,r15)
mov.l r11,@(40,r15)
mov.l r10,@(36,r15)
mov.l r9,@(32,r15)
mov.l r8,@(28,r15)
and the epilogue will be like
mov.l @(52,r15),r7
lds r7,pr
mov.l @(48,r15),r13
mov.l @(44,r15),r12
mov.l @(40,r15),r11
mov.l @(36,r15),r10
mov.l @(32,r15),r9
mov.l @(28,r15),r8
(barrier here)
mov.l @(56,r15),r14
rts
add #60,r15
This implementation uses offsets from r15 for register saves and restores.
The restore of r14 is also moved down; as a result, the number of
instructions in the non-schedulable region is reduced, which allows better
scheduling.
The only drawback is that saving and restoring the PR register takes an
extra instruction, because the sts and lds instructions have no register +
offset addressing mode.
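The slot layout used in the example prologue can be sketched as follows. This is a hypothetical helper, not code from the patch: the slot order (r14 first, then PR, then r13 down to r8) is read off the listing above, and the encodability check assumes that the SH `mov.l @(disp,Rn)` form takes a 4-bit displacement scaled by 4.

```c
#include <stdbool.h>

/* Save order taken from the example prologue: r14, PR, then r13..r8.
   The enum and function names are illustrative only. */
enum save_slot {
    SLOT_R14, SLOT_PR, SLOT_R13, SLOT_R12,
    SLOT_R11, SLOT_R10, SLOT_R9, SLOT_R8,
    NUM_SLOTS
};

/* Byte offset from r15 of a save slot, with slots allocated downward
   from the top of a frame of frame_size bytes. */
static int save_slot_offset(int frame_size, enum save_slot slot)
{
    return frame_size - 4 * ((int)slot + 1);
}

/* Assumption: mov.l @(disp,Rn) encodes a 4-bit displacement scaled by 4,
   so only word-aligned offsets from 0 to 60 can be expressed. */
static bool disp_encodable(int offset)
{
    return offset >= 0 && offset <= 60 && offset % 4 == 0;
}
```

With the 60-byte adjustment from the example, this places r14 at offset 56, PR at 52 and r8 at 28, matching the listing.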
This patch performs the optimization under the following conditions:
1. TARGET_SH4 is set.
2. The total frame size is less than 60 bytes, since the displacement in
register + offset addressing is limited (for mov.l it is a 4-bit field
scaled by 4, so the largest offset is 60 bytes).
3. More than two registers are pushed on the stack. This ensures that the
overhead of the extra instruction for PR is compensated by the gains from
scheduling the extra insns.
4. Only GPRs R8-R14 are considered. Floating-point registers are saved in
auto-increment/decrement mode.
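The conditions above amount to a simple eligibility check, which could be sketched as below. The struct and function are hypothetical stand-ins: the actual patch works on GCC's internal state in config/sh/sh.c, and whether PR counts toward the "more than two registers" threshold is an assumption left open here.

```c
#include <stdbool.h>

/* Hypothetical summary of the facts the conditions refer to. */
struct frame_info {
    bool target_sh4;        /* condition 1: compiling for SH4 */
    int  total_frame_size;  /* condition 2: total frame size in bytes */
    int  saved_gprs;        /* conditions 3 and 4: how many of r8-r14 are pushed */
};

/* Returns true when register + offset saves would be used under the
   stated conditions; otherwise the pre-decrement/post-increment form
   is kept. */
static bool use_offset_saves(const struct frame_info *f)
{
    return f->target_sh4
        && f->total_frame_size < 60
        && f->saved_gprs > 2;
}
```

Floating-point registers do not enter the check, since they are always saved in auto-increment/decrement mode.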
The following patch has been tested on i686-pc-linux-gnu with a full "make
all", and regression tested with a top-level "make -k check" with no new
failures.
2003-06-16  Nitin Gupta  <nitingup@noida.hcltech.com>
* config/sh/sh.c (expand_prologue, expand_epilogue): Changes in
expand_prologue and expand_epilogue for generation of instructions to
facilitate better scheduling.
Thanks and Best Regards
Nitin Gupta
-------------- next part --------------
A non-text attachment was scrubbed...
Name: PRO_EPI_PATCH
Type: application/octet-stream
Size: 13499 bytes
Desc: not available
URL: <http://gcc.gnu.org/pipermail/gcc-patches/attachments/20030625/96f1e31f/attachment.obj>