[PATCH] Better Function Prologue and Epilogue for sh

Nitin Gupta--SSW, Noida nitingup@noida.hcltech.com
Wed Jun 25 13:48:00 GMT 2003


The following patch generates register saves and restores using register +
offset addressing instead of pre-decrement and post-increment addressing, to
facilitate better scheduling of instructions.
The current implementation saves the callee-saved registers in the function
prologue in this fashion:

    mov.l r8,@-r15        /* CALLEE-SAVED REGISTERS ARE R8-R14 */
    mov.l r9,@-r15
    mov.l r10,@-r15
    mov.l r11,@-r15
    mov.l r12,@-r15
    mov.l r13,@-r15
    mov.l r14,@-r15
    sts.l  pr,@-r15        /* SAVING THE PR REGISTER */

and similarly, the epilogue contains the code for restoring the registers:

        lds.l   @r15+,pr
        mov.l   @r15+,r14
        mov.l   @r15+,r13
        mov.l   @r15+,r12
        mov.l   @r15+,r11
        mov.l   @r15+,r10
        mov.l   @r15+,r9
        rts
        mov.l   @r15+,r8

The problem is with instruction scheduling. The scheduler cannot move
instructions in the epilogue because there is a barrier at the very start of
the epilogue code. The barrier prevents the stack pointer adjustment from
moving past code that reads from the local frame; otherwise an interrupt
could occur after the SP adjustment and clobber data in the local frame. The
limitation is that each instruction in the epilogue depends on the previous
one, because every post-increment load changes the value of r15, and as a
result the pipeline is disturbed.
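
To make the dependency argument concrete, here is a minimal C sketch (not
part of the patch) that models only the true dependencies on r15. The
struct insn type and sp_chain_length() are hypothetical names introduced for
illustration; the model ignores all other registers and any
anti-dependencies.

```c
#include <stddef.h>

/* Toy model of one epilogue instruction (hypothetical, for illustration):
   it records only how the instruction touches the stack pointer r15. */
struct insn {
    int reads_sp;   /* uses r15 as an address base or operand */
    int writes_sp;  /* modifies r15 (post-increment or add) */
};

/* Length of the longest chain of instructions that must run in order
   because each one needs the r15 value produced by an earlier one.
   Only read-after-write dependencies on r15 are counted. */
static size_t sp_chain_length(const struct insn *code, size_t n)
{
    size_t chain_at_last_write = 0; /* chain ending at the last writer of r15 */
    size_t longest = 0;

    for (size_t i = 0; i < n; i++) {
        size_t len = 1;
        if (code[i].reads_sp)
            len = chain_at_last_write + 1;
        if (code[i].writes_sp)
            chain_at_last_write = len;
        if (len > longest)
            longest = len;
    }
    return longest;
}
```

Under this model, eight post-increment restores (each reads and writes r15)
form a single chain of length 8, while eight loads that only read r15,
followed by one add, each have a chain of length 1. That is the freedom the
scheduler gains from offset addressing.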

As a workaround, if the registers are saved at offsets from r15, we do not
need the barrier at the start of the epilogue.

The code in the prologue will be something like:
      (barrier here)
      add       #-60,r15
      mov.l     r14,@(56,r15)
      mov       r15,r14
      (barrier here)
      sts       pr,r1
      mov.l     r1,@(52,r15)
      mov.l     r13,@(48,r15)
      mov.l     r12,@(44,r15)
      mov.l     r11,@(40,r15)
      mov.l     r10,@(36,r15)
      mov.l     r9,@(32,r15)
      mov.l     r8,@(28,r15)

and the epilogue will be like:
      mov.l   @(52,r15),r7
      lds     r7,pr
      mov.l   @(48,r15),r13
      mov.l   @(44,r15),r12
      mov.l   @(40,r15),r11
      mov.l   @(36,r15),r10
      mov.l   @(32,r15),r9
      mov.l   @(28,r15),r8
      (barrier here)
      mov.l   @(56,r15),r14
      rts
      add     #60,r15

This implementation uses offsets from r15 for register saves and restores.
In addition, the restore of r14 is moved down, which reduces the number of
instructions in the non-schedulable region and thus allows better scheduling.
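
The save-slot offsets in the prologue listing follow a simple pattern, which
the C sketch below reproduces. It is an illustration only, not code from the
patch: assign_save_offsets() is a hypothetical helper, and the figure of 28
bytes of locals is an assumption inferred from the 60-byte frame and the
eight 4-byte save slots in the example.

```c
#include <assert.h>

#define WORD_SIZE 4  /* bytes per saved 32-bit register */

/* Fill offsets[i] with the r15-relative byte offset of the i-th saved
   register, where slot 0 is the highest slot in the frame.
   Returns the total frame size (locals plus save area) in bytes. */
static int assign_save_offsets(int local_bytes, int n_saved, int *offsets)
{
    assert(local_bytes >= 0 && local_bytes % WORD_SIZE == 0);
    assert(n_saved >= 0);

    int frame_bytes = local_bytes + n_saved * WORD_SIZE;
    for (int i = 0; i < n_saved; i++)
        offsets[i] = frame_bytes - (i + 1) * WORD_SIZE;
    return frame_bytes;
}
```

With local_bytes = 28 and n_saved = 8 (r14, pr, r13 down to r8), this yields
a 60-byte frame with slots at 56, 52, 48, 44, 40, 36, 32 and 28, the same
offsets the prologue stores to.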

The only drawback is that saving and restoring the PR register takes an
extra instruction, because the sts and lds instructions have no register +
offset addressing mode.

This patch performs the optimization under the following conditions:

1. TARGET_SH4
2. The total frame size is less than 60 bytes, since the displacement field
of mov.l in register + offset mode is only 4 bits wide (scaled by 4, so the
largest reachable offset is 60).
3. More than two registers are pushed on the stack. This ensures that the
overhead of the extra instruction for PR is compensated by the gains from
scheduling the other insns.
4. Only GPRs R8-R14 are considered. Floating-point registers are still saved
in auto-increment/decrement mode.
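
Conditions 1 to 3 can be read as a single predicate. The following C sketch
is one possible reading, not the code in the attached patch: struct
frame_info, its fields and use_offset_saves() are hypothetical names, and
the frame-size comparison follows the wording of condition 2 literally.

```c
#include <stdbool.h>

#define MAX_FRAME_BYTES 60  /* limit stated in condition 2 */

/* Hypothetical summary of the facts the decision depends on. */
struct frame_info {
    bool target_sh4;    /* condition 1: compiling for SH4 */
    int  frame_bytes;   /* total frame size in bytes */
    int  n_saved_regs;  /* registers among r8-r14 and pr to be pushed */
};

/* Return true when offset-based saves would be used under the stated
   conditions; otherwise the pre-decrement/post-increment form is kept. */
static bool use_offset_saves(const struct frame_info *f)
{
    return f->target_sh4
        && f->frame_bytes < MAX_FRAME_BYTES
        && f->n_saved_regs > 2;
}
```

For instance, an SH4 function with a 40-byte frame and four saved registers
would qualify, while the same function with only two saved registers would
keep the existing push/pop sequence.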

The following patch has been tested on i686-pc-linux-gnu with a full "make
all", and regression tested with a top-level "make -k check" with no new
failures.



2003-06-16  Nitin Gupta  <nitingup@noida.hcltech.com>
 

        * config/sh/sh.c (expand_prologue, expand_epilogue): Changes in
        expand_prologue and expand_epilogue for generation of instructions
        to facilitate better scheduling.

 
Thanks and Best Regards
Nitin Gupta

-------------- next part --------------
A non-text attachment was scrubbed...
Name: PRO_EPI_PATCH
Type: application/octet-stream
Size: 13499 bytes
Desc: not available
URL: <http://gcc.gnu.org/pipermail/gcc-patches/attachments/20030625/96f1e31f/attachment.obj>

