[PATCH] Better Function Prologue and Epilogue for sh

Joern Rennecke joern.rennecke@superh.com
Wed Jun 25 16:55:00 GMT 2003


"Nitin Gupta--SSW, Noida" wrote:
> The following patch generates registers saves and restores as register +
> offset instead of pre decrement and post increment to facilitate better
> scheduling of instructions.

I'm working right now on a patch to fix bugs in the prologue / epilogue code
with respect to the registers it assumes it can clobber.  I want to fix
these bugs first before bringing in new optimizations.
	
> As a workaround to this situation if the registers are saved on offset from
> r15 then we need not have a barrier mechanism in epilogue.
> 
> The code in Prologue will be something like
>       (barrier here)
>         add       #60,r15
>       mov.l     r14, (56,r15)
>       mov       r15,r14
>       (barrier here)
>       sts         pr,r1
>       mov.l      pr, @(52,r15)
>       mov.l      r13,@(48,r15)
>       mov.l      r12,@(44,r15)
>       mov.l      r11,@(40,r15)
>       mov.l      r10,@(36,r15)
>       mov.l      r9,@(32,r15)
>       mov.l      r8,@(28,r15)
> 
> and epilogue will be like
>       mov.l   @(56,r15),r7
>       lds     r7,pr
>       mov.l   @(48,r15),r13
>       mov.l   @(44,r15),r12
>       mov.l   @(40,r15),r11
>       mov.l   @(36,r15),r10
>       mov.l   @(32,r15),r9
>       mov.l   @(28,r15),r8
>       (barrier here)
>       mov.l   @(52,r15),r14
>       rts
>       add     #36,r15

Shouldn't that be #60, to match the stack adjustment in the prologue?

It's not that simple.  r0-r3 are used for return values, and r4-r7 are used
for exception handling data.  -fcall-saved-reg / -ffixed-reg can be used
to change the calling conventions.  With TARGET_HITACHI, macl and mach are
callee-saved, and you can't use register + offset addressing for these, either.
Nor can you for special registers saved in an interrupt.

You are changing the place where the return address is (why?), and thus have
to change sh_set_return_address and initial_elimination_offset too.

The delay slot of the rts instruction is a nice place for a memory instruction;
having the stack adjust there wastes an opportunity to put a load there.

lds is type CO - i.e. it can't be paired with anything - and using it to load
from a register takes just as long as loading from memory.  Likewise for sts.
Thus, splitting the pr save/restore effectively creates a new memory access
operation, which cannot be cheaper than sequestering one memory operation
behind a barrier.

Thus, you can end the epilogue like:

      add     #48,r15
      lds.l   @r15+,pr
      mov.l   @r15+,r9
      rts
      mov.l   @r15+,r8

Of course, changing the home of the return address again requires changes
to sh_set_return_address and initial_elimination_offset.
If registers other than general purpose registers have to be restored, they
can also go nicely into the rts delay slot and help to hide the
lds.l   @r15+,pr latency.
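
To make the stack arithmetic of the epilogue ending above explicit, here is
the same sequence annotated with the value of r15 after each instruction.
"sp0" is only an illustrative name for the value of r15 on entry to the
sequence, and the total assumes the same 60-byte frame as in the quoted
prologue:

```
      add     #48,r15         ! r15 = sp0 + 48
      lds.l   @r15+,pr        ! pr = mem[sp0 + 48]; r15 = sp0 + 52
      mov.l   @r15+,r9        ! r9 = mem[sp0 + 52]; r15 = sp0 + 56
      rts                     ! delayed branch: the next insn still executes
      mov.l   @r15+,r8        ! r8 = mem[sp0 + 56]; r15 = sp0 + 60
```

The adjustments add up to 48 + 3 * 4 = 60, so for this ending to work, pr,
r9 and r8 would have to occupy the three highest slots of the frame.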
		

But first, you should test if having a blockage instruction is really the
problem, or if the problem is that it's a universal blockage instruction.
What we really want is a stack memory blockage.  Try to make an insn pattern
that sets a BLKmode object to an unspec value; the memory reference for the
destination should be in alias set 0, and could be something that describes
an object that is addressed off the frame or stack pointer and is as large as
the stack frame (basically, that *is* the stack frame).
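
As a very rough, untested sketch of what such a pattern might look like in
sh.md: the pattern name "stack_blockage" and the UNSPEC_STACK_BLOCKAGE
constant are placeholders made up for this illustration, not existing names
in the SH port, and the "length" setting assumes the insn should emit no code.

```
;; Hypothetical sketch only.  A zero-length insn that "writes" the whole
;; stack frame, so the scheduler only has to keep memory accesses that
;; may alias the frame in order around it.
(define_insn "stack_blockage"
  [(set (match_operand:BLK 0 "memory_operand" "+m")
	(unspec:BLK [(match_dup 0)] UNSPEC_STACK_BLOCKAGE))]
  ""
  ""
  [(set_attr "length" "0")])
```

The prologue / epilogue code would then pass in a BLKmode MEM based on the
stack or frame pointer, left in alias set 0 and sized to the frame, where it
currently emits the universal blockage.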

-- 
--------------------------
SuperH (UK) Ltd.
2410 Aztec West / Almondsbury / BRISTOL / BS32 4QX
T:+44 1454 465658


