[RFA/ARM][Patch 04/05]: STRD generation instead of PUSH in A15 ARM prologue.

Sameera Deshpande sameera.deshpande@arm.com
Tue Nov 8 11:14:00 GMT 2011


On Fri, 2011-10-21 at 13:45 +0100, Ramana Radhakrishnan wrote: 
> >+arm_emit_strd_push (unsigned long saved_regs_mask)
> 
> How different is this from the thumb2 version you sent out in Patch 03/05 ?
> 
Thumb-2 STRD can handle non-consecutive registers; ARM STRD cannot.
Therefore, in ARM mode we accumulate the registers that cannot be paired
into an STRD and emit an STM instruction for them. For consecutive
registers, STRD is generated.
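
To illustrate the split described above, here is a minimal sketch in C. It
is not the code from the patch: split_for_arm_strd is a hypothetical helper,
and it assumes that an ARM-mode STRD pair must start at an even-numbered
register followed by the next register. It also leaves r14/r15 unpaired.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (not arm_emit_strd_push from the patch).
   Split SAVED_REGS_MASK into:
     *STRD_MASK     - registers forming even/odd pairs, storable with STRD;
     *LEFTOVER_MASK - registers without a saved partner, which would be
                      accumulated and stored with STR/STM.
   Assumption: ARM-mode STRD needs an even first register and the next
   register as the second one.  */
static void
split_for_arm_strd (uint32_t saved_regs_mask,
                    uint32_t *strd_mask, uint32_t *leftover_mask)
{
  assert ((saved_regs_mask >> 16) == 0);  /* Only r0..r15 are valid.  */

  *strd_mask = 0;
  *leftover_mask = 0;

  /* Walk the even/odd pairs (r0,r1) .. (r12,r13).  */
  for (int reg = 0; reg < 14; reg += 2)
    {
      uint32_t pair = 3u << reg;
      uint32_t present = saved_regs_mask & pair;

      if (present == pair)
        *strd_mask |= present;      /* Both halves saved: use STRD.  */
      else
        *leftover_mask |= present;  /* Zero or one half saved.  */
    }

  /* In this sketch r14/r15 are never paired.  */
  *leftover_mask |= saved_regs_mask & (3u << 14);
}
```

With the mask for {R3, R4, R5, R6} (0x78), this yields 0x30 for STRD
(R4, R5) and 0x48 as leftovers (R3 and R6).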

> >@@ -15958,7 +16081,8 @@ arm_get_frame_offsets (void)
> > 	     use 32-bit push/pop instructions.  */
> >  	  if (! any_sibcall_uses_r3 ()
> > 	      && arm_size_return_regs () <= 12
> >-	      && (offsets->saved_regs_mask & (1 << 3)) == 0)
> >+	      && (offsets->saved_regs_mask & (1 << 3)) == 0
> >+              && (TARGET_THUMB2 || !current_tune->prefer_ldrd_strd))
> 
> Not sure I completely follow this change yet.
> 
If the stack is not aligned, we need to adjust the stack in the prologue.
Here, instead of adjusting the stack, we PUSH register R3 onto the stack,
so that no additional ADD instruction is needed for the stack adjustment.
This works fine when we generate multi-register load/store instructions.

However, when we generate STRD in ARM mode, non-consecutive registers
are stored using STR/STM instructions. As the pair register of R3 (R2) is
never pushed onto the stack, we always end up generating an STR instruction
to PUSH R3. This is more expensive than doing ADD SP, SP, #4 for the
stack adjustment.

For example, if we are PUSHing registers {R4, R5, R6}, the stack is not
aligned, so we PUSH {R3, R4, R5, R6} instead.
The instructions generated are:
STR R6, [sp, #4]
STRD R4, R5, [sp, #12]
STR R3, [sp, #16]

However, if another caller-saved register is PUSHed instead of R3,
we push {R4, R5, R6, R7}, which generates
STRD R6, R7, [sp, #8]
STRD R4, R5, [sp, #16]

If no caller-saved register is available, we generate an ADD instruction,
which is still better than generating an STR.
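
The choice described above could be sketched roughly as follows. This is
only an illustration under assumptions, not the logic in the patch:
pick_padding_reg and its candidates parameter are hypothetical, and the
sketch assumes the same even/odd pairing rule for ARM-mode STRD.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (not from the patch).  Given the registers already
   being saved and a mask of registers we are allowed to push as padding,
   return a register whose even/odd partner is already saved, so that
   pushing it completes an STRD pair.  Return -1 when no such register
   exists; the caller would then adjust SP with a separate instruction.  */
static int
pick_padding_reg (uint32_t saved_regs_mask, uint32_t candidates)
{
  assert ((saved_regs_mask >> 16) == 0 && (candidates >> 16) == 0);

  for (int reg = 0; reg < 14; reg++)
    {
      uint32_t self = 1u << reg;
      uint32_t partner = 1u << (reg ^ 1);  /* Other half of the pair.  */

      if ((candidates & self) != 0
          && (saved_regs_mask & self) == 0
          && (saved_regs_mask & partner) != 0)
        return reg;
    }

  return -1;  /* No register completes a pair.  */
}
```

For the saved set {R4, R5, R6} with R3..R7 allowed as padding, the sketch
picks R7, since R6 is its partner. If only R3 is allowed, it returns -1,
because R2 is not being saved.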
> 
> Hmmm the question remains if we want to put these into ldmstm.md since
> it was theoretically
> auto-generated from ldmstm.ml. If this has to be marked to be separate
> then I'd like
> to regenerate ldmstm.md from ldmstm.ml and differentiate between the
> bits that can be auto-generated
> and the bits that have been added since.
> 
The current patterns are quite different from the patterns generated by
arm-ldmstm.ml. I will submit an updated arm-ldmstm.ml that generates the
ldrd/strd patterns as a new patch. Is that fine?

The patch has been tested with check-gcc, check-gdb and bootstrap.

I see a regression in gcc:
FAIL: gcc.c-torture/execute/vector-compare-1.c compilation,  -O3
-fomit-frame-pointer -funroll-loops with error message 
/tmp/ccC13odV.s: Assembler messages:
/tmp/ccC13odV.s:544: Error: co-processor offset out of range

This seems to be a latent bug that the patch has uncovered, and I am
looking into it.

- Thanks and regards,
  Sameera D.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: a15_arm_strd_prologue-4Nov.patch
Type: text/x-patch
Size: 9284 bytes
Desc: a15_arm_strd_prologue-4Nov.patch
URL: <http://gcc.gnu.org/pipermail/gcc-patches/attachments/20111108/b8a7be39/attachment.bin>

