
Re: [PATCH, ARM] Low interrupt latency support (avoiding ldm/stm)


On Tue, 18 May 2010 11:49:17 +0100
Paul Brook <paul@codesourcery.com> wrote:

> > > It appears the feature I was thinking of[1] was only standardised
> > > in ARMv6, and may not be present on earlier cores. It's
> > > definitely present on arm11 and cortex-a8/r4/m3 based cores.
> > 
> > OK, so Paul, where does that leave us?  If Phil is right that this
> > is in fact pretty universal (and not unique to Marvell Feroceon for
> > which this patch was originally developed, IIRC), do you still
> > object to it going in?  (I have no opinion on the name; if you
> > -mavoid-ldm, that seems OK, though of course it also applies to
> > STM.)
> 
> I think the documentation needs improving. i.e. explain when it
> should be used and, more importantly, when it provides no benefit
> (Cortex-M[34], most armv6+ cores in low latency mode). Other than
> that it looks ok.  I don't care enough to argue about the name.

Here's a new version of the patch. I've renamed the option to
"-mmultiple" for consistency with other targets, and reversed its
sense: it is now enabled by default, and you use "-mno-multiple" to
turn off generation of ldm/stm instructions. I've also improved the
documentation somewhat.
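
As an illustration of the effect (a sketch only, since the exact
register set and ordering depend on the function being compiled):
with ldm/stm generation enabled, a prologue/epilogue pair might look
like this:

	stmfd	sp!, {r4, r5, lr}
	...
	ldmfd	sp!, {r4, r5, pc}

whereas with -mno-multiple the same saves and restores are emitted as
individually interruptible single-register transfers:

	str	lr, [sp, #-4]!
	str	r5, [sp, #-4]!
	str	r4, [sp, #-4]!
	...
	ldr	r4, [sp], #4
	ldr	r5, [sp], #4
	ldr	pc, [sp], #4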

I'm still not entirely sure whether applying this patch is a good
idea, though. There's a lot of potential for bit-rot to creep in
(e.g. the introduction of new ldm/stm or fldm/fstm instructions
without checking the use_load_store_multiple flag). OTOH, I believe
ARM's own compiler has supported a similar option since prehistoric
times, so maybe there's demand.
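
For reference, the libgcc side works the same way: lib1funcs.asm's
do_push/do_pop are now variadic macros taking parenthesised register
lists, dispatched on argument count via the _buildN1/_buildC1
helpers. As a sketch, when __use_load_store_multiple__ is not
defined,

	do_push	(r4, r5, lr)

selects do_push3 and expands to:

	str	lr, [sp, #-4]!
	str	r5, [sp, #-4]!
	str	r4, [sp, #-4]!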

Richard, what do you think?

(Re-tested OK, though without enabling the new option.)

Julian

    Vladimir Prus  <vladimir@codesourcery.com>
    Julian Brown  <julian@codesourcery.com>

    gcc/
    * config/arm/arm.c (arm_override_options): Warn if -mno-multiple
    is specified in Thumb mode.
    (load_multiple_sequence): Return 0 if use_load_store_multiple is
    false.
    (store_multiple_sequence): Likewise.
    (arm_gen_load_multiple): Load registers one-by-one if
    use_load_store_multiple is false.
    (arm_gen_store_multiple): Likewise.
    (vfp_output_fldmd): When use_load_store_multiple is false, pop each
    register separately.
    (vfp_emit_fstmd): When use_load_store_multiple is false, save each
    register separately.
    (arm_get_vfp_saved_size): Adjust saved register size calculation for
    the above changes.
    (print_pop_reg_by_ldr): New.
    (arm_output_epilogue): Use print_pop_reg_by_ldr when
    use_load_store_multiple is false.
    (emit_multi_reg_push): Push registers separately if
    use_load_store_multiple is false.
    * config/arm/arm.h (TARGET_CPU_CPP_BUILTINS): Set
    __use_load_store_multiple__.
    (use_load_store_multiple): Define.
    (USE_RETURN_INSN): Only use return insn if use_load_store_multiple
    is true.
    * config/arm/lib1funcs.asm (do_pop, do_push): Define as variadic
    macros. When __use_load_store_multiple__ is not defined, push and
    pop registers individually.
    (div0): Use the new parenthesised do_push syntax.
    * config/arm/ieee754-df.S: Adjust do_push/do_pop calls to the new
    syntax.
    * config/arm/ieee754-sf.S: Likewise.
    * config/arm/bpabi.S: Likewise.
    * config/arm/arm.opt (mmultiple, mno-multiple): New options.
    * config/arm/predicates.md (load_multiple_operation): Return false
    if use_load_store_multiple is false.
    (store_multiple_operation): Likewise.
    * config/arm/arm.md (movmemqi, *arith_adjacentmem): Only use if
    use_load_store_multiple is true.
    * doc/invoke.texi (-m[no-]multiple): Add documentation.
Index: gcc/doc/invoke.texi
===================================================================
--- gcc/doc/invoke.texi	(revision 159030)
+++ gcc/doc/invoke.texi	(working copy)
@@ -465,6 +465,7 @@ Objective-C and Objective-C++ Dialects}.
 -mtpcs-frame  -mtpcs-leaf-frame @gol
 -mcaller-super-interworking  -mcallee-super-interworking @gol
 -mtp=@var{name} @gol
+-mmultiple -mno-multiple @gol
 -mword-relocations @gol
 -mfix-cortex-m3-ldrd}
 
@@ -9991,6 +9992,14 @@ long_calls_off} directive.  Note these s
 the compiler generates code to handle function calls via function
 pointers.
 
+@item -mmultiple
+@itemx -mno-multiple
+@opindex mmultiple
+@opindex mno-multiple
+Generate load-multiple (@code{ldm}) and store-multiple (@code{stm}) instructions (enabled by default).  Use the @option{-mno-multiple} option if your application has hard real-time constraints on interrupt handling, and the processor you are targeting is unable to interrupt @code{ldm} and @code{stm} instructions midway through execution.  A series of single-register loads and stores is used instead.  Note that for best effect, your runtime libraries must also be compiled with @option{-mno-multiple}.
+
+Use of the @option{-mno-multiple} option generally results in code that is both larger and slower than code compiled without it.  This option should not be used on ARM processors implementing architecture version ARMv6 or later, since such processors support a low-latency mode which should be used in preference.  The option has no effect in Thumb mode.
+
 @item -msingle-pic-base
 @opindex msingle-pic-base
 Treat the register used for PIC addressing as read-only, rather than
Index: gcc/config/arm/ieee754-df.S
===================================================================
--- gcc/config/arm/ieee754-df.S	(revision 159030)
+++ gcc/config/arm/ieee754-df.S	(working copy)
@@ -83,7 +83,7 @@ ARM_FUNC_ALIAS aeabi_dsub subdf3
 ARM_FUNC_START adddf3
 ARM_FUNC_ALIAS aeabi_dadd adddf3
 
-1:	do_push	{r4, r5, lr}
+1:	do_push	(r4, r5, lr)
 
 	@ Look for zeroes, equal values, INF, or NAN.
 	shift1	lsl, r4, xh, #1
@@ -427,7 +427,7 @@ ARM_FUNC_ALIAS aeabi_ui2d floatunsidf
 	do_it	eq, t
 	moveq	r1, #0
 	RETc(eq)
-	do_push	{r4, r5, lr}
+	do_push	(r4, r5, lr)
 	mov	r4, #0x400		@ initial exponent
 	add	r4, r4, #(52-1 - 1)
 	mov	r5, #0			@ sign bit is 0
@@ -447,7 +447,7 @@ ARM_FUNC_ALIAS aeabi_i2d floatsidf
 	do_it	eq, t
 	moveq	r1, #0
 	RETc(eq)
-	do_push	{r4, r5, lr}
+	do_push	(r4, r5, lr)
 	mov	r4, #0x400		@ initial exponent
 	add	r4, r4, #(52-1 - 1)
 	ands	r5, r0, #0x80000000	@ sign bit in r5
@@ -481,7 +481,7 @@ ARM_FUNC_ALIAS aeabi_f2d extendsfdf2
 	RETc(eq)			@ we are done already.
 
 	@ value was denormalized.  We can normalize it now.
-	do_push	{r4, r5, lr}
+	do_push	(r4, r5, lr)
 	mov	r4, #0x380		@ setup corresponding exponent
 	and	r5, xh, #0x80000000	@ move sign bit in r5
 	bic	xh, xh, #0x80000000
@@ -508,9 +508,9 @@ ARM_FUNC_ALIAS aeabi_ul2d floatundidf
 	@ compatibility.
 	adr	ip, LSYM(f0_ret)
 	@ Push pc as well so that RETLDM works correctly.
-	do_push	{r4, r5, ip, lr, pc}
+	do_push	(r4, r5, ip, lr, pc)
 #else
-	do_push	{r4, r5, lr}
+	do_push	(r4, r5, lr)
 #endif
 
 	mov	r5, #0
@@ -534,9 +534,9 @@ ARM_FUNC_ALIAS aeabi_l2d floatdidf
 	@ compatibility.
 	adr	ip, LSYM(f0_ret)
 	@ Push pc as well so that RETLDM works correctly.
-	do_push	{r4, r5, ip, lr, pc}
+	do_push	(r4, r5, ip, lr, pc)
 #else
-	do_push	{r4, r5, lr}
+	do_push	(r4, r5, lr)
 #endif
 
 	ands	r5, ah, #0x80000000	@ sign bit in r5
@@ -585,7 +585,7 @@ ARM_FUNC_ALIAS aeabi_l2d floatdidf
 	@ Legacy code expects the result to be returned in f0.  Copy it
 	@ there as well.
 LSYM(f0_ret):
-	do_push	{r0, r1}
+	do_push	(r0, r1)
 	ldfd	f0, [sp], #8
 	RETLDM
 
@@ -602,7 +602,7 @@ LSYM(f0_ret):
 
 ARM_FUNC_START muldf3
 ARM_FUNC_ALIAS aeabi_dmul muldf3
-	do_push	{r4, r5, r6, lr}
+	do_push	(r4, r5, r6, lr)
 
 	@ Mask out exponents, trap any zero/denormal/INF/NAN.
 	mov	ip, #0xff
@@ -910,7 +910,7 @@ LSYM(Lml_n):
 ARM_FUNC_START divdf3
 ARM_FUNC_ALIAS aeabi_ddiv divdf3
 	
-	do_push	{r4, r5, r6, lr}
+	do_push	(r4, r5, r6, lr)
 
 	@ Mask out exponents, trap any zero/denormal/INF/NAN.
 	mov	ip, #0xff
@@ -1195,7 +1195,7 @@ ARM_FUNC_ALIAS aeabi_cdcmple aeabi_cdcmp
 
 	@ The status-returning routines are required to preserve all
 	@ registers except ip, lr, and cpsr.
-6:	do_push	{r0, lr}
+6:	do_push	(r0, lr)
 	ARM_CALL cmpdf2
 	@ Set the Z flag correctly, and the C flag unconditionally.
 	cmp	r0, #0
Index: gcc/config/arm/arm.c
===================================================================
--- gcc/config/arm/arm.c	(revision 159030)
+++ gcc/config/arm/arm.c	(working copy)
@@ -1897,6 +1897,13 @@ arm_override_options (void)
 
   /* Register global variables with the garbage collector.  */
   arm_add_gc_roots ();
+
+  if (!use_load_store_multiple && TARGET_THUMB)
+    {
+      warning (0, 
+	       "-mno-multiple has no effect when compiling for Thumb");
+      use_load_store_multiple = 1;
+    }
 }
 
 static void
@@ -9083,6 +9090,9 @@ load_multiple_sequence (rtx *operands, i
   int base_reg = -1;
   int i;
 
+  if (!use_load_store_multiple)
+    return 0;
+
   /* Can only handle 2, 3, or 4 insns at present,
      though could be easily extended if required.  */
   gcc_assert (nops >= 2 && nops <= 4);
@@ -9312,6 +9322,9 @@ store_multiple_sequence (rtx *operands, 
   int base_reg = -1;
   int i;
 
+  if (!use_load_store_multiple)
+    return 0;
+
   /* Can only handle 2, 3, or 4 insns at present, though could be easily
      extended if required.  */
   gcc_assert (nops >= 2 && nops <= 4);
@@ -9519,7 +9532,8 @@ arm_gen_load_multiple (int base_regno, i
 
      As a compromise, we use ldr for counts of 1 or 2 regs, and ldm
      for counts of 3 or 4 regs.  */
-  if (arm_tune_xscale && count <= 2 && ! optimize_size)
+  if (!use_load_store_multiple
+      || (arm_tune_xscale && count <= 2 && ! optimize_size))
     {
       rtx seq;
 
@@ -9582,7 +9596,8 @@ arm_gen_store_multiple (int base_regno, 
 
   /* See arm_gen_load_multiple for discussion of
      the pros/cons of ldm/stm usage for XScale.  */
-  if (arm_tune_xscale && count <= 2 && ! optimize_size)
+  if (!use_load_store_multiple
+      || (arm_tune_xscale && count <= 2 && ! optimize_size))
     {
       rtx seq;
 
@@ -11719,6 +11734,20 @@ static void
 vfp_output_fldmd (FILE * stream, unsigned int base, int reg, int count)
 {
   int i;
+  int offset;
+
+  if (!use_load_store_multiple)
+    {
+      /* Output a sequence of FLDD instructions.  */
+      offset = 0;
+      for (i = reg; i < reg + count; ++i, offset += 8)
+	{
+	  fputc ('\t', stream);
+	  asm_fprintf (stream, "fldd\td%d, [%r,#%d]\n", i, base, offset);
+	}
+      asm_fprintf (stream, "\tadd\tsp, sp, #%d\n", count * 8);
+      return;
+    }
 
   /* Workaround ARM10 VFPr1 bug.  */
   if (count == 2 && !arm_arch6)
@@ -11789,6 +11818,56 @@ vfp_emit_fstmd (int base_reg, int count)
   rtx tmp, reg;
   int i;
 
+  if (!use_load_store_multiple)
+    {
+      int saved_size;
+      rtx sp_insn;
+
+      if (!count)
+	return 0;
+
+      saved_size = count * GET_MODE_SIZE (DFmode);
+
+      /* Since fstd does not have a postdecrement addressing mode,
+	 we first decrement the stack pointer and then use base+offset
+	 stores for the VFP registers.  The ARM EABI unwind information
+	 can't easily describe base+offset stores, so we attach a note
+	 for the effects of the whole block to the first insn, and
+	 avoid marking the subsequent instructions with
+	 RTX_FRAME_RELATED_P.  */
+      sp_insn = gen_addsi3 (stack_pointer_rtx, stack_pointer_rtx,
+			    GEN_INT (-saved_size));
+      sp_insn = emit_insn (sp_insn);
+      RTX_FRAME_RELATED_P (sp_insn) = 1;
+
+      dwarf = gen_rtx_SEQUENCE (VOIDmode, rtvec_alloc (count + 1));
+      XVECEXP (dwarf, 0, 0) = 
+	gen_rtx_SET (VOIDmode, stack_pointer_rtx,
+		     plus_constant (stack_pointer_rtx, -saved_size));
+      
+      /* Push the double-precision VFP registers to the stack.  */
+      for (i = 0; i < count; ++i )
+	{
+	  rtx reg;
+	  rtx mem;
+	  rtx addr;
+	  rtx insn;
+	  reg = gen_rtx_REG (DFmode, base_reg + 2*i);
+	  addr = (i == 0) ? stack_pointer_rtx
+	    : gen_rtx_PLUS (SImode, stack_pointer_rtx,
+			    GEN_INT (i * GET_MODE_SIZE (DFmode)));
+	  mem = gen_frame_mem (DFmode, addr);
+	  insn = emit_move_insn (mem, reg);
+	  XVECEXP (dwarf, 0, i+1) = 
+	    gen_rtx_SET (VOIDmode, mem, reg);
+	}
+
+      REG_NOTES (sp_insn) = gen_rtx_EXPR_LIST (REG_FRAME_RELATED_EXPR, dwarf,
+					       REG_NOTES (sp_insn));
+      
+      return saved_size;
+    }
+
   /* Workaround ARM10 VFPr1 bug.  Data corruption can occur when exactly two
      register pairs are stored by a store multiple insn.  We avoid this
      by pushing an extra pair.  */
@@ -13231,7 +13310,7 @@ arm_get_vfp_saved_size (void)
 	      if (count > 0)
 		{
 		  /* Workaround ARM10 VFPr1 bug.  */
-		  if (count == 2 && !arm_arch6)
+		  if (count == 2 && !arm_arch6 && use_load_store_multiple)
 		    count++;
 		  saved += count * 8;
 		}
@@ -13569,6 +13648,41 @@ arm_output_function_prologue (FILE *f, H
 
 }
 
+/* Generate to STREAM a code sequence that pops the registers
+   identified in REGS_MASK from the stack.  SP is incremented
+   as a result.  */
+static void
+print_pop_reg_by_ldr (FILE *stream, int regs_mask, int rfe)
+{
+  int reg;
+
+  gcc_assert (! (regs_mask & (1 << SP_REGNUM)));
+  
+  for (reg = 0; reg < PC_REGNUM; ++reg)
+    if (regs_mask & (1 << reg))
+      asm_fprintf (stream, "\tldr\t%r, [%r], #4\n",
+		   reg, SP_REGNUM); 
+
+  if (regs_mask & (1 << PC_REGNUM))
+    {
+      if (rfe)
+	/* When returning from exception, we need to
+	   copy SPSR to CPSR.  There are two ways to do
+	   that: the ldm instruction with "^" suffix,
+	   and movs instruction.  The latter would
+	   require that we load from stack to some
+	   scratch register, and then move to PC.
+	   Therefore, we'd need extra instruction and
+	   have to make sure we actually have a spare
+	   register.  Using ldm with a single register
+	   is simpler.  */
+	asm_fprintf (stream, "\tldm\tsp!, {pc}^\n");
+      else
+	asm_fprintf (stream, "\tldr\t%r, [%r], #4\n",
+		     PC_REGNUM, SP_REGNUM); 
+    }
+}
+
 const char *
 arm_output_epilogue (rtx sibling)
 {
@@ -13948,18 +14062,22 @@ arm_output_epilogue (rtx sibling)
 	}
       else if (saved_regs_mask)
 	{
-	  if (saved_regs_mask & (1 << SP_REGNUM))
-	    /* Note - write back to the stack register is not enabled
-	       (i.e. "ldmfd sp!...").  We know that the stack pointer is
-	       in the list of registers and if we add writeback the
-	       instruction becomes UNPREDICTABLE.  */
-	    print_multi_reg (f, "ldmfd\t%r, ", SP_REGNUM, saved_regs_mask,
-			     rfe);
-	  else if (TARGET_ARM)
-	    print_multi_reg (f, "ldmfd\t%r!, ", SP_REGNUM, saved_regs_mask,
-			     rfe);
+	  gcc_assert (! (saved_regs_mask & (1 << SP_REGNUM)));
+	  if (TARGET_ARM)
+	    {
+	      if (use_load_store_multiple)
+		print_multi_reg (f, "ldmfd\t%r!, ", SP_REGNUM, saved_regs_mask,
+				 rfe);
+	      else
+		print_pop_reg_by_ldr (f, saved_regs_mask, rfe);
+	    }
 	  else
-	    print_multi_reg (f, "pop\t", SP_REGNUM, saved_regs_mask, 0);
+	    {
+	      if (use_load_store_multiple)
+	        print_multi_reg (f, "pop\t", SP_REGNUM, saved_regs_mask, 0);
+	      else
+		print_pop_reg_by_ldr (f, saved_regs_mask, 0);
+	    }
 	}
 
       if (crtl->args.pretend_args_size)
@@ -14078,6 +14196,31 @@ emit_multi_reg_push (unsigned long mask)
 
   gcc_assert (num_regs && num_regs <= 16);
 
+  if (!use_load_store_multiple)
+    {
+      rtx insn = 0;
+
+      /* Emit a series of str instructions rather than a single stm.  */
+      /* TODO: Use ldrd where possible.  */
+      gcc_assert (! (mask & (1 << SP_REGNUM)));
+
+      for (i = LAST_ARM_REGNUM; i >= 0; --i)
+        {
+          if (mask & (1 << i))
+            {
+              rtx reg, where, mem;
+
+	      reg = gen_rtx_REG (SImode, i);
+	      where = gen_rtx_PRE_DEC (SImode, stack_pointer_rtx);
+	      mem = gen_rtx_MEM (SImode, where);
+	      insn = emit_move_insn (mem, reg);
+	      RTX_FRAME_RELATED_P (insn) = 1;
+            }
+        }
+
+      return insn;
+    }
+
   /* We don't record the PC in the dwarf frame information.  */
   num_dwarf_regs = num_regs;
   if (mask & (1 << PC_REGNUM))
Index: gcc/config/arm/lib1funcs.asm
===================================================================
--- gcc/config/arm/lib1funcs.asm	(revision 159030)
+++ gcc/config/arm/lib1funcs.asm	(working copy)
@@ -254,8 +254,8 @@ LSYM(Lend_fde):
 .macro shift1 op, arg0, arg1, arg2
 	\op	\arg0, \arg1, \arg2
 .endm
-#define do_push	push
-#define do_pop	pop
+#define do_push(...)	push {__VA_ARGS__}
+#define do_pop(...)	pop {__VA_ARGS__}
 #define COND(op1, op2, cond) op1 ## op2 ## cond
 /* Perform an arithmetic operation with a variable shift operand.  This
    requires two instructions and a scratch register on Thumb-2.  */
@@ -269,8 +269,42 @@ LSYM(Lend_fde):
 .macro shift1 op, arg0, arg1, arg2
 	mov	\arg0, \arg1, \op \arg2
 .endm
-#define do_push	stmfd sp!,
-#define do_pop	ldmfd sp!,
+#if !defined(__use_load_store_multiple__)
+#define do_push(...) \
+  _buildN1(do_push, _buildC1(__VA_ARGS__))( __VA_ARGS__)
+#define _buildN1(BASE, X)	_buildN2(BASE, X)
+#define _buildN2(BASE, X)	BASE##X
+#define _buildC1(...)		_buildC2(__VA_ARGS__,9,8,7,6,5,4,3,2,1)
+#define _buildC2(a1,a2,a3,a4,a5,a6,a7,a8,a9,c,...) c
+        
+#define do_push1(r1) str r1, [sp, #-4]!
+#define do_push2(r1, r2) str r2, [sp, #-4]! ; str r1, [sp, #-4]!
+#define do_push3(r1, r2, r3) str r3, [sp, #-4]! ; str r2, [sp, #-4]!; str r1, [sp, #-4]!
+#define do_push4(r1, r2, r3, r4) \
+        do_push3 (r2, r3, r4);\
+        do_push1 (r1)
+#define do_push5(r1, r2, r3, r4, r5) \
+        do_push4 (r2, r3, r4, r5);\
+        do_push1 (r1)
+        
+#define do_pop(...) \
+_buildN1(do_pop, _buildC1(__VA_ARGS__))( __VA_ARGS__)
+        
+#define do_pop1(r1) ldr r1, [sp], #4
+#define do_pop2(r1, r2) ldr r1, [sp], #4 ; ldr r2, [sp], #4
+#define do_pop3(r1, r2, r3) ldr r1, [sp], #4 ; ldr r2, [sp], #4; ldr r3, [sp], #4
+#define do_pop4(r1, r2, r3, r4) \
+        do_pop1 (r1);\
+        do_pop3 (r2, r3, r4)
+#define do_pop5(r1, r2, r3, r4, r5) \
+        do_pop1 (r1);\
+        do_pop4 (r2, r3, r4, r5)
+#else
+#define do_push(...)    stmfd sp!, { __VA_ARGS__}
+#define do_pop(...)     ldmfd sp!, {__VA_ARGS__}
+#endif
+
+        
 #define COND(op1, op2, cond) op1 ## cond ## op2
 .macro shiftop name, dest, src1, src2, shiftop, shiftreg, tmp
 	\name \dest, \src1, \src2, \shiftop \shiftreg
@@ -1260,7 +1294,7 @@ LSYM(Lover12):
 	ARM_FUNC_START div0
 #endif
 
-	do_push	{r1, lr}
+	do_push	(r1, lr)
 	mov	r0, #SIGFPE
 	bl	SYM(raise) __PLT__
 	RETLDM	r1
@@ -1277,7 +1311,7 @@ LSYM(Lover12):
 #if defined __ARM_EABI__ && defined __linux__
 @ EABI GNU/Linux call to cacheflush syscall.
 	ARM_FUNC_START clear_cache
-	do_push	{r7}
+	do_push	(r7)
 #if __ARM_ARCH__ >= 7 || defined(__ARM_ARCH_6T2__)
 	movw	r7, #2
 	movt	r7, #0xf
@@ -1287,7 +1321,7 @@ LSYM(Lover12):
 #endif
 	mov	r2, #0
 	swi	0
-	do_pop	{r7}
+	do_pop	(r7)
 	RET
 	FUNC_END clear_cache
 #else
@@ -1490,7 +1524,7 @@ FUNC_START clzdi2
 	push	{r4, lr}
 # else
 ARM_FUNC_START clzdi2
-	do_push	{r4, lr}
+	do_push	(r4, lr)
 # endif
 	cmp	xxh, #0
 	bne	1f
Index: gcc/config/arm/arm.h
===================================================================
--- gcc/config/arm/arm.h	(revision 159030)
+++ gcc/config/arm/arm.h	(working copy)
@@ -95,6 +95,8 @@ extern char arm_arch_name[];
 	  builtin_define ("__IWMMXT__");		\
 	if (TARGET_AAPCS_BASED)				\
 	  builtin_define ("__ARM_EABI__");		\
+	if (use_load_store_multiple)			\
+	  builtin_define ("__use_load_store_multiple__");	\
     } while (0)
 
 /* The various ARM cores.  */
@@ -443,6 +445,11 @@ extern int arm_arch_thumb2;
 /* Nonzero if chip supports integer division instruction.  */
 extern int arm_arch_hwdiv;
 
+/* Nonzero if we use load/store multiple instructions (on by default).  Zero to
+   attempt to improve interrupt latency for generated code by avoiding ldm/stm,
+   at the expense of code size and performance.  */
+extern int use_load_store_multiple;
+
 #ifndef TARGET_DEFAULT
 #define TARGET_DEFAULT  (MASK_APCS_FRAME)
 #endif
@@ -1813,11 +1820,12 @@ typedef struct
 #define EPILOGUE_USES(REGNO) ((REGNO) == LR_REGNUM)
 
 /* Determine if the epilogue should be output as RTL.
-   You should override this if you define FUNCTION_EXTRA_EPILOGUE.  */
-/* This is disabled for Thumb-2 because it will confuse the
-   conditional insn counter.  */
+   You should override this if you define FUNCTION_EXTRA_EPILOGUE.
+   This is disabled for Thumb-2 because it will confuse the
+   conditional insn counter.
+   Do not use a return insn if we're avoiding ldm/stm instructions.  */
 #define USE_RETURN_INSN(ISCOND)				\
-  (TARGET_ARM ? use_return_insn (ISCOND, NULL) : 0)
+  ((TARGET_ARM && use_load_store_multiple) ? use_return_insn (ISCOND, NULL) : 0)
 
 /* Definitions for register eliminations.
 
Index: gcc/config/arm/bpabi.S
===================================================================
--- gcc/config/arm/bpabi.S	(revision 159030)
+++ gcc/config/arm/bpabi.S	(working copy)
@@ -126,16 +126,17 @@ ARM_FUNC_START aeabi_ldivmod
 	test_div_by_zero signed
 
 	sub sp, sp, #8
-#if defined(__thumb2__)
+/* Low latency and Thumb-2 do_push implementations can't push sp directly.  */
+#if defined(__thumb2__) || !defined(__use_load_store_multiple__)
 	mov ip, sp
-	push {ip, lr}
+	do_push (ip, lr)
 #else
-	do_push {sp, lr}
+	stmfd sp!, {sp, lr}
 #endif
 	bl SYM(__gnu_ldivmod_helper) __PLT__
 	ldr lr, [sp, #4]
 	add sp, sp, #8
-	do_pop {r2, r3}
+	do_pop (r2, r3)
 	RET
 	
 #endif /* L_aeabi_ldivmod */
@@ -146,16 +147,17 @@ ARM_FUNC_START aeabi_uldivmod
 	test_div_by_zero unsigned
 
 	sub sp, sp, #8
-#if defined(__thumb2__)
+/* Low latency and Thumb-2 do_push implementations can't push sp directly.  */
+#if defined(__thumb2__) || !defined(__use_load_store_multiple__)
 	mov ip, sp
-	push {ip, lr}
+	do_push (ip, lr)
 #else
-	do_push {sp, lr}
+	stmfd sp!, {sp, lr}
 #endif
 	bl SYM(__gnu_uldivmod_helper) __PLT__
 	ldr lr, [sp, #4]
 	add sp, sp, #8
-	do_pop {r2, r3}
+	do_pop (r2, r3)
 	RET
 	
 #endif /* L_aeabi_divmod */
Index: gcc/config/arm/ieee754-sf.S
===================================================================
--- gcc/config/arm/ieee754-sf.S	(revision 159030)
+++ gcc/config/arm/ieee754-sf.S	(working copy)
@@ -481,7 +481,7 @@ LSYM(Lml_x):
 	and	r3, ip, #0x80000000
 
 	@ Well, no way to make it shorter without the umull instruction.
-	do_push	{r3, r4, r5}
+	do_push	(r3, r4, r5)
 	mov	r4, r0, lsr #16
 	mov	r5, r1, lsr #16
 	bic	r0, r0, r4, lsl #16
@@ -492,7 +492,7 @@ LSYM(Lml_x):
 	mla	r0, r4, r1, r0
 	adds	r3, r3, r0, lsl #16
 	adc	r1, ip, r0, lsr #16
-	do_pop	{r0, r4, r5}
+	do_pop	(r0, r4, r5)
 
 #else
 
@@ -882,7 +882,7 @@ ARM_FUNC_ALIAS aeabi_cfcmple aeabi_cfcmp
 
 	@ The status-returning routines are required to preserve all
 	@ registers except ip, lr, and cpsr.
-6:	do_push	{r0, r1, r2, r3, lr}
+6:	do_push	(r0, r1, r2, r3, lr)
 	ARM_CALL cmpsf2
 	@ Set the Z flag correctly, and the C flag unconditionally.
 	cmp	r0, #0
Index: gcc/config/arm/arm.opt
===================================================================
--- gcc/config/arm/arm.opt	(revision 159030)
+++ gcc/config/arm/arm.opt	(working copy)
@@ -161,6 +161,10 @@ mvectorize-with-neon-quad
 Target Report Mask(NEON_VECTORIZE_QUAD)
 Use Neon quad-word (rather than double-word) registers for vectorization
 
+mmultiple
+Target Report Var(use_load_store_multiple) Init(1)
+Generate load and store multiple instructions
+
 mword-relocations
 Target Report Var(target_word_relocations) Init(TARGET_DEFAULT_WORD_RELOCATIONS)
 Only generate absolute relocations on word sized values.
Index: gcc/config/arm/predicates.md
===================================================================
--- gcc/config/arm/predicates.md	(revision 159030)
+++ gcc/config/arm/predicates.md	(working copy)
@@ -310,6 +310,9 @@
   HOST_WIDE_INT i = 1, base = 0;
   rtx elt;
 
+  if (!use_load_store_multiple)
+    return false;
+
   if (count <= 1
       || GET_CODE (XVECEXP (op, 0, 0)) != SET)
     return false;
@@ -367,6 +370,9 @@
   HOST_WIDE_INT i = 1, base = 0;
   rtx elt;
 
+  if (!use_load_store_multiple)
+    return false;
+
   if (count <= 1
       || GET_CODE (XVECEXP (op, 0, 0)) != SET)
     return false;
Index: gcc/config/arm/arm.md
===================================================================
--- gcc/config/arm/arm.md	(revision 159030)
+++ gcc/config/arm/arm.md	(working copy)
@@ -6546,7 +6546,7 @@
    (match_operand:BLK 1 "general_operand" "")
    (match_operand:SI 2 "const_int_operand" "")
    (match_operand:SI 3 "const_int_operand" "")]
-  "TARGET_EITHER"
+  "TARGET_EITHER && use_load_store_multiple"
   "
   if (TARGET_32BIT)
     {
@@ -10348,7 +10348,8 @@
 	 [(match_operand:SI 2 "memory_operand" "m")
 	  (match_operand:SI 3 "memory_operand" "m")]))
    (clobber (match_scratch:SI 4 "=r"))]
-  "TARGET_ARM && adjacent_mem_locations (operands[2], operands[3])"
+  "TARGET_ARM && adjacent_mem_locations (operands[2], operands[3])
+   && use_load_store_multiple"
   "*
   {
     rtx ldm[3];
