This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.



[PATCH] Don't use FLDMX and FSTMX on ARMv6+


The FLDMX and FSTMX load/store-multiple insns are deprecated in favour of
FLDMD and FSTMD respectively on ARMv6 and above.  This patch causes
gcc to use the latter instructions instead of FLDMX and FSTMX when
emitting code for such architectures.
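
As a rough illustration of why the choice of instruction affects the
frame layout: FSTMX reserves one extra word (the format word) after the
saved registers, whereas FSTMD does not.  The sketch below is not code
from the patch; `vfp_save_bytes` and its `is_arch6` parameter are
hypothetical names standing in for the `arm_arch6` test, and the ARM10
VFPr1 workaround is left out.

```c
#include <assert.h>

/* Hypothetical helper, not part of the patch: bytes of stack used when
   COUNT consecutive VFP double registers are saved by one
   store-multiple instruction.  */
int
vfp_save_bytes (int count, int is_arch6)
{
  /* FSTMX (pre-v6) needs room for a format word; FSTMD (v6+) does not.  */
  int pad = is_arch6 ? 0 : 4;

  assert (count > 0);

  /* Each double register occupies 8 bytes.  */
  return count * 8 + pad;
}
```

Under these assumptions, saving four double registers takes 36 bytes
with FSTMX and 32 bytes with FSTMD.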

FLDMD and FSTMD correspond to a different set of unwind opcodes from
FLDMX and FSTMX.  A binutils patch [1], already approved and committed,
provides a new ".vsave" directive instructing the assembler to emit the
unwind opcodes that correspond to an FSTMD instruction.  As a
consequence, an ARM compiler with the attached patch also requires that
binutils patch in the toolchain in order to emit code that involves
unwind information for VFP registers.  The unwind opcodes for the `D'
instructions also provide for VFPv3 registers, which this patch
supports as well.
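
To illustrate the VFPv3 handling: a pop request that names registers on
both sides of d15/d16 has to be split, because d0-d15 and d16-d31 are
saved and restored by separate routines.  A minimal sketch of that split
follows; the `vfp_split` struct and `split_vfp_range` function are
hypothetical names, not identifiers from the patch.

```c
#include <assert.h>

/* Hypothetical result type: how a register range divides at d15/d16.  */
struct vfp_split
{
  unsigned int low_count;   /* Registers to pop from d0-d15.  */
  unsigned int high_start;  /* First register to pop from d16-d31.  */
  unsigned int high_count;  /* Registers to pop from d16-d31.  */
};

/* Split the range [START, START + COUNT) at the d15/d16 boundary.  */
struct vfp_split
split_vfp_range (unsigned int start, unsigned int count)
{
  struct vfp_split s;

  assert (count > 0 && start + count <= 32);

  if (start >= 16)
    s.high_count = count;               /* Entirely VFPv3 registers.  */
  else if (start + count > 16)
    s.high_count = start + count - 16;  /* Range straddles the boundary.  */
  else
    s.high_count = 0;                   /* Entirely d0-d15.  */

  s.low_count = count - s.high_count;
  s.high_start = start < 16 ? 16 : start;
  return s;
}
```

For example, a request for four registers starting at d14 would be
handled as two registers from the low bank (d14, d15) and two from the
high bank (d16, d17).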

Given this, I wonder if a configure check should be added to test for
the presence of the updated assembler -- and if that fails, then gcc
should emit FLDMX and FSTMX as before.

This patch also fixes a number of off-by-one errors in pr-support.c,
as noted below.  I've combined this with the other additions since I
have made other modifications around the same places.
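
The off-by-one can be pictured as follows.  The operand byte of a VFP
pop opcode carries the first register number in its high nibble and the
register count minus one in its low nibble, while the popping routine
expects a real count in the low half of its argument.  The sketch below
uses hypothetical helper names (`vfp_pop_discriminator`,
`discriminator_start`, `discriminator_count`) and is only meant to
mirror the arithmetic, not to reproduce pr-support.c.

```c
#include <assert.h>

/* Hypothetical helper: pack the operand byte "sssscccc" into a value
   with the start register in the high 16 bits and the register count
   in the low 16 bits.  */
unsigned int
vfp_pop_discriminator (unsigned char operand)
{
  unsigned int start = (operand >> 4) & 0xf;
  /* The low nibble holds (count - 1), hence the "+ 1".  */
  unsigned int count = (operand & 0xf) + 1;

  return (start << 16) | count;
}

/* Unpack the two halves again, as the popping routine does.  */
unsigned int
discriminator_start (unsigned int discriminator)
{
  return discriminator >> 16;
}

unsigned int
discriminator_count (unsigned int discriminator)
{
  return discriminator & 0xffff;
}
```

For example, an operand byte of 0x83 describes four registers starting
at d8; without the "+ 1" only three would be popped.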

Regression tested with an arm-none-linux-gnueabi target, with the
exception of course of the VFPv3 support in this patch.  The patch has
also been manually tested to check that stack unwinding is working
correctly.

OK for mainline, once open?

Mark

[1] See revision 1.2943 of src/gas/ChangeLog.

--

gcc/ChangeLog:

2006-06-21 Mark Shinwell <shinwell@codesourcery.com>

	* config/arm/arm.c (arm_output_fldmx): Output FLDMD instead of FLDMX
	if compiling for ARMv6 or later.
	(vfp_output_fstmx): Output FSTMD instead of FSTMX if compiling for
	ARMv6 or later.
	(vfp_emit_fstmx): Don't leave space in the frame layout for the
	FSTMX format word if compiling for ARMv6 or later.
	(arm_get_vfp_saved_size): Don't add in space for the FSTMX format word
	if compiling for ARMv6 or later.
	(arm_output_epilogue): Adjust comment to reflect use of FSTMD.
	(arm_unwind_emit_stm): Don't compensate for the FSTMX format
	word if compiling for ARMv6 or later.  Also emit ".vsave" assembler
	directive in such cases rather than ".save".
	* config/arm/libunwind.S (gnu_Unwind_Restore_VFP,
	gnu_Unwind_Save_VFP): Adjust comments.
	(gnu_Unwind_Restore_VFP_D, gnu_Unwind_Save_VFP_D): New functions
	for saving and restoring using FSTMD and FLDMD, rather than
	FSTMX and FLDMX.
	(gnu_Unwind_Restore_VFP_D_16_to_31, gnu_Unwind_Save_VFP_D_16_to_31):
	New functions for saving and restoring the VFPv3 registers 16 .. 31.
	* config/arm/pr-support.c (gnu_unwind_execute): Fix bug by inserting
	" + 1" in necessary places to pass the correct "number of registers"
	value to the popping routines.  Also add conditional compilation case
	to correctly handle unwind opcode 0xc8 when using VFP.
	* config/arm/unwind-arm.c (struct vfpv3_regs): New.
	(DEMAND_SAVE_VFP_D, DEMAND_SAVE_VFP_V3): New flags.
	(__gnu_Unwind_Save_VFP_D, __gnu_Unwind_Restore_VFP_D,
	__gnu_Unwind_Save_VFP_D_16_to_31, __gnu_Unwind_Restore_VFP_D_16_to_31):
	Declare.
	(restore_non_core_regs): Restore registers using FLDMD rather than
	FLDMX if required.  Also handle restoration of VFPv3 registers.
	(_Unwind_VRS_Pop): Handle saving and restoring of registers using
	FSTMD and FLDMD if required; also handle VFPv3 registers 16 .. 31,
	including cases where the caller specifies a range of registers
	that overlaps the d15/d16 boundary.

Index: gcc/config/arm/arm.c
===================================================================
--- gcc/config/arm/arm.c	(revision 114850)
+++ gcc/config/arm/arm.c	(working copy)
@@ -8406,7 +8406,8 @@ print_multi_reg (FILE *stream, const cha
 }
 
 
-/* Output a FLDMX instruction to STREAM.
+/* Output a FLDMX instruction to STREAM, or an FLDMD instruction if
+   we are compiling for ARMv6 or later.
    BASE if the register containing the address.
    REG and COUNT specify the register range.
    Extra registers may be added to avoid hardware bugs.  */
@@ -8425,7 +8426,7 @@ arm_output_fldmx (FILE * stream, unsigne
     }
 
   fputc ('\t', stream);
-  asm_fprintf (stream, "fldmfdx\t%r!, {", base);
+  asm_fprintf (stream, "fldmfd%c\t%r!, {", arm_arch6 ? 'd' : 'x', base);
 
   for (i = reg; i < reg + count; i++)
     {
@@ -8448,7 +8449,7 @@ vfp_output_fstmx (rtx * operands)
   int base;
   int i;
 
-  strcpy (pattern, "fstmfdx\t%m0!, {%P1");
+  sprintf (pattern, "fstmfd%c\t%%m0!, {%%P1", arm_arch6 ? 'd' : 'x');
   p = strlen (pattern);
 
   gcc_assert (GET_CODE (operands[1]) == REG);
@@ -8475,6 +8476,7 @@ vfp_emit_fstmx (int base_reg, int count)
   rtx dwarf;
   rtx tmp, reg;
   int i;
+  int pad = arm_arch6 ? 0 : 4;
 
   /* Workaround ARM10 VFPr1 bug.  Data corruption can occur when exactly two
      register pairs are stored by a store multiple insn.  We avoid this
@@ -8487,7 +8489,9 @@ vfp_emit_fstmx (int base_reg, int count)
     }
 
   /* ??? The frame layout is implementation defined.  We describe
-     standard format 1 (equivalent to a FSTMD insn and unused pad word).
+     standard format 1 (equivalent to a FSTMD insn and unused pad word)
+     for architectures pre-v6, and FSTMD insn format without the pad
+     word for v6+.
      We really need some way of representing the whole block so that the
      unwinder can figure it out at runtime.  */
   par = gen_rtx_PARALLEL (VOIDmode, rtvec_alloc (count));
@@ -8506,7 +8510,7 @@ vfp_emit_fstmx (int base_reg, int count)
 				   UNSPEC_PUSH_MULT));
 
   tmp = gen_rtx_SET (VOIDmode, stack_pointer_rtx,
-		     plus_constant (stack_pointer_rtx, -(count * 8 + 4)));
+		     plus_constant (stack_pointer_rtx, -(count * 8 + pad)));
   RTX_FRAME_RELATED_P (tmp) = 1;
   XVECEXP (dwarf, 0, 0) = tmp;
 
@@ -8536,7 +8540,7 @@ vfp_emit_fstmx (int base_reg, int count)
 				       REG_NOTES (par));
   RTX_FRAME_RELATED_P (par) = 1;
 
-  return count * 8 + 4;
+  return count * 8 + pad;
 }
 
 
@@ -9414,6 +9418,10 @@ arm_get_vfp_saved_size (void)
   unsigned int regno;
   int count;
   int saved;
+  /* Check if we need to allow for the format word that
+     is used by the FLDMX and FSTMX instructions.  (On V6 and
+     later we use FLDMD and FSTMD so no such space is required.)  */
+  int space_for_format_word = arm_arch6 ? 0 : 4;
 
   saved = 0;
   /* Space for saved VFP registers.  */
@@ -9432,7 +9440,7 @@ arm_get_vfp_saved_size (void)
 		  /* Workaround ARM10 VFPr1 bug.  */
 		  if (count == 2 && !arm_arch6)
 		    count++;
-		  saved += count * 8 + 4;
+		  saved += count * 8 + space_for_format_word;
 		}
 	      count = 0;
 	    }
@@ -9443,7 +9451,7 @@ arm_get_vfp_saved_size (void)
 	{
 	  if (count == 2 && !arm_arch6)
 	    count++;
-	  saved += count * 8 + 4;
+	  saved += count * 8 + space_for_format_word;
 	}
     }
   return saved;
@@ -9869,8 +9877,8 @@ arm_output_epilogue (rtx sibling)
 	{
 	  int saved_size;
 
-	  /* The fldmx insn does not have base+offset addressing modes,
-	     so we use IP to hold the address.  */
+	  /* The fldmx and fldmd insns do not have base+offset addressing
+             modes, so we use IP to hold the address.  */
 	  saved_size = arm_get_vfp_saved_size ();
 
 	  if (saved_size > 0)
@@ -15274,12 +15282,16 @@ arm_unwind_emit_stm (FILE * asm_out_file
 	  offset -= 4;
 	}
       reg_size = 4;
+      fprintf (asm_out_file, "\t.save {");
     }
   else if (IS_VFP_REGNUM (reg))
     {
-      /* FPA register saves use an additional word.  */
-      offset -= 4;
+      /* VFP register saves using FSTMX (i.e. what we use for pre-v6
+         architectures) require an extra word.  */
+      if (!arm_arch6)
+        offset -= 4;
       reg_size = 8;
+      fprintf (asm_out_file, arm_arch6 ? "\t.vsave {" : "\t.save {");
     }
   else if (reg >= FIRST_FPA_REGNUM && reg <= LAST_FPA_REGNUM)
     {
@@ -15296,8 +15308,6 @@ arm_unwind_emit_stm (FILE * asm_out_file
   if (offset != nregs * reg_size)
     abort ();
 
-  fprintf (asm_out_file, "\t.save {");
-
   offset = 0;
   lastreg = 0;
   /* The remaining insns will describe the stores.  */
Index: gcc/config/arm/libunwind.S
===================================================================
--- gcc/config/arm/libunwind.S	(revision 114850)
+++ gcc/config/arm/libunwind.S	(working copy)
@@ -62,20 +62,46 @@ ARM_FUNC_START restore_core_regs
 	FUNC_END restore_core_regs
 	UNPREFIX restore_core_regs
 
-/* Load VFP registers d0-d15 from the address in r0.  */
+/* Load VFP registers d0-d15 from the address in r0.
+   Use this to load from FSTMX format.  */
 ARM_FUNC_START gnu_Unwind_Restore_VFP
 	/* Use the generic coprocessor form so that gas doesn't complain
 	   on soft-float targets.  */
 	ldc   p11,cr0,[r0],{0x21} /* fldmiax r0, {d0-d15} */
 	RET
 
-/* Store VFR regsters d0-d15 to the address in r0.  */
+/* Store VFP registers d0-d15 to the address in r0.
+   Use this to store in FSTMX format.  */
 ARM_FUNC_START gnu_Unwind_Save_VFP
 	/* Use the generic coprocessor form so that gas doesn't complain
 	   on soft-float targets.  */
 	stc   p11,cr0,[r0],{0x21} /* fstmiax r0, {d0-d15} */
 	RET
 
+/* Load VFP registers d0-d15 from the address in r0.
+   Use this to load from FSTMD format.  */
+ARM_FUNC_START gnu_Unwind_Restore_VFP_D
+	ldc   p11,cr0,[r0],{0x20} /* fldmiad r0, {d0-d15} */
+	RET
+
+/* Store VFP registers d0-d15 to the address in r0.
+   Use this to store in FLDMD format.  */
+ARM_FUNC_START gnu_Unwind_Save_VFP_D
+	stc   p11,cr0,[r0],{0x20} /* fstmiad r0, {d0-d15} */
+	RET
+
+/* Load VFP registers d16-d31 from the address in r0.
+   Use this to load from FSTMD (=VSTM) format.  Needs VFPv3.  */
+ARM_FUNC_START gnu_Unwind_Restore_VFP_D_16_to_31
+	ldcl  p11,cr0,[r0],{0x20} /* vldm r0, {d16-d31} */
+	RET
+
+/* Store VFP registers d16-d31 to the address in r0.
+   Use this to store in FLDMD (=VLDM) format.  Needs VFPv3.  */
+ARM_FUNC_START gnu_Unwind_Save_VFP_D_16_to_31
+	stcl  p11,cr0,[r0],{0x20} /* vstm r0, {d16-d31} */
+	RET
+
 /* Wrappers to save core registers, then call the real routine.   */
 
 .macro  UNWIND_WRAPPER name nargs
Index: gcc/config/arm/pr-support.c
===================================================================
--- gcc/config/arm/pr-support.c	(revision 114850)
+++ gcc/config/arm/pr-support.c	(working copy)
@@ -224,7 +224,7 @@ __gnu_unwind_execute (_Unwind_Context * 
 	    {
 	      /* Pop VFP registers with fldmx.  */
 	      op = next_unwind_byte (uws);
-	      op = ((op & 0xf0) << 12) | (op & 0xf);
+	      op = ((op & 0xf0) << 12) | ((op & 0xf) + 1);
 	      if (_Unwind_VRS_Pop (context, _UVRSC_VFP, op, _UVRSD_VFPX)
 		  != _UVRSR_OK)
 		return _URC_FAILURE;
@@ -253,7 +253,7 @@ __gnu_unwind_execute (_Unwind_Context * 
 	    {
 	      /* Pop iWMMXt D registers.  */
 	      op = next_unwind_byte (uws);
-	      op = ((op & 0xf0) << 12) | (op & 0xf);
+	      op = ((op & 0xf0) << 12) | ((op & 0xf) + 1);
 	      if (_Unwind_VRS_Pop (context, _UVRSC_WMMXD, op, _UVRSD_UINT64)
 		  != _UVRSR_OK)
 		return _URC_FAILURE;
@@ -280,21 +280,34 @@ __gnu_unwind_execute (_Unwind_Context * 
 		return _URC_FAILURE;
 	      continue;
 	    }
+
+	  /* Opcode 0xc8 used to be "Spare (was Pop FPA)".
+	     Nowadays it's used for popping VFPv3 registers.  */
 	  if (op == 0xc8)
 	    {
+#ifndef __VFP_FP__
 	      /* Pop FPA registers.  */
 	      op = next_unwind_byte (uws);
-	      op = ((op & 0xf0) << 12) | (op & 0xf);
+	      op = ((op & 0xf0) << 12) | ((op & 0xf) + 1);
 	      if (_Unwind_VRS_Pop (context, _UVRSC_FPA, op, _UVRSD_FPAX)
 		  != _UVRSR_OK)
 		return _URC_FAILURE;
 	      continue;
-	    }
+#else
+              /* Pop VFPv3 registers D[16+ssss]-D[16+ssss+cccc] with vldm.  */
+              op = next_unwind_byte (uws);
+              op = ((((op & 0xf0) >> 4) + 16) << 16) | ((op & 0xf) + 1);
+              if (_Unwind_VRS_Pop (context, _UVRSC_VFP, op, _UVRSD_DOUBLE)
+                  != _UVRSR_OK)
+                return _URC_FAILURE;
+              continue;
+#endif
+            }
 	  if (op == 0xc9)
 	    {
 	      /* Pop VFP registers with fldmd.  */
 	      op = next_unwind_byte (uws);
-	      op = ((op & 0xf0) << 12) | (op & 0xf);
+	      op = ((op & 0xf0) << 12) | ((op & 0xf) + 1);
 	      if (_Unwind_VRS_Pop (context, _UVRSC_VFP, op, _UVRSD_DOUBLE)
 		  != _UVRSR_OK)
 		return _URC_FAILURE;
Index: gcc/config/arm/unwind-arm.c
===================================================================
--- gcc/config/arm/unwind-arm.c	(revision 114850)
+++ gcc/config/arm/unwind-arm.c	(working copy)
@@ -73,6 +73,13 @@ struct vfp_regs
   _uw pad;
 };
 
+struct vfpv3_regs
+{
+  /* Always populated via VSTM, so no need for the "pad" field from
+     vfp_regs (which is used to store the format word for FSTMX).  */
+  _uw64 d[16];
+};
+
 struct fpa_reg
 {
   _uw w[3];
@@ -113,10 +120,14 @@ typedef struct
   struct core_regs core;
   _uw prev_sp; /* Only valid during forced unwinding.  */
   struct vfp_regs vfp;
+  struct vfpv3_regs vfp_regs_16_to_31;
   struct fpa_regs fpa;
 } phase1_vrs;
 
-#define DEMAND_SAVE_VFP 1
+#define DEMAND_SAVE_VFP 1	/* VFP state has been saved if not set */
+#define DEMAND_SAVE_VFP_D 2	/* VFP state is for FLDMD/FSTMD if set */
+#define DEMAND_SAVE_VFP_V3 4    /* VFPv3 state for regs 16 .. 31 has
+                                   been saved if not set */
 
 /* This must match the structure created by the assembly wrappers.  */
 typedef struct
@@ -142,15 +153,33 @@ void __attribute__((noreturn)) restore_c
 
 /* Coprocessor register state manipulation functions.  */
 
+/* Routines for FLDMX/FSTMX format...  */
 void __gnu_Unwind_Save_VFP (struct vfp_regs * p);
 void __gnu_Unwind_Restore_VFP (struct vfp_regs * p);
 
+/* ...and those for FLDMD/FSTMD format...  */
+void __gnu_Unwind_Save_VFP_D (struct vfp_regs * p);
+void __gnu_Unwind_Restore_VFP_D (struct vfp_regs * p);
+
+/* ...and those for VLDM/VSTM format, saving/restoring only registers
+   16 through 31.  */
+void __gnu_Unwind_Save_VFP_D_16_to_31 (struct vfpv3_regs * p);
+void __gnu_Unwind_Restore_VFP_D_16_to_31 (struct vfpv3_regs * p);
+
 /* Restore coprocessor state after phase1 unwinding.  */
 static void
 restore_non_core_regs (phase1_vrs * vrs)
 {
   if ((vrs->demand_save_flags & DEMAND_SAVE_VFP) == 0)
-    __gnu_Unwind_Restore_VFP (&vrs->vfp);
+    {
+      if (vrs->demand_save_flags & DEMAND_SAVE_VFP_D)
+        __gnu_Unwind_Restore_VFP_D (&vrs->vfp);
+      else
+        __gnu_Unwind_Restore_VFP (&vrs->vfp);
+    }
+
+  if ((vrs->demand_save_flags & DEMAND_SAVE_VFP_V3) == 0)
+    __gnu_Unwind_Restore_VFP_D_16_to_31 (&vrs->vfp_regs_16_to_31);
 }
 
 /* A better way to do this would probably be to compare the absolute address
@@ -273,35 +302,101 @@ _Unwind_VRS_Result _Unwind_VRS_Pop (_Unw
 	_uw start = discriminator >> 16;
 	_uw count = discriminator & 0xffff;
 	struct vfp_regs tmp;
+	struct vfpv3_regs tmp_16_to_31;
+	int tmp_count;
 	_uw *sp;
 	_uw *dest;
+        int num_vfpv3_regs = 0;
 
+        /* We use an approximation here by bounding _UVRSD_DOUBLE
+           register numbers at 32 always, since we can't detect if
+           VFPv3 isn't present (in such a case the upper limit is 16).  */
 	if ((representation != _UVRSD_VFPX && representation != _UVRSD_DOUBLE)
-	    || start + count > 16)
+            || start + count > (representation == _UVRSD_VFPX ? 16 : 32)
+            || (representation == _UVRSD_VFPX && start >= 16))
 	  return _UVRSR_FAILED;
 
-	if (vrs->demand_save_flags & DEMAND_SAVE_VFP)
+        /* Check if we're being asked to pop VFPv3-only registers
+           (numbers 16 through 31).  */
+	if (start >= 16)
+          num_vfpv3_regs = count;
+        else if (start + count > 16)
+          num_vfpv3_regs = start + count - 16;
+
+        if (num_vfpv3_regs && representation != _UVRSD_DOUBLE)
+          return _UVRSR_FAILED;
+
+	/* Demand-save coprocessor registers for stage1.  */
+	if (start < 16 && (vrs->demand_save_flags & DEMAND_SAVE_VFP))
 	  {
-	    /* Demand-save resisters for stage1.  */
 	    vrs->demand_save_flags &= ~DEMAND_SAVE_VFP;
-	    __gnu_Unwind_Save_VFP (&vrs->vfp);
+
+            if (representation == _UVRSD_DOUBLE)
+              {
+                /* Save in FLDMD/FSTMD format.  */
+	        vrs->demand_save_flags |= DEMAND_SAVE_VFP_D;
+	        __gnu_Unwind_Save_VFP_D (&vrs->vfp);
+              }
+            else
+              {
+                /* Save in FLDMX/FSTMX format.  */
+	        vrs->demand_save_flags &= ~DEMAND_SAVE_VFP_D;
+	        __gnu_Unwind_Save_VFP (&vrs->vfp);
+              }
+	  }
+
+        if (num_vfpv3_regs > 0
+            && (vrs->demand_save_flags & DEMAND_SAVE_VFP_V3))
+	  {
+	    vrs->demand_save_flags &= ~DEMAND_SAVE_VFP_V3;
+            __gnu_Unwind_Save_VFP_D_16_to_31 (&vrs->vfp_regs_16_to_31);
 	  }
 
 	/* Restore the registers from the stack.  Do this by saving the
 	   current VFP registers to a memory area, moving the in-memory
 	   values into that area, and restoring from the whole area.
 	   For _UVRSD_VFPX we assume FSTMX standard format 1.  */
-	__gnu_Unwind_Save_VFP (&tmp);
+        if (representation == _UVRSD_VFPX)
+  	  __gnu_Unwind_Save_VFP (&tmp);
+        else
+          {
+	    /* Save registers 0 .. 15 if required.  */
+            if (start < 16)
+              __gnu_Unwind_Save_VFP_D (&tmp);
+
+	    /* Save VFPv3 registers 16 .. 31 if required.  */
+            if (num_vfpv3_regs)
+  	      __gnu_Unwind_Save_VFP_D_16_to_31 (&tmp_16_to_31);
+          }
+
+	/* Work out how many registers below register 16 need popping.  */
+	tmp_count = num_vfpv3_regs > 0 ? 16 - start : count;
 
-	/* The stack address is only guaranteed to be word aligned, so
+	/* Copy registers below 16, if needed.
+	   The stack address is only guaranteed to be word aligned, so
 	   we can't use doubleword copies.  */
 	sp = (_uw *) vrs->core.r[R_SP];
-	dest = (_uw *) &tmp.d[start];
-	count *= 2;
-	while (count--)
-	  *(dest++) = *(sp++);
+        if (tmp_count > 0)
+          {
+	    tmp_count *= 2;
+	    dest = (_uw *) &tmp.d[start];
+	    while (tmp_count--)
+	      *(dest++) = *(sp++);
+          }
+
+	/* Copy VFPv3 registers numbered >= 16, if needed.  */
+        if (num_vfpv3_regs > 0)
+          {
+            /* num_vfpv3_regs is needed below, so copy it.  */
+            int tmp_count_2 = num_vfpv3_regs * 2;
+            int vfpv3_start = start < 16 ? 16 : start;
+
+	    dest = (_uw *) &tmp_16_to_31.d[vfpv3_start - 16];
+	    while (tmp_count_2--)
+	      *(dest++) = *(sp++);
+          }
 
-	/* Skip the pad word */
+	/* Skip the format word space if using FLDMX/FSTMX format.  */
 	if (representation == _UVRSD_VFPX)
 	  sp++;
 
@@ -309,7 +404,18 @@ _Unwind_VRS_Result _Unwind_VRS_Pop (_Unw
 	vrs->core.r[R_SP] = (_uw) sp;
 
 	/* Reload the registers.  */
-	__gnu_Unwind_Restore_VFP (&tmp);
+        if (representation == _UVRSD_VFPX)
+  	  __gnu_Unwind_Restore_VFP (&tmp);
+        else
+          {
+	    /* Restore registers 0 .. 15 if required.  */
+            if (start < 16)
+              __gnu_Unwind_Restore_VFP_D (&tmp);
+
+	    /* Restore VFPv3 registers 16 .. 31 if required.  */
+            if (num_vfpv3_regs > 0)
+  	      __gnu_Unwind_Restore_VFP_D_16_to_31 (&tmp_16_to_31);
+          }
       }
       return _UVRSR_OK;
 
