This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

powerpc-linux altivec vector function arg passing


The testcase in PR 15106 manages to show major problems with
powerpc-linux vector function calls.  The two that stand out are:

a) ABI_V4 va_arg handling code has many disagreements with the
   function calling code over where to find vector args.
b) -maltivec without -mabi=altivec may result in stvx instructions
   attempting to store to mis-aligned locations.  Since AltiVec insns
   simply ignore the low address bits if mis-aligned, we get stack
   corruption.  Typically, it is the LR and back chain link that
   are trashed.  Joys.

(b) can be seen by examining assembly output of a testcase like

#define vector __attribute__ ((vector_size (16)))
vector int v1;
extern void bar (int, vector int, vector int, vector int,
		 vector int, vector int);
void f (void)
{
  bar (2, v1, v1, v1, v1, v1);
}

void bar (int a, vector int b, vector int c, vector int d,
	  vector int e, vector int f)
{
  v1 = b + c + d + e + f;
}

I think (b) will affect other rs6000 targets too, but I haven't
investigated in detail.  powerpc64-linux generates mis-aligned lvx
accesses for bar when compiling with -maltivec -mabi=no-altivec, so it's
not just 32 bit targets that might be affected.


The patch below fixes all these problems, but I'm not presenting for
review and commit, but rather for discussion because it involves an ABI
change.  I also think there is possibly a better 32 bit ABI change if we
do want to change the ABI.

The ABI change I've made here is to align all AltiVec style vector
function parameters to 16 byte boundaries.  This is already done when
using -mabi=altivec, so the change only affects -mabi=no-altivec.  The
advantage of this change is that it avoids bouncing vector params
through aligned buffers in -maltivec mode;  All vectors passed on the
stack will be aligned, and the stack home for vectors passed in gprs
also will be aligned.  It's also a simple change.  The disadvantage of
the change, particularly on 32 bit, is that aligning can reduced the
number of gprs available to pass parameters.  Choosing to align does
have some precedent, as that is how long long and SPE vectors are passed
on powerpc-linux.

The alternative 32 bit ABI change, again just for -mabi=no-altivec,
would be to pass vectors by reference.  I'm inclined to think that is
a good idea, as 16 byte vectors use 4 gprs when passing by value.
powerpc-linux passes aggregates this way, so there's some precedent here
too.

Without an ABI change, we're left with the need to shuffle vectors
through aligned buffers.  I really, really don't like the idea of extra
memory transfers.  Or even worse, extra memory footprint.  Modern
processors can do a *lot* of processing in one cache miss time.

Thoughts?

Index: gcc/config/rs6000/rs6000.c
===================================================================
RCS file: /cvs/gcc/gcc/gcc/config/rs6000/rs6000.c,v
retrieving revision 1.630
diff -u -p -r1.630 rs6000.c
--- gcc/config/rs6000/rs6000.c	24 Apr 2004 06:37:18 -0000	1.630
+++ gcc/config/rs6000/rs6000.c	28 Apr 2004 07:08:31 -0000
@@ -4038,9 +4038,9 @@ function_arg_boundary (enum machine_mode
 {
   if (DEFAULT_ABI == ABI_V4 && (mode == DImode || mode == DFmode))
     return 64;
-   else if (SPE_VECTOR_MODE (mode))
-     return 64;
-  else if (TARGET_ALTIVEC_ABI && ALTIVEC_VECTOR_MODE (mode))
+  else if (SPE_VECTOR_MODE (mode))
+    return 64;
+  else if (ALTIVEC_VECTOR_MODE (mode))
     return 128;
   else
     return PARM_BOUNDARY;
@@ -4088,7 +4088,7 @@ function_arg_advance (CUMULATIVE_ARGS *c
 	 even if it is going to be passed in a vector register.  
 	 Darwin does the same for variable-argument functions.  */
       if ((DEFAULT_ABI == ABI_AIX && TARGET_64BIT)
-		   || (cum->stdarg && DEFAULT_ABI != ABI_V4))
+	  || (cum->stdarg && DEFAULT_ABI != ABI_V4))
         {
 	  int align;
 	  
@@ -4100,7 +4100,7 @@ function_arg_advance (CUMULATIVE_ARGS *c
 	     aligned.  Space for GPRs is reserved even if the argument
 	     will be passed in memory.  */
 	  if (TARGET_32BIT)
-	    align = ((6 - (cum->words & 3)) & 3);
+	    align = (2 - cum->words) & 3;
 	  else
 	    align = cum->words & 1;
 	  cum->words += align + rs6000_arg_size (mode, type);
@@ -4145,17 +4145,20 @@ function_arg_advance (CUMULATIVE_ARGS *c
 	  else 
 	    n_words = rs6000_arg_size (mode, type);
 
-	  /* Long long and SPE vectors are put in odd registers.  */
-	  if (n_words == 2 && (gregno & 1) == 0)
-	    gregno += 1;
+	  /* Long long and SPE vectors are put in (r3,r4), (r5,r6),
+	     (r7,r8) or (r9,r10).  An AltiVec vector must go in
+	     (r5,r6,r7,r8) so that the parm save area for it is
+	     properly aligned.  */
+	  gregno += (1 - gregno) & (n_words - 1);
 
-	  /* Long long and SPE vectors are not split between registers
+	  /* Long long and vectors are not split between registers
 	     and stack.  */
 	  if (gregno + n_words - 1 > GP_ARG_MAX_REG)
 	    {
-	      /* Long long is aligned on the stack.  */
-	      if (n_words == 2)
-		cum->words += cum->words & 1;
+	      /* Long long and vectors are aligned on the stack.  The
+		 "2" offset accounts for the 2 word back chain and lr
+		 save area.  */
+	      cum->words += (2 - cum->words) & (n_words - 1);
 	      cum->words += n_words;
 	    }
 
@@ -4177,10 +4180,10 @@ function_arg_advance (CUMULATIVE_ARGS *c
     }
   else
     {
-      int align = (TARGET_32BIT && (cum->words & 1) != 0
-		   && function_arg_boundary (mode, type) == 64) ? 1 : 0;
+      int align = function_arg_boundary (mode, type) / PARM_BOUNDARY - 1;
 
-      cum->words += align + rs6000_arg_size (mode, type);
+      cum->words += ((TARGET_32BIT ? 2 : 0) - cum->words) & align;
+      cum->words += rs6000_arg_size (mode, type);
 
       if (GET_MODE_CLASS (mode) == MODE_FLOAT
 	  && TARGET_HARD_FLOAT && TARGET_FPRS)
@@ -4436,7 +4439,7 @@ function_arg (CUMULATIVE_ARGS *cum, enum
 	     they just have to start on an even word, since the parameter
 	     save area is 16-byte aligned.  */
 	  if (TARGET_32BIT)
-	    align = ((6 - (cum->words & 3)) & 3);
+	    align = (2 - cum->words) & 3;
 	  else
 	    align = cum->words & 1;
 	  align_words = cum->words + align;
@@ -4480,11 +4483,14 @@ function_arg (CUMULATIVE_ARGS *cum, enum
 	  else 
 	    n_words = rs6000_arg_size (mode, type);
 
-	  /* Long long and SPE vectors are put in odd registers.  */
-	  if (n_words == 2 && (gregno & 1) == 0)
-	    gregno += 1;
+	  /* Long long and SPE vectors are put in (r3,r4), (r5,r6),
+	     (r7,r8) or (r9,r10).  AltiVec vectors must go in
+	     (r5,r6,r7,r8) so that the parm save area for them is
+	     properly aligned.  */
+	  gregno += (1 - gregno) & (n_words - 1);
 
-	  /* Long long does not split between registers and stack.  */
+	  /* Long long and vectors are not split between registers
+	     and stack.  */
 	  if (gregno + n_words - 1 <= GP_ARG_MAX_REG)
 	    return gen_rtx_REG (mode, gregno);
 	  else
@@ -4493,13 +4499,15 @@ function_arg (CUMULATIVE_ARGS *cum, enum
     }
   else
     {
-      int align = (TARGET_32BIT && (cum->words & 1) != 0
-	           && function_arg_boundary (mode, type) == 64) ? 1 : 0;
-      int align_words = cum->words + align;
+      int align, align_words;
 
       if (type && TREE_CODE (TYPE_SIZE (type)) != INTEGER_CST)
         return NULL_RTX;
 
+      align = function_arg_boundary (mode, type) / PARM_BOUNDARY - 1;
+      align_words = ((TARGET_32BIT ? 2 : 0) - cum->words) & align;
+      align_words += cum->words;
+
       if (TARGET_32BIT && TARGET_POWERPC64
 	  && (mode == DImode || mode == BLKmode))
 	return rs6000_mixed_function_arg (cum, mode, type, align_words);
@@ -4994,19 +5001,18 @@ rs6000_va_arg (tree valist, tree type)
   lab_over = gen_label_rtx ();
   addr_rtx = gen_reg_rtx (Pmode);
 
-  /*  AltiVec vectors never go in registers.  */
-  if (!TARGET_ALTIVEC || TREE_CODE (type) != VECTOR_TYPE)
+  /*  AltiVec vectors never go in registers when using the AltiVec ABI.  */
+  if (!TARGET_ALTIVEC_ABI || TREE_CODE (type) != VECTOR_TYPE)
     {
-      TREE_THIS_VOLATILE (reg) = 1;
-      emit_cmp_and_jump_insns
-	(expand_expr (reg, NULL_RTX, QImode, EXPAND_NORMAL),
-	 GEN_INT (8 - n_reg + 1), GE, const1_rtx, QImode, 1,
-	 lab_false);
-
-      /* Long long is aligned in the registers.  */
+      /* Long long and vectors are aligned in the registers.  */
       if (n_reg > 1)
 	{
-	  u = build (BIT_AND_EXPR, TREE_TYPE (reg), reg,
+	  u = reg;
+	  if (n_reg == 4)
+	    /* Account for the 2 word offset.  */
+	    u = build (MINUS_EXPR, TREE_TYPE (reg), build_int_2 (2, 0), u);
+
+	  u = build (BIT_AND_EXPR, TREE_TYPE (reg), u,
 		     build_int_2 (n_reg - 1, 0));
 	  u = build (PLUS_EXPR, TREE_TYPE (reg), reg, u);
 	  u = build (MODIFY_EXPR, TREE_TYPE (reg), reg, u);
@@ -5014,6 +5020,11 @@ rs6000_va_arg (tree valist, tree type)
 	  expand_expr (u, const0_rtx, VOIDmode, EXPAND_NORMAL);
 	}
 
+      emit_cmp_and_jump_insns
+	(expand_expr (reg, NULL_RTX, QImode, EXPAND_NORMAL),
+	 GEN_INT (8 - n_reg + 1), GE, const1_rtx, QImode, 1,
+	 lab_false);
+
       if (sav_ofs)
 	t = build (PLUS_EXPR, ptr_type_node, sav, build_int_2 (sav_ofs, 0));
       else
@@ -5044,34 +5055,17 @@ rs6000_va_arg (tree valist, tree type)
 
   /* ... otherwise out of the overflow area.  */
 
-  /* Make sure we don't find reg 7 for the next int arg.
-
-     All AltiVec vectors go in the overflow area.  So in the AltiVec
-     case we need to get the vectors from the overflow area, but
-     remember where the GPRs and FPRs are.  */
-  if (n_reg > 1 && (TREE_CODE (type) != VECTOR_TYPE
-		    || !TARGET_ALTIVEC))
-    {
-      t = build (MODIFY_EXPR, TREE_TYPE (reg), reg, build_int_2 (8, 0));
-      TREE_SIDE_EFFECTS (t) = 1;
-      expand_expr (t, const0_rtx, VOIDmode, EXPAND_NORMAL);
-    }
-
   /* Care for on-stack alignment if needed.  */
-  if (rsize <= 1)
-    t = ovf;
-  else
+  t = ovf;
+  if (rsize > 1)
     {
-      int align;
-
-      /* AltiVec vectors are 16 byte aligned.  */
-      if (TARGET_ALTIVEC && TREE_CODE (type) == VECTOR_TYPE)
-	align = 15;
-      else
-	align = 7;
+      int align = 8;
+      if (TREE_CODE (type) == VECTOR_TYPE)
+	/* AltiVec vectors are 16 byte aligned.  */
+	align = 16;
 
-      t = build (PLUS_EXPR, TREE_TYPE (ovf), ovf, build_int_2 (align, 0));
-      t = build (BIT_AND_EXPR, TREE_TYPE (t), t, build_int_2 (-align-1, -1));
+      t = build (PLUS_EXPR, TREE_TYPE (t), t, build_int_2 (align - 1, 0));
+      t = build (BIT_AND_EXPR, TREE_TYPE (t), t, build_int_2 (-align, -1));
     }
   t = save_expr (t);
 
Index: gcc/config/rs6000/rs6000.h
===================================================================
RCS file: /cvs/gcc/gcc/gcc/config/rs6000/rs6000.h,v
retrieving revision 1.320
diff -u -p -r1.320 rs6000.h
--- gcc/config/rs6000/rs6000.h	24 Apr 2004 06:37:19 -0000	1.320
+++ gcc/config/rs6000/rs6000.h	28 Apr 2004 07:08:33 -0000
@@ -718,7 +718,8 @@ extern const char *rs6000_warn_altivec_l
 #define PARM_BOUNDARY (TARGET_32BIT ? 32 : 64)
 
 /* Boundary (in *bits*) on which stack pointer should be aligned.  */
-#define STACK_BOUNDARY ((TARGET_32BIT && !TARGET_ALTIVEC_ABI) ? 64 : 128)
+#define STACK_BOUNDARY \
+  ((TARGET_32BIT && !TARGET_ALTIVEC && !TARGET_ALTIVEC_ABI) ? 64 : 128)
 
 /* Allocation boundary (in *bits*) for the code of a function.  */
 #define FUNCTION_BOUNDARY 32
Index: gcc/config/rs6000/sysv4.h
===================================================================
RCS file: /cvs/gcc/gcc/gcc/config/rs6000/sysv4.h,v
retrieving revision 1.148
diff -u -p -r1.148 sysv4.h
--- gcc/config/rs6000/sysv4.h	11 Apr 2004 06:21:05 -0000	1.148
+++ gcc/config/rs6000/sysv4.h	28 Apr 2004 07:08:39 -0000
@@ -385,12 +385,6 @@ do {									\
 #undef	STRICT_ALIGNMENT
 #define	STRICT_ALIGNMENT (TARGET_STRICT_ALIGN)
 
-/* Alignment in bits of the stack boundary.  Note, in order to allow building
-   one set of libraries with -mno-eabi instead of eabi libraries and non-eabi
-   versions, just use 64 as the stack boundary.  */
-#undef	STACK_BOUNDARY
-#define	STACK_BOUNDARY	(TARGET_ALTIVEC_ABI ? 128 : 64)
-
 /* Define this macro if you wish to preserve a certain alignment for
    the stack pointer, greater than what the hardware enforces.  The
    definition is a C expression for the desired alignment (measured
@@ -407,7 +401,8 @@ do {									\
 #define PREFERRED_STACK_BOUNDARY 128
 
 /* Real stack boundary as mandated by the appropriate ABI.  */
-#define ABI_STACK_BOUNDARY ((TARGET_EABI && !TARGET_ALTIVEC_ABI) ? 64 : 128)
+#define ABI_STACK_BOUNDARY \
+  ((TARGET_EABI && !TARGET_ALTIVEC && !TARGET_ALTIVEC_ABI) ? 64 : 128)
 
 /* An expression for the alignment of a structure field FIELD if the
    alignment computed in the usual way is COMPUTED.  */

-- 
Alan Modra
IBM OzLabs - Linux Technology Centre


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]