This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

RE: [RFA]: Merge stack alignment branch


Jan,

As to your comments:
+  /* DRAP is needed for stack realign if longjmp is expanded to current

+     function  */
+  if (MAX_VECTORIZE_STACK_ALIGNMENT && !cfun->need_drap)
+    cfun->need_drap = true;

> Simiarly need_drap is RTL properly, so it should live in crtl.  I know
> that most of the flags are still in cfun, I plan to move them soon,
once
> the already patch posted in this series is reviewed.  Sorry for all
the
> conflicts I must've caused.
need_drap is set in expansion. It should be moved to rtl_data all
together with other similar flags, wrt HJ's comment
http://gcc.gnu.org/ml/gcc-patches/2008-04/msg01019.html

> How exactly longjmp/setjmp machinery imply need for DRAP in functions
> not needing alignment otherwise? 
Longjmp will be expanded into a couple of RTLs, one of which sets stack
pointer. Stack realignment can't live with setting to stack pointer in
function body without DRAP. 
The reason behind this is that reload may not eliminate frame_pointer
into stack_pointer once stack pointer is altered somewhere other than
prologue/epilogue.

Thanks - Joey

-----Original Message-----
From: Jan Hubicka [mailto:hubicka@ucw.cz] 
Sent: Friday, April 11, 2008 6:04 PM
To: Ye, Joey
Cc: GCC Patches; Lu, Hongjiu; Guo, Xuepeng; ubizjak@gmail.com
Subject: Re: [RFA]: Merge stack alignment branch

Thank you for breaking this up!  It seems to me that the generic part of
patch still contain several different infrastructure changes (that is
stack frame alignment tracking, drap pointer support, some of
incomming_arg_rtx related changes and the actual target macro bits).

I would still suggest breaking the generic bits up further and submit
them one by one so RTL maintainers can handle them more easilly.


Index: flags.h
===================================================================
--- flags.h	(.../trunk/gcc)	(revision 134098)
+++ flags.h	(.../branches/stack/gcc)	(revision 134141)
@@ -223,12 +223,6 @@ extern int flag_dump_rtl_in_asm;
 
 /* Other basic status info about current function.  */
 
-/* Nonzero means current function must be given a frame pointer.
-   Set in stmt.c if anything is allocated on the stack there.
-   Set in reload1.c if anything is allocated on the stack there.  */
-
-extern int frame_pointer_needed;

frame_pointer_needed should IMO go into crtl, instead of cfun.  It is
computed only at expansion time, right?

Index: builtins.c
===================================================================
--- builtins.c	(.../trunk/gcc)	(revision 134098)
+++ builtins.c	(.../branches/stack/gcc)	(revision 134141)
@@ -740,7 +740,7 @@ expand_builtin_setjmp_receiver (rtx rece
 	{
 	  /* Now restore our arg pointer from the address at which it
 	     was saved in our stack frame.  */
-	  emit_move_insn (virtual_incoming_args_rtx,
+	  emit_move_insn (crtl->args.internal_arg_pointer,

Should not the move into virtual_incoming_args_rtx eliminated back to
internal_arg_pointer store anyway?
I was in impression that virtual_incoming_args_rtx should expand to
direct stack frame reference when internal_arg_pointer is unused and to
internal_arg_pointer otherwise...
 
+  /* DRAP is needed for stack realign if longjmp is expanded to current

+     function  */
+  if (MAX_VECTORIZE_STACK_ALIGNMENT && !cfun->need_drap)
+    cfun->need_drap = true;

Simiarly need_drap is RTL properly, so it should live in crtl.  I know
that most of the flags are still in cfun, I plan to move them soon, once
the already patch posted in this series is reviewed.  Sorry for all the
conflicts I must've caused.

How exactly longjmp/setjmp machinery imply need for DRAP in functions
not needing alignment otherwise?
Index: global.c
===================================================================
--- global.c	(.../trunk/gcc)	(revision 134098)
+++ global.c	(.../branches/stack/gcc)	(revision 134141)
@@ -247,10 +247,20 @@ compute_regsets (HARD_REG_SET *elim_set,
   static const struct {const int from, to; } eliminables[] =
ELIMINABLE_REGS;
   size_t i;
 #endif
+
+  /* FIXME: If EXIT_IGNORE_STACK is set, we will not save and restore
+     sp for alloca.  So we can't eliminate the frame pointer in that
+     case.  At some point, we should improve this by emitting the
+     sp-adjusting insns for this case.  */
   int need_fp
     = (! flag_omit_frame_pointer
        || (current_function_calls_alloca && EXIT_IGNORE_STACK)
-       || FRAME_POINTER_REQUIRED);
+       || FRAME_POINTER_REQUIRED
+       || current_function_accesses_prior_frames
+       || cfun->stack_realign_needed);
+
+  frame_pointer_needed = need_fp;
+  cfun->need_frame_pointer_set = 1;

Originally we decided on whether we need frame pointer during reload.
It was always my impression that it is done because we do want to decide
later, during reload iterations, that the frame pointer is required, via
FRAME_POINTER_REQUIRED macro on some architectures.

Are you sure that you can always safely decide on frame pointer
beforehand?
+  if (MAX_VECTORIZE_STACK_ALIGNMENT)
+    {
+      if (cfun->stack_alignment_estimated < alignment_in_bits)
+	{
+          if (!cfun->stack_realign_processed)
+            cfun->stack_alignment_estimated = alignment_in_bits;
+          else
+	    {
+	      gcc_assert (!cfun->stack_realign_finalized);
+	      if (!cfun->stack_realign_needed)
+		{
+		  /* It is OK to reduce the alignment as long as the
+		     requested size is 0 or the estimated stack
+		     alignment >= mode alignment.  */

So basically the purpose of stack_alignment_estimated is to avoid
wasting stack frame space by padding when LOCAL_ALIGNMENT request
alignment greater than STACK_BOUNDARY and we believe that the function
won't need the DRAP code?

The comment should probably mention why we ever request size of 0.  I
don't know at least ;)
+		  gcc_assert (size == 0
+			      || (cfun->stack_alignment_estimated
+				  >= mode_alignment));
+		  alignment_in_bits = cfun->stack_alignment_estimated;
+		  alignment = alignment_in_bits / BITS_PER_UNIT;
+		}
+	    }
+	}
+    }
+  else
+    {
+      /* Ignore alignment we can't do with expected alignment of the
+	 boundary.  */
+      if (alignment * BITS_PER_UNIT > PREFERRED_STACK_BOUNDARY)
+	alignment = PREFERRED_STACK_BOUNDARY / BITS_PER_UNIT;

It seems that alignment_in_bits recomuptation is missing here.
@@ -2968,6 +3010,20 @@ assign_parms (tree fndecl)

 	  continue;
 	}
 
+      /* Estimate stack alignment from parameter alignment */
+      if (MAX_VECTORIZE_STACK_ALIGNMENT)
+        {
+          unsigned int align = FUNCTION_ARG_BOUNDARY
(data.promoted_mode,
+						      data.passed_type);
+	  if (TYPE_ALIGN (data.nominal_type) > align)
+	    align = TYPE_ALIGN (data.passed_type);
+	  if (cfun->stack_alignment_estimated < align)
+	    {
+	      gcc_assert (!cfun->stack_realign_processed);
+	      cfun->stack_alignment_estimated = align;
+	    }
+	}
+	

That TYPE_ALIGN bump seems wrong.  If you want to pass that type
aligned, FUNCTION_ARG_BOUNDARY should return proper value?

When incomming argument or return value is aligned, why it affects the
function's stack frame alignment at all? Those live in the caller
function stack frame that is already aligned.

Do we take into account that the alignment of stack pointer is known and
we don't need to re-align?
       if (current_function_stdarg && !TREE_CHAIN (parm))
 	assign_parms_setup_varargs (&all, &data, false);
 
Index: tree-vectorizer.c
===================================================================
--- tree-vectorizer.c	(.../trunk/gcc)	(revision 134098)
+++ tree-vectorizer.c	(.../branches/stack/gcc)	(revision
134141)
@@ -1786,9 +1786,19 @@ vect_can_force_dr_alignment_p (const_tre
 
   if (TREE_STATIC (decl))
     return (alignment <= MAX_OFILE_ALIGNMENT);
+  else if (MAX_VECTORIZE_STACK_ALIGNMENT)
+    {
+      gcc_assert (!cfun->stack_realign_processed);
+      if (alignment <= MAX_VECTORIZE_STACK_ALIGNMENT)
+	{
+	  if (cfun->stack_alignment_estimated < alignment)
+	    cfun->stack_alignment_estimated = alignment;

I would preffer if all alignments was visible at RTL expansion time.  

So the reason why you need to handle stack_alignment_estimated at tree
level is that when vectorizing you know you are introducing alignment
requirement on a local array, while later RTL expansion can't work it
out, since the memory accesses are no longer clearly associated with the
stack frame area?

Perhaps deriving type and changing the static array type to an array
with greater alignemnt would work?
Index: function.h
===================================================================
--- function.h	(.../trunk/gcc)	(revision 134098)
+++ function.h	(.../branches/stack/gcc)	(revision 134141)
@@ -352,9 +355,16 @@ struct function GTY(())
   /* tm.h can use this to store whatever it likes.  */
   struct machine_function * GTY ((maybe_undef)) machine;
 
-  /* The largest alignment of slot allocated on the stack.  */
+  /* The largest alignment needed on the stack, including requirement
+     for outgoing stack alignment.  */
   unsigned int stack_alignment_needed;
 
+  /* The largest alignment of slot allocated on the stack.  */
+  unsigned int stack_alignment_used;
+
+  /* The estimated stack alignment.  */
+  unsigned int stack_alignment_estimated;
+
   /* Preferred alignment of the end of stack frame.  */
   unsigned int preferred_stack_boundary;
 
@@ -509,6 +519,38 @@ struct function GTY(())
 
   /* Nonzero if pass_tree_profile was run on this function.  */
   unsigned int after_tree_profile : 1;
+
+/* Nonzero if current function must be given a frame pointer.
+   Set in global.c if anything is allocated on the stack there.  */
+  unsigned int need_frame_pointer : 1;
+
+  /* Nonzero if need_frame_pointer has been set.  */
+  unsigned int need_frame_pointer_set : 1;
+
+  /* Nonzero if, by estimation, current function stack needs
realignment. */
+  unsigned int stack_realign_needed : 1;
+
+  /* Nonzero if function stack realignment is really needed. This flag
+     will be set after reload if by then criteria of stack realignment
+     is still true. Its value may be contridition to
stack_realign_needed
+     since the latter was set before reload. This flag is more accurate
+     than stack_realign_needed so prologue/epilogue should be generated
+     according to both flags  */
+  unsigned int stack_realign_really : 1;
+
+  /* Nonzero if function being compiled needs dynamic realigned
+     argument pointer (drap) if stack needs realigning.  */
+  unsigned int need_drap : 1;
+
+  /* Nonzero if current function needs to save/restore parameter
+     pointer register in prolog, because it is a callee save reg.  */
+  unsigned int save_param_ptr_reg : 1;
+
+  /* Nonzero if function stack realignment estimatoin is done.  */
+  unsigned int stack_realign_processed : 1;
+
+  /* Nonzero if function stack realignment has been finalized.  */
+  unsigned int stack_realign_finalized : 1;


As I've mentioned originally, it would be nice to place all the
variables and flags computed only at expansion time or later to
rtl_data.  I guess it covers majority of the above variables.

+#define frame_pointer_needed (cfun->need_frame_pointer)

It might be better to stick with the x_frame_pointer_needed accestor
scheme or simply replace it in sources with crtl->frame_pointer_needed
(I would definitly preffer the second)

+      /* If popup is needed, stack realign must use DRAP  */
+      if (MAX_VECTORIZE_STACK_ALIGNMENT && !cfun->need_drap)
+        cfun->need_drap = true;

No need to test !cfun->need_drap.

@@ -743,6 +760,29 @@ defer_stack_allocation (tree var, bool t
 static HOST_WIDE_INT
 expand_one_var (tree var, bool toplevel, bool really_expand)
 {
+  if (MAX_VECTORIZE_STACK_ALIGNMENT && TREE_CODE (var) == VAR_DECL)
+    {
+      unsigned int align;
+
+      /* Because we don't know if VAR will be in register or on stack,
+	 we conservatively assume it will be on stack even if VAR is
+	 eventually put into register after RA pass.  For non-automatic
+	 variables, which won't be on stack, we collect alignment of
+	 type and ignore user specified alignment.  */
+      if (TREE_STATIC (var) || DECL_EXTERNAL (var))
+	align = TYPE_ALIGN (TREE_TYPE (var));
+      else
+	align = DECL_ALIGN (var);
+
+      if (cfun->stack_alignment_estimated < align)
+        {
+          /* stack_alignment_estimated shouldn't change after stack
+             realign decision made */
+          gcc_assert(!cfun->stack_realign_processed);
+	  cfun->stack_alignment_estimated = align;
+	}
+    }

It seems a bit overzelaous to track the alignment everywhere.  The
variables ends up either on stack or in pseudo.  In both cases the
stack_alignment_estimated shold be bumped already?
+/* This pass sets crtl->args.internal_arg_pointer to a virtual
+   register if DRAP is needed.  Local register allocator will replace
+   virtual_incoming_args_rtx with the virtual register.  */
+
+static unsigned int
+handle_drap (void)
+{
+  rtx internal_arg_rtx; 
+
+  if (!cfun->need_drap
+      && (current_function_calls_alloca
+          || cfun->has_nonlocal_label
+          || current_function_has_nonlocal_goto))
+    cfun->need_drap = true;
+
+  /* Call targetm.calls.internal_arg_pointer again.  This time it will
+     return a virtual register if DRAP is needed.  */
+  internal_arg_rtx = targetm.calls.internal_arg_pointer (); 
+
+  /* Assertion to check internal_arg_pointer is set to the right rtx
+     here.  */
+  gcc_assert (crtl->args.internal_arg_pointer == 
+             virtual_incoming_args_rtx);
+
+  /* Do nothing if no need to replace virtual_incoming_args_rtx.  */
+  if (crtl->args.internal_arg_pointer != internal_arg_rtx)
+    {
+      crtl->args.internal_arg_pointer = internal_arg_rtx;
+
+      /* Call fixup_tail_casss to clean up REG_EQUIV note if DRAP is
+         needed. */
+      fixup_tail_calls ();
+    }
+
+  return 0;
+}
+
+struct gimple_opt_pass pass_handle_drap =
+{
+ {
+  GIMPLE_PASS,

This should be RTL_PASS, but in general it seems like other stuff in we
do during cfgexpand at the end of function body expansion, so it
probably more naturally belongs there than to extra pass.

Honza


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]