This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.



RE: [stack] Automatic Stack Aligment Support


Updated patch:

2008-03-19  Joey Ye  <joey.ye@intel.com>
	    H.J. Lu  <hongjiu.lu@intel.com>
	    Xuepeng Guo  <xuepeng.guo@intel.com>

	* builtins.c (expand_builtin_setjmp_receiver): Replace
	virtual_incoming_args_rtx with
	current_function_internal_arg_pointer.
	(expand_builtin_apply_args_1): Likewise.

	* calls.c (expand_call): Don't calculate preferred stack
	boundary according to incoming stack boundary. Replace 
	virtual_incoming_args_rtx with
	current_function_internal_arg_pointer.

	* cfgexpand.c (get_decl_align_unit): Estimate stack variable
	alignment and store to stack_alignment_estimated.
	(expand_one_var): Likewise.
	(gate_stack_realign): Gate new passes
	pass_collect_stackrealign_info and pass_handle_drap.
	(collect_stackrealign_info): Execute new pass
	pass_collect_stackrealign_info.
	(pass_collect_stackrealign_info): Define new pass.
	(handle_drap): Execute new pass pass_handle_drap.
	(pass_handle_drap): Define new pass.

	* defaults.h (MAX_VECTORIZE_STACK_ALIGNMENT): New.

	* dojump.c (clear_pending_stack_adjust): Leave a FIXME in
	comments in case pending stack adjustment is discarded when stack
	realignment is needed.

	* flags.h (frame_pointer_needed): Removed.
	* final.c (frame_pointer_needed): Likewise.

	* function.c (assign_stack_local_1): Estimate stack variable 
	alignment and store to stack_alignment_estimated.
	(instantiate_new_reg): Instantiate virtual incoming args rtx to
	vDRAP if stack realignment and DRAP is needed.
	(assign_parms): Collect parameter/return type alignment and 
	contribute to stack_alignment_estimated.
	(locate_and_pad_parm): Likewise.
	(allocate_struct_function): Init stack_alignment_estimated.
	(get_arg_pointer_save_area): Replace virtual_incoming_args_rtx
	with current_function_internal_arg_pointer.

	* function.h (function): Add drap_reg, stack_alignment_estimated,
	need_frame_pointer, need_frame_pointer_set, stack_realign_needed,
	stack_realign_really, need_drap, save_param_ptr_reg,
	stack_realign_processed, and stack_realign_finalized.
	(frame_pointer_needed): New.
	(stack_realign_fp): Likewise.
	(stack_realign_drap): Likewise.

	* global.c (compute_regsets): Set frame_pointer_needed and
	cannot_elim wrt stack_realign_needed.

	* stmt.c (expand_nl_goto_receiver): Replace 
	virtual_incoming_args_rtx with
	current_function_internal_arg_pointer.

	* passes.c (pass_collect_stackrealign_info): Insert this new pass
	immediately before expand.
	(pass_handle_drap): Insert this new pass immediately after expand.

	* tree-inline.c (expand_call_inline): Estimate stack variable
	alignment and store to stack_alignment_estimated.

	* tree-pass.h (pass_handle_drap): New.
	(pass_collect_stackrealign_info): Likewise.

	* tree-vectorizer.c (vect_can_force_dr_alignment_p): Estimate
	stack variable alignment and store to stack_alignment_estimated.

	* reload1.c (set_label_offsets): Assert that frame pointer must be
	eliminated to stack pointer in case stack realignment is estimated
	to happen without DRAP.
	(elimination_effects): Likewise.
	(eliminate_regs_in_insn): Likewise.
	(mark_not_eliminable): Likewise.
	(update_eliminables): Frame pointer is needed in case of stack
	realignment needed.
	(init_elim_table): Don't set frame_pointer_needed here.

	* config/i386/i386.c (ix86_force_align_arg_pointer_string): Break
	long line.
	(ix86_user_incoming_stack_boundary): New.
	(ix86_default_incoming_stack_boundary): Likewise.
	(ix86_incoming_stack_boundary): Likewise.
	(find_drap_reg): Likewise.
	(override_options): Override option values for new options.
	(ix86_function_ok_for_sibcall): Sibcall is OK even if the stack
	needs realigning.
	(ix86_handle_cconv_attribute): Stack realign no longer impacts
	number of regparm.
	(ix86_function_regparm): Likewise.
	(setup_incoming_varargs_64): Remove the logic to set
	stack_alignment_needed here.
	(ix86_va_start): Replace virtual_incoming_args_rtx with
	current_function_internal_arg_pointer.
	(ix86_save_reg): Replace force_align_arg_pointer with drap_reg.
	(ix86_compute_frame_layout): Compute frame layout wrt stack
	realignment.
	(ix86_internal_arg_pointer): Estimate if stack realignment is
	needed and return the appropriate arg pointer rtx accordingly.
	(ix86_expand_prologue): Finally decide if stack realignment
	is needed and generate prologue code accordingly.
	(ix86_expand_epilogue): Generate epilogue code according to
	whether stack realignment is really needed.

	* config/i386/i386.h (MAIN_STACK_BOUNDARY): New.
	(ABI_STACK_BOUNDARY): Likewise.
	(PREFERRED_STACK_BOUNDARY_DEFAULT): Likewise.
	(STACK_REALIGN_DEFAULT): Likewise.
	(INCOMING_STACK_BOUNDARY): Likewise.
	(MAX_VECTORIZE_STACK_ALIGNMENT): Likewise.
	(ix86_incoming_stack_boundary): Likewise.
	(REAL_PIC_OFFSET_TABLE_REGNUM): Updated to use BX_REG.
	(CAN_ELIMINATE): Redefine the macro to eliminate frame pointer to
	stack pointer and arg pointer to hard frame pointer in case of
	stack realignment without DRAP.
	(machine_function): Remove force_align_arg_pointer.

	* config/i386/i386.md (BX_REG): New.
	(R13_REG): Likewise.

	* config/i386/i386.opt (mforce-drap): New.
	(mincoming-stack-boundary): Likewise.
	(mstackrealign): Updated.

	* doc/extend.texi: Update force_align_arg_pointer.
	* doc/invoke.texi: Document -mincoming-stack-boundary.  Update
	-mstackrealign.
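
For anyone skimming the diff: most of the middle-end changes above repeat one
bookkeeping pattern. stack_alignment_estimated may only grow, and only until
the realignment decision has been processed. Below is a minimal standalone
sketch of that pattern. It is not code from the patch: struct fn_state and
note_stack_alignment are hypothetical names, and a boolean return value stands
in for the gcc_assert the patch uses, so the sketch can be tried outside GCC.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for two of the fields this patch adds to
   struct function.  */
struct fn_state
{
  unsigned int stack_alignment_estimated; /* in bits */
  bool stack_realign_processed;           /* set once the decision is made */
};

/* Record that something in the function wants ALIGN bits of alignment.
   Returns false where the patch would trip gcc_assert: the estimate
   would have to grow after the realign decision was already made.  */
static bool
note_stack_alignment (struct fn_state *fn, unsigned int align)
{
  assert (fn);
  if (fn->stack_alignment_estimated >= align)
    return true;                /* estimate already covers the request */
  if (fn->stack_realign_processed)
    return false;               /* too late to raise the estimate */
  fn->stack_alignment_estimated = align;
  return true;
}
```

Starting from an estimate of STACK_BOUNDARY, a call with 128 raises the
estimate, a later call with 64 leaves it alone, and any larger request made
after stack_realign_processed is set is reported as an error.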


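The redefined CAN_ELIMINATE in config/i386/i386.h is easier to read as a
function. The sketch below restates the macro from this patch under
assumptions: the enum values are hypothetical placeholders for the target's
register numbers, and stack_realign_fp and frame_pointer_needed are passed as
parameters instead of being read from cfun.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical register identifiers; the real numbers come from the
   target description.  */
enum reg { ARG_POINTER, FRAME_POINTER, HARD_FRAME_POINTER, STACK_POINTER };

/* When realigning through the frame pointer (no DRAP), only two
   eliminations are allowed: arg pointer -> hard frame pointer and
   frame pointer -> stack pointer.  Otherwise the old rule applies.  */
static bool
can_eliminate (enum reg from, enum reg to,
               bool stack_realign_fp, bool frame_pointer_needed)
{
  assert (from != to);
  if (stack_realign_fp)
    return (from == ARG_POINTER && to == HARD_FRAME_POINTER)
           || (from == FRAME_POINTER && to == STACK_POINTER);
  return to == STACK_POINTER ? !frame_pointer_needed : true;
}
```

With stack_realign_fp set, incoming arguments are addressed from the hard
frame pointer and locals from the realigned stack pointer, which is why the
other two eliminations are rejected in that mode.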
Index: flags.h
===================================================================
--- flags.h	(revision 133266)
+++ flags.h	(working copy)
@@ -223,12 +223,6 @@
 
 /* Other basic status info about current function.  */
 
-/* Nonzero means current function must be given a frame pointer.
-   Set in stmt.c if anything is allocated on the stack there.
-   Set in reload1.c if anything is allocated on the stack there.  */
-
-extern int frame_pointer_needed;
-
 /* Nonzero if subexpressions must be evaluated from left-to-right.  */
 extern int flag_evaluation_order;
 
Index: defaults.h
===================================================================
--- defaults.h	(revision 133266)
+++ defaults.h	(working copy)
@@ -940,4 +940,8 @@
 #define OUTGOING_REG_PARM_STACK_SPACE 0
 #endif
 
+#ifndef MAX_VECTORIZE_STACK_ALIGNMENT
+#define MAX_VECTORIZE_STACK_ALIGNMENT 0
+#endif
+
 #endif  /* ! GCC_DEFAULTS_H */
Index: tree-pass.h
===================================================================
--- tree-pass.h	(revision 133266)
+++ tree-pass.h	(working copy)
@@ -314,6 +314,7 @@
 extern struct tree_opt_pass pass_mark_used_blocks;
 extern struct tree_opt_pass pass_rename_ssa_copies;
 extern struct tree_opt_pass pass_expand;
+extern struct tree_opt_pass pass_handle_drap;
 extern struct tree_opt_pass pass_rest_of_compilation;
 extern struct tree_opt_pass pass_sink_code;
 extern struct tree_opt_pass pass_fre;
@@ -450,6 +451,7 @@
 extern struct tree_opt_pass pass_apply_inline;
 extern struct tree_opt_pass pass_all_early_optimizations;
 extern struct tree_opt_pass pass_update_address_taken;
+extern struct tree_opt_pass pass_collect_stackrealign_info;
 
 /* The root of the compilation pass tree, once constructed.  */
 extern struct tree_opt_pass *all_passes, *all_ipa_passes, *all_lowering_passes;
Index: final.c
===================================================================
--- final.c	(revision 133266)
+++ final.c	(working copy)
@@ -178,12 +178,6 @@
 CC_STATUS cc_prev_status;
 #endif
 
-/* Nonzero means current function must be given a frame pointer.
-   Initialized in function.c to 0.  Set only in reload1.c as per
-   the needs of the function.  */
-
-int frame_pointer_needed;
-
 /* Number of unmatched NOTE_INSN_BLOCK_BEG notes we have seen.  */
 
 static int block_depth;
Index: builtins.c
===================================================================
--- builtins.c	(revision 133266)
+++ builtins.c	(working copy)
@@ -740,7 +740,7 @@
 	{
 	  /* Now restore our arg pointer from the address at which it
 	     was saved in our stack frame.  */
-	  emit_move_insn (virtual_incoming_args_rtx,
+	  emit_move_insn (current_function_internal_arg_pointer,
 			  copy_to_reg (get_arg_pointer_save_area (cfun)));
 	}
     }
@@ -1345,7 +1345,7 @@
       }
 
   /* Save the arg pointer to the block.  */
-  tem = copy_to_reg (virtual_incoming_args_rtx);
+  tem = copy_to_reg (current_function_internal_arg_pointer);
 #ifdef STACK_GROWS_DOWNWARD
   /* We need the pointer as the caller actually passed them to us, not
      as we might have pretended they were passed.  Make sure it's a valid
Index: dojump.c
===================================================================
--- dojump.c	(revision 133266)
+++ dojump.c	(working copy)
@@ -64,8 +64,11 @@
    so the adjustment won't get done.
 
    Note, if the current function calls alloca, then it must have a
-   frame pointer regardless of the value of flag_omit_frame_pointer.  */
+   frame pointer regardless of the value of flag_omit_frame_pointer.  
 
+   When stack realignment is needed, we can't discard the pending stack
+   adjustment, because the stack pointer must be restored in the epilogue.  */
+
 void
 clear_pending_stack_adjust (void)
 {
Index: global.c
===================================================================
--- global.c	(revision 133266)
+++ global.c	(working copy)
@@ -247,11 +247,21 @@
   static const struct {const int from, to; } eliminables[] = ELIMINABLE_REGS;
   size_t i;
 #endif
+
+  /* FIXME: If EXIT_IGNORE_STACK is set, we will not save and restore
+     sp for alloca.  So we can't eliminate the frame pointer in that
+     case.  At some point, we should improve this by emitting the
+     sp-adjusting insns for this case.  */
   int need_fp
     = (! flag_omit_frame_pointer
        || (current_function_calls_alloca && EXIT_IGNORE_STACK)
-       || FRAME_POINTER_REQUIRED);
+       || FRAME_POINTER_REQUIRED
+       || current_function_accesses_prior_frames
+       || cfun->stack_realign_needed);
 
+  frame_pointer_needed = need_fp;
+  cfun->need_frame_pointer_set = 1;
+
   max_regno = max_reg_num ();
   compact_blocks ();
 
@@ -271,7 +281,10 @@
     {
       bool cannot_elim
 	= (! CAN_ELIMINATE (eliminables[i].from, eliminables[i].to)
-	   || (eliminables[i].to == STACK_POINTER_REGNUM && need_fp));
+	   || (eliminables[i].to == STACK_POINTER_REGNUM
+	       && need_fp 
+	       && (! MAX_VECTORIZE_STACK_ALIGNMENT
+		   || ! stack_realign_fp)));
 
       if (!regs_asm_clobbered[eliminables[i].from])
 	{
Index: function.c
===================================================================
--- function.c	(revision 133266)
+++ function.c	(working copy)
@@ -403,17 +403,19 @@
 {
   rtx x, addr;
   int bigend_correction = 0;
-  unsigned int alignment;
+  unsigned int alignment, mode_alignment;
   int frame_off, frame_alignment, frame_phase;
 
+  if (mode == BLKmode)
+    mode_alignment = BIGGEST_ALIGNMENT;
+  else
+    mode_alignment = GET_MODE_ALIGNMENT (mode);
+
   if (align == 0)
     {
       tree type;
 
-      if (mode == BLKmode)
-	alignment = BIGGEST_ALIGNMENT;
-      else
-	alignment = GET_MODE_ALIGNMENT (mode);
+      alignment = mode_alignment;
 
       /* Allow the target to (possibly) increase the alignment of this
 	 stack slot.  */
@@ -436,10 +438,37 @@
   if (FRAME_GROWS_DOWNWARD)
     function->x_frame_offset -= size;
 
-  /* Ignore alignment we can't do with expected alignment of the boundary.  */
-  if (alignment * BITS_PER_UNIT > PREFERRED_STACK_BOUNDARY)
-    alignment = PREFERRED_STACK_BOUNDARY / BITS_PER_UNIT;
-
+  if (MAX_VECTORIZE_STACK_ALIGNMENT)
+    {
+      if (function->stack_alignment_estimated < alignment * BITS_PER_UNIT)
+	{
+          if (!function->stack_realign_processed)
+            function->stack_alignment_estimated
+	      = alignment * BITS_PER_UNIT;
+          else
+	    {
+	      gcc_assert (!function->stack_realign_finalized);
+	      if (!function->stack_realign_needed)
+		{
+		  /* It is OK to reduce the alignment as long as the
+		     requested size is 0 or the estimated stack
+		     alignment >= mode alignment.  */
+		  gcc_assert (size == 0
+			      || (function->stack_alignment_estimated
+				  >= mode_alignment));
+		  alignment = (function->stack_alignment_estimated
+			       / BITS_PER_UNIT);
+		}
+	    }
+	}
+    }
+  else
+    {
+      /* Ignore alignment we can't do with expected alignment of the
+	 boundary.  */
+      if (alignment * BITS_PER_UNIT > PREFERRED_STACK_BOUNDARY)
+	alignment = PREFERRED_STACK_BOUNDARY / BITS_PER_UNIT;
+    }
   if (function->stack_alignment_needed < alignment * BITS_PER_UNIT)
     function->stack_alignment_needed = alignment * BITS_PER_UNIT;
 
@@ -1240,7 +1269,17 @@
   HOST_WIDE_INT offset;
 
   if (x == virtual_incoming_args_rtx)
-    new = arg_pointer_rtx, offset = in_arg_offset;
+    {
+      /* Replace virtual_incoming_args_rtx with internal arg pointer here */
+      if (current_function_internal_arg_pointer != virtual_incoming_args_rtx)
+        {
+          gcc_assert (stack_realign_drap);
+          new = current_function_internal_arg_pointer;
+          offset = 0;
+        }
+      else
+        new = arg_pointer_rtx, offset = in_arg_offset;
+    }
   else if (x == virtual_stack_vars_rtx)
     new = frame_pointer_rtx, offset = var_offset;
   else if (x == virtual_stack_dynamic_rtx)
@@ -3037,6 +3076,20 @@
 	  continue;
 	}
 
+      /* Estimate stack alignment from parameter alignment */
+      if (MAX_VECTORIZE_STACK_ALIGNMENT)
+        {
+          unsigned int align = FUNCTION_ARG_BOUNDARY (data.promoted_mode,
+						      data.passed_type);
+	  if (TYPE_ALIGN (data.nominal_type) > align)
+	    align = TYPE_ALIGN (data.nominal_type);
+	  if (cfun->stack_alignment_estimated < align)
+	    {
+	      gcc_assert (!cfun->stack_realign_processed);
+	      cfun->stack_alignment_estimated = align;
+	    }
+	}
+	
       if (current_function_stdarg && !TREE_CHAIN (parm))
 	assign_parms_setup_varargs (&all, &data, false);
 
@@ -3074,6 +3127,28 @@
      now that all parameters have been copied out of hard registers.  */
   emit_insn (all.first_conversion_insn);
 
+  /* Estimate reload stack alignment from scalar return mode.  */
+  if (MAX_VECTORIZE_STACK_ALIGNMENT)
+    {
+      if (DECL_RESULT (fndecl))
+	{
+	  tree type = TREE_TYPE (DECL_RESULT (fndecl));
+	  enum machine_mode mode = TYPE_MODE (type);
+
+	  if (mode != BLKmode
+	      && mode != VOIDmode
+	      && !AGGREGATE_TYPE_P (type))
+	    {
+	      unsigned int align = GET_MODE_ALIGNMENT (mode);
+	      if (cfun->stack_alignment_estimated < align)
+		{
+		  gcc_assert (!cfun->stack_realign_processed);
+		  cfun->stack_alignment_estimated = align;
+		}
+	    }
+	} 
+    }
+
   /* If we are receiving a struct value address as the first argument, set up
      the RTL for the function result. As this might require code to convert
      the transmitted address to Pmode, we do this here to ensure that possible
@@ -3351,10 +3426,28 @@
   locate->where_pad = where_pad;
   locate->boundary = boundary;
 
-  /* Remember if the outgoing parameter requires extra alignment on the
-     calling function side.  */
-  if (boundary > PREFERRED_STACK_BOUNDARY)
-    boundary = PREFERRED_STACK_BOUNDARY;
+  if (MAX_VECTORIZE_STACK_ALIGNMENT)
+    {
+      /* stack_alignment_estimated can't change after stack has been
+	 realigned.  */
+      if (cfun->stack_alignment_estimated < boundary)
+        {
+          if (!cfun->stack_realign_processed)
+	    cfun->stack_alignment_estimated = boundary;
+	  else
+	    {
+	      gcc_assert (!cfun->stack_realign_finalized
+			  && cfun->stack_realign_needed);
+	    }
+	}
+    }
+  else
+    {
+      /* Remember if the outgoing parameter requires extra alignment on
+         the calling function side.  */
+      if (boundary > PREFERRED_STACK_BOUNDARY)
+        boundary = PREFERRED_STACK_BOUNDARY;
+    }
   if (cfun->stack_alignment_needed < boundary)
     cfun->stack_alignment_needed = boundary;
 
@@ -3912,6 +4005,7 @@
   cfun = ggc_alloc_cleared (sizeof (struct function));
 
   cfun->stack_alignment_needed = STACK_BOUNDARY;
+  cfun->stack_alignment_estimated = STACK_BOUNDARY;
   cfun->preferred_stack_boundary = STACK_BOUNDARY;
 
   current_function_funcdef_no = get_next_funcdef_no ();
@@ -4687,7 +4781,8 @@
 	 generated stack slot may not be a valid memory address, so we
 	 have to check it and fix it if necessary.  */
       start_sequence ();
-      emit_move_insn (validize_mem (ret), virtual_incoming_args_rtx);
+      emit_move_insn (validize_mem (ret),
+                      current_function_internal_arg_pointer);
       seq = get_insns ();
       end_sequence ();
 
Index: tree-vectorizer.c
===================================================================
--- tree-vectorizer.c	(revision 133266)
+++ tree-vectorizer.c	(working copy)
@@ -1786,9 +1786,19 @@
 
   if (TREE_STATIC (decl))
     return (alignment <= MAX_OFILE_ALIGNMENT);
+  else if (MAX_VECTORIZE_STACK_ALIGNMENT)
+    {
+      gcc_assert (!cfun->stack_realign_processed);
+      if (alignment <= MAX_VECTORIZE_STACK_ALIGNMENT)
+	{
+	  if (cfun->stack_alignment_estimated < alignment)
+	    cfun->stack_alignment_estimated = alignment;
+	  return true;
+	}
+      else
+	return false;
+    }
   else
-    /* This used to be PREFERRED_STACK_BOUNDARY, however, that is not 100%
-       correct until someone implements forced stack alignment.  */
     return (alignment <= STACK_BOUNDARY); 
 }
 
Index: function.h
===================================================================
--- function.h	(revision 133266)
+++ function.h	(working copy)
@@ -255,6 +255,9 @@
      needed by inner routines.  */
   rtx x_arg_pointer_save_area;
 
+  /* Dynamic Realign Argument Pointer used for realigning stack.  */
+  rtx drap_reg;
+
   /* Offset to end of allocated area of stack frame.
      If stack grows down, this is the address of the last stack slot allocated.
      If stack grows up, this is the address for the next slot.  */
@@ -295,6 +298,9 @@
   /* The largest alignment of slot allocated on the stack.  */
   unsigned int stack_alignment_needed;
 
+  /* The estimated stack alignment.  */
+  unsigned int stack_alignment_estimated;
+
   /* Preferred alignment of the end of stack frame.  */
   unsigned int preferred_stack_boundary;
 
@@ -463,6 +469,38 @@
 
   /* Nonzero if pass_tree_profile was run on this function.  */
   unsigned int after_tree_profile : 1;
+
+  /* Nonzero if current function must be given a frame pointer.
+     Set in global.c if anything is allocated on the stack there.  */
+  unsigned int need_frame_pointer : 1;
+
+  /* Nonzero if need_frame_pointer has been set.  */
+  unsigned int need_frame_pointer_set : 1;
+
+  /* Nonzero if, by estimation, current function stack needs realignment. */
+  unsigned int stack_realign_needed : 1;
+
+  /* Nonzero if function stack realignment is really needed.  This flag
+     will be set after reload if by then criteria of stack realignment
+     is still true.  Its value may contradict stack_realign_needed
+     since the latter was set before reload.  This flag is more accurate
+     than stack_realign_needed so prologue/epilogue should be generated
+     according to both flags.  */
+  unsigned int stack_realign_really : 1;
+
+  /* Nonzero if function being compiled needs dynamic realigned
+     argument pointer (drap) if stack needs realigning.  */
+  unsigned int need_drap : 1;
+
+  /* Nonzero if current function needs to save/restore parameter
+     pointer register in prolog, because it is a callee save reg.  */
+  unsigned int save_param_ptr_reg : 1;
+
+  /* Nonzero if function stack realignment estimation is done.  */
+  unsigned int stack_realign_processed : 1;
+
+  /* Nonzero if function stack realignment has been finalized.  */
+  unsigned int stack_realign_finalized : 1;
 };
 
 /* If va_list_[gf]pr_size is set to this, it means we don't know how
@@ -537,6 +575,9 @@
 #define dom_computed (cfun->cfg->x_dom_computed)
 #define n_bbs_in_dom_tree (cfun->cfg->x_n_bbs_in_dom_tree)
 #define VALUE_HISTOGRAMS(fun) (fun)->value_histograms
+#define frame_pointer_needed (cfun->need_frame_pointer)
+#define stack_realign_fp (cfun->stack_realign_needed && !cfun->need_drap)
+#define stack_realign_drap (cfun->stack_realign_needed && cfun->need_drap)
 
 /* Given a function decl for a containing function,
    return the `struct function' for it.  */
Index: calls.c
===================================================================
--- calls.c	(revision 133266)
+++ calls.c	(working copy)
@@ -2099,7 +2099,10 @@
 
   /* Figure out the amount to which the stack should be aligned.  */
   preferred_stack_boundary = PREFERRED_STACK_BOUNDARY;
-  if (fndecl)
+
+  /* With automatic stack realignment, we align stack in prologue when
+     needed and there is no need to update preferred_stack_boundary.  */
+  if (!MAX_VECTORIZE_STACK_ALIGNMENT && fndecl)
     {
       struct cgraph_rtl_info *i = cgraph_rtl_info (fndecl);
       if (i && i->preferred_incoming_stack_boundary)
@@ -2401,7 +2404,7 @@
 	 incoming argument block.  */
       if (pass == 0)
 	{
-	  argblock = virtual_incoming_args_rtx;
+	  argblock = current_function_internal_arg_pointer;
 	  argblock
 #ifdef STACK_GROWS_DOWNWARD
 	    = plus_constant (argblock, current_function_pretend_args_size);
Index: cfgexpand.c
===================================================================
--- cfgexpand.c	(revision 133266)
+++ cfgexpand.c	(working copy)
@@ -161,8 +161,23 @@
 
   align = DECL_ALIGN (decl);
   align = LOCAL_ALIGNMENT (TREE_TYPE (decl), align);
-  if (align > PREFERRED_STACK_BOUNDARY)
-    align = PREFERRED_STACK_BOUNDARY;
+
+  if (MAX_VECTORIZE_STACK_ALIGNMENT)
+    {
+      if (cfun->stack_alignment_estimated < align)
+	{
+	  gcc_assert(!cfun->stack_realign_processed);
+          cfun->stack_alignment_estimated = align;
+	}
+    }
+  else
+    {
+      if (align > PREFERRED_STACK_BOUNDARY)
+	align = PREFERRED_STACK_BOUNDARY;
+    }
+
+  /* stack_alignment_needed > PREFERRED_STACK_BOUNDARY is permitted.
+     So here we only make sure stack_alignment_needed >= align.  */
   if (cfun->stack_alignment_needed < align)
     cfun->stack_alignment_needed = align;
 
@@ -748,6 +763,29 @@
 static HOST_WIDE_INT
 expand_one_var (tree var, bool toplevel, bool really_expand)
 {
+  if (MAX_VECTORIZE_STACK_ALIGNMENT && TREE_CODE (var) == VAR_DECL)
+    {
+      unsigned int align;
+
+      /* Because we don't know if VAR will be in register or on stack,
+	 we conservatively assume it will be on stack even if VAR is
+	 eventually put into register after RA pass.  For non-automatic
+	 variables, which won't be on stack, we collect alignment of
+	 type and ignore user specified alignment.  */
+      if (TREE_STATIC (var) || DECL_EXTERNAL (var))
+	align = TYPE_ALIGN (TREE_TYPE (var));
+      else
+	align = DECL_ALIGN (var);
+
+      if (cfun->stack_alignment_estimated < align)
+        {
+          /* stack_alignment_estimated shouldn't change after stack
+             realign decision made */
+          gcc_assert(!cfun->stack_realign_processed);
+	  cfun->stack_alignment_estimated = align;
+	}
+    }
+
   if (TREE_CODE (var) != VAR_DECL)
     {
       if (really_expand)
@@ -2003,3 +2041,131 @@
   TODO_dump_func,                       /* todo_flags_finish */
   'r'					/* letter */
 };
+
+static bool
+gate_stack_realign (void)
+{
+  if (!MAX_VECTORIZE_STACK_ALIGNMENT)
+    return false;
+  else
+    {
+      gcc_assert (!cfun->stack_realign_processed);
+      return true;
+    }
+}
+
+/* Collect accurate info for stack realign.  */
+
+static unsigned int
+collect_stackrealign_info (void)
+{
+  basic_block bb;
+  block_stmt_iterator bsi;
+
+  if (cfun->has_nonlocal_label)
+    cfun->need_drap = true;
+
+  FOR_EACH_BB (bb)
+    for (bsi = bsi_start (bb); ! bsi_end_p (bsi); bsi_next (&bsi))
+      {
+	tree stmt = bsi_stmt (bsi);
+	tree call = get_call_expr_in (stmt);
+	tree decl, type;
+	int flags;
+
+	if (!call)
+	  continue;
+
+	flags = call_expr_flags (call);
+	if (flags & ECF_MAY_BE_ALLOCA)
+	  cfun->need_drap = true;
+
+	decl = get_callee_fndecl (call);
+	if (decl && DECL_BUILT_IN_CLASS (decl) == BUILT_IN_NORMAL)
+	  switch (DECL_FUNCTION_CODE (decl))
+	    {
+	    case BUILT_IN_NONLOCAL_GOTO:
+	    case BUILT_IN_APPLY:
+	    case BUILT_IN_LONGJMP:
+	      cfun->need_drap = true;
+	      break;
+	    default:
+	      break;
+	    }
+
+	type = TREE_TYPE (call);
+	if (!type || VOID_TYPE_P (type))
+          continue;
+
+	/* FIXME: Do we need DRAP when the result is returned on
+	   stack?  */
+	if (aggregate_value_p (type, decl))
+	  cfun->need_drap = true;
+      }  
+
+  return 0;
+}
+
+struct tree_opt_pass pass_collect_stackrealign_info =
+{   
+  "stack_realign_info",                 /* name */
+  gate_stack_realign,                   /* gate */
+  collect_stackrealign_info,            /* execute */
+  NULL,                                 /* sub */
+  NULL,                                 /* next */
+  0,                                    /* static_pass_number */
+  0,                                    /* tv_id */
+  0,                                    /* properties_required */
+  0,                                    /* properties_provided */
+  0,                                    /* properties_destroyed */
+  0,                                    /* todo_flags_start */
+  0,                                    /* todo_flags_finish */
+  0                                     /* letter */
+};
+
+/* New pass handle_drap.
+   This pass first checks if DRAP is needed.
+   If yes, it will set current_function_internal_arg_pointer to that
+   virtual register.  Later the vregs pass will replace
+   virtual_incoming_args_rtx with that virtual reg.  */
+static unsigned int
+handle_drap (void)
+{
+  /* Call targetm.calls.internal_arg_pointer again. This time it will
+     return a virtual reg if DRAP is needed */
+  rtx internal_arg_rtx = targetm.calls.internal_arg_pointer (); 
+
+  /* Assertion to check internal_arg_pointer is set to the right rtx here */
+  gcc_assert (current_function_internal_arg_pointer == 
+             virtual_incoming_args_rtx);
+
+  /* Do nothing if needn't replace virtual incoming arg rtx */
+  if (current_function_internal_arg_pointer != internal_arg_rtx)
+    {
+      current_function_internal_arg_pointer = internal_arg_rtx;
+
+      /* Call fixup_tail_calls to clean up REG_EQUIV note
+         if DRAP is needed. */
+      fixup_tail_calls ();
+    }
+
+  return 0;
+}
+
+struct tree_opt_pass pass_handle_drap =
+{
+  "handle_drap",			/* name */
+  gate_stack_realign,                   /* gate */
+  handle_drap,			        /* execute */
+  NULL,                                 /* sub */
+  NULL,                                 /* next */
+  0,                                    /* static_pass_number */
+  0,				        /* tv_id */
+  /* ??? If TER is enabled, we actually receive GENERIC.  */
+  0,                                    /* properties_required */
+  PROP_rtl,                             /* properties_provided */
+  0,				        /* properties_destroyed */
+  0,                                    /* todo_flags_start */
+  TODO_dump_func,                       /* todo_flags_finish */
+  0					/* letter */
+};
Index: tree-inline.c
===================================================================
--- tree-inline.c	(revision 133266)
+++ tree-inline.c	(working copy)
@@ -2840,8 +2840,26 @@
 	cfun->unexpanded_var_list = tree_cons (NULL_TREE, var,
 					       cfun->unexpanded_var_list);
       else
-	cfun->unexpanded_var_list = tree_cons (NULL_TREE, remap_decl (var, id),
-					       cfun->unexpanded_var_list);
+	{
+	  /* Update stack alignment requirement if needed.  */
+	  if (MAX_VECTORIZE_STACK_ALIGNMENT)
+	    {
+	      unsigned int align;
+
+	      if (TREE_STATIC (var) || DECL_EXTERNAL (var))
+		align = TYPE_ALIGN (TREE_TYPE (var));
+	      else
+		align = DECL_ALIGN (var);
+	      if (align  > cfun->stack_alignment_estimated)
+		{
+		  gcc_assert(!cfun->stack_realign_processed);
+		  cfun->stack_alignment_estimated = align;
+		}
+	    }
+	  cfun->unexpanded_var_list
+	    = tree_cons (NULL_TREE, remap_decl (var, id),
+			 cfun->unexpanded_var_list);
+	}
     }
 
   /* Clean up.  */
Index: passes.c
===================================================================
--- passes.c	(revision 133266)
+++ passes.c	(working copy)
@@ -682,7 +682,9 @@
   NEXT_PASS (pass_free_datastructures);
   NEXT_PASS (pass_mudflap_2);
   NEXT_PASS (pass_free_cfg_annotations);
+  NEXT_PASS (pass_collect_stackrealign_info);
   NEXT_PASS (pass_expand);
+  NEXT_PASS (pass_handle_drap); 
   NEXT_PASS (pass_rest_of_compilation);
     {
       struct tree_opt_pass **p = &pass_rest_of_compilation.sub;
Index: config/i386/i386.h
===================================================================
--- config/i386/i386.h	(revision 133266)
+++ config/i386/i386.h	(working copy)
@@ -800,17 +800,33 @@
 /* Boundary (in *bits*) on which stack pointer should be aligned.  */
 #define STACK_BOUNDARY BITS_PER_WORD
 
+/* Stack boundary of the main function guaranteed by OS.  */
+#define MAIN_STACK_BOUNDARY (TARGET_64BIT ? 128 : 32)
+
+/* Stack boundary guaranteed by ABI.  */
+#define ABI_STACK_BOUNDARY (TARGET_64BIT ? 128 : 32)
+
 /* Boundary (in *bits*) on which the stack pointer prefers to be
    aligned; the compiler cannot rely on having this alignment.  */
 #define PREFERRED_STACK_BOUNDARY ix86_preferred_stack_boundary
 
-/* As of July 2001, many runtimes do not align the stack properly when
-   entering main.  This causes expand_main_function to forcibly align
-   the stack, which results in aligned frames for functions called from
-   main, though it does nothing for the alignment of main itself.  */
-#define FORCE_PREFERRED_STACK_BOUNDARY_IN_MAIN \
-  (ix86_preferred_stack_boundary > STACK_BOUNDARY && !TARGET_64BIT)
+/* It should be ABI_STACK_BOUNDARY.  But we set it to 128 bits for
+   both 32bit and 64bit, to support codes that need 128 bit stack
+   alignment for SSE instructions, but can't realign the stack.  */
+#define PREFERRED_STACK_BOUNDARY_DEFAULT 128
 
+/* 1 if -mstackrealign should be turned on by default.  It will
+   generate an alternate prologue and epilogue that realigns the
+   runtime stack if necessary.  This supports mixing codes that keep a
+   4-byte aligned stack, as specified by i386 psABI, with codes that
+   need a 16-byte aligned stack, as required by SSE instructions.  If
+   STACK_REALIGN_DEFAULT is 1 and PREFERRED_STACK_BOUNDARY_DEFAULT is
+   128, stacks for all functions may be realigned.  */
+#define STACK_REALIGN_DEFAULT 0
+
+/* Boundary (in *bits*) on which the incoming stack is aligned.  */
+#define INCOMING_STACK_BOUNDARY ix86_incoming_stack_boundary
+
 /* Target OS keeps a vector-aligned (128-bit, 16-byte) stack.  This is
    mandatory for the 64-bit ABI, and may or may not be true for other
    operating systems.  */
@@ -836,6 +852,9 @@
 
 #define BIGGEST_ALIGNMENT 128
 
+/* Maximum stack alignment for vectorizer.  */
+#define MAX_VECTORIZE_STACK_ALIGNMENT BIGGEST_ALIGNMENT
+
 /* Decide whether a variable of mode MODE should be 128 bit aligned.  */
 #define ALIGN_MODE_128(MODE) \
  ((MODE) == XFmode || SSE_REG_MODE_P (MODE))
@@ -1245,7 +1264,7 @@
    the pic register when possible.  The change is visible after the
    prologue has been emitted.  */
 
-#define REAL_PIC_OFFSET_TABLE_REGNUM  3
+#define REAL_PIC_OFFSET_TABLE_REGNUM  BX_REG
 
 #define PIC_OFFSET_TABLE_REGNUM				\
   ((TARGET_64BIT && ix86_cmodel == CM_SMALL_PIC)	\
@@ -1786,7 +1805,10 @@
    All other eliminations are valid.  */
 
 #define CAN_ELIMINATE(FROM, TO) \
-  ((TO) == STACK_POINTER_REGNUM ? !frame_pointer_needed : 1)
+  (stack_realign_fp \
+  ? ((FROM) == ARG_POINTER_REGNUM && (TO) == HARD_FRAME_POINTER_REGNUM) \
+    || ((FROM) == FRAME_POINTER_REGNUM && (TO) == STACK_POINTER_REGNUM) \
+  : ((TO) == STACK_POINTER_REGNUM ? !frame_pointer_needed : 1))
 
 /* Define the offset between two registers, one to be eliminated, and the other
    its replacement, at the start of a routine.  */
@@ -2342,6 +2364,7 @@
 
 extern enum asm_dialect ix86_asm_dialect;
 extern unsigned int ix86_preferred_stack_boundary;
+extern unsigned int ix86_incoming_stack_boundary;
 extern int ix86_branch_cost, ix86_section_threshold;
 
 /* Smallest class containing REGNO.  */
@@ -2443,7 +2466,6 @@
 {
   struct stack_local_entry *stack_locals;
   const char *some_ld_name;
-  rtx force_align_arg_pointer;
   int save_varrargs_registers;
   int accesses_prev_frame;
   int optimize_mode_switching[MAX_386_ENTITIES];
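Review note: the patched CAN_ELIMINATE can be read as a small truth table. The sketch below is standalone C, not GCC code. The enum values and the function name can_eliminate are hypothetical stand-ins for the register numbers and globals that the macro uses.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the GCC register numbers.  */
enum reg { ARG_POINTER, FRAME_POINTER, HARD_FRAME_POINTER, STACK_POINTER };

/* Mirrors the patched CAN_ELIMINATE: when the frame is realigned through
   the frame pointer, only ARG -> HARD_FRAME and FRAME -> STACK remain
   valid; otherwise the old rule applies.  */
static bool
can_eliminate (enum reg from, enum reg to,
               bool stack_realign_fp, bool frame_pointer_needed)
{
  if (stack_realign_fp)
    return (from == ARG_POINTER && to == HARD_FRAME_POINTER)
           || (from == FRAME_POINTER && to == STACK_POINTER);
  return to == STACK_POINTER ? !frame_pointer_needed : true;
}
```

With stack_realign_fp set, every elimination other than the two listed pairs is rejected, regardless of frame_pointer_needed.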
Index: config/i386/i386.md
===================================================================
--- config/i386/i386.md	(revision 133266)
+++ config/i386/i386.md	(working copy)
@@ -221,6 +221,7 @@
   [(AX_REG			 0)
    (DX_REG			 1)
    (CX_REG			 2)
+   (BX_REG			 3)
    (SI_REG			 4)
    (DI_REG			 5)
    (BP_REG			 6)
@@ -230,6 +231,7 @@
    (FPCR_REG			19)
    (R10_REG			39)
    (R11_REG			40)
+   (R13_REG			42)
   ])
 
 ;; Insns whose names begin with "x86_" are emitted by gen_FOO calls
Index: config/i386/i386.opt
===================================================================
--- config/i386/i386.opt	(revision 133266)
+++ config/i386/i386.opt	(working copy)
@@ -78,6 +78,10 @@
 Target RejectNegative Report InverseMask(NO_FANCY_MATH_387, USE_FANCY_MATH_387)
 Generate sin, cos, sqrt for FPU
 
+mforce-drap
+Target Report Var(ix86_force_drap)
+Always use Dynamic Realigned Argument Pointer (DRAP) to realign stack.
+
 mfp-ret-in-387
 Target Report Mask(FLOAT_RETURNS)
 Return values of functions in FPU registers
@@ -134,6 +138,10 @@
 Target RejectNegative Joined Var(ix86_preferred_stack_boundary_string)
 Attempt to keep stack aligned to this power of 2
 
+mincoming-stack-boundary=
+Target RejectNegative Joined Var(ix86_incoming_stack_boundary_string)
+Assume incoming stack aligned to this power of 2
+
 mpush-args
 Target Report InverseMask(NO_PUSH_ARGS, PUSH_ARGS)
 Use push instructions to save outgoing arguments
@@ -159,7 +167,7 @@
 Use SSE register passing conventions for SF and DF mode
 
 mstackrealign
-Target Report Var(ix86_force_align_arg_pointer)
+Target Report Var(ix86_force_align_arg_pointer) Init(-1)
 Realign stack in prologue
 
 mstack-arg-probe
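Review note: the argument of -mincoming-stack-boundary= is a power-of-2 exponent in bytes, which the i386.c part of this patch converts to bits and range-checks. The sketch below restates that conversion as standalone C. The function name incoming_boundary_bits and the return-0-on-error convention are assumptions made for illustration; the patch itself reports the problem through error ().

```c
#include <stdlib.h>

#define BITS_PER_UNIT 8  /* assumed: 8-bit bytes, as on x86 */

/* Convert the argument of -mincoming-stack-boundary=N into a boundary in
   bits, i.e. (1 << N) * BITS_PER_UNIT.  Returns 0 when N is outside the
   accepted range, where the patch would call error ().  */
static unsigned int
incoming_boundary_bits (const char *arg, int target_64bit)
{
  int min = target_64bit ? 4 : 2;
  int i = atoi (arg);

  if (i < min || i > 12)
    return 0;
  return (1u << i) * BITS_PER_UNIT;
}
```

For example, an argument of 4 yields a 128-bit (16-byte) boundary, while 3 is rejected for 64-bit targets.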
Index: config/i386/i386.c
===================================================================
--- config/i386/i386.c	(revision 133266)
+++ config/i386/i386.c	(working copy)
@@ -1693,11 +1693,22 @@
 
 /* -mstackrealign option */
 extern int ix86_force_align_arg_pointer;
-static const char ix86_force_align_arg_pointer_string[] = "force_align_arg_pointer";
+static const char ix86_force_align_arg_pointer_string[]
+  = "force_align_arg_pointer";
 
 /* Preferred alignment for stack boundary in bits.  */
 unsigned int ix86_preferred_stack_boundary;
 
+/* Alignment for incoming stack boundary in bits specified at
+   command line.  */
+static unsigned int ix86_user_incoming_stack_boundary;
+
+/* Default alignment for incoming stack boundary in bits.  */
+static unsigned int ix86_default_incoming_stack_boundary;
+
+/* Alignment for incoming stack boundary in bits.  */
+unsigned int ix86_incoming_stack_boundary;
+
 /* Values 1-5: see jump.c */
 int ix86_branch_cost;
 
@@ -2611,11 +2622,9 @@
   if (TARGET_SSE4_2 || TARGET_ABM)
     x86_popcnt = true;
 
-  /* Validate -mpreferred-stack-boundary= value, or provide default.
-     The default of 128 bits is for Pentium III's SSE __m128.  We can't
-     change it because of optimize_size.  Otherwise, we can't mix object
-     files compiled with -Os and -On.  */
-  ix86_preferred_stack_boundary = 128;
+  /* Validate -mpreferred-stack-boundary= value or default it to
+     PREFERRED_STACK_BOUNDARY_DEFAULT.  */
+  ix86_preferred_stack_boundary = PREFERRED_STACK_BOUNDARY_DEFAULT;
   if (ix86_preferred_stack_boundary_string)
     {
       i = atoi (ix86_preferred_stack_boundary_string);
@@ -2626,6 +2635,31 @@
 	ix86_preferred_stack_boundary = (1 << i) * BITS_PER_UNIT;
     }
 
+  /* Set the default value for -mstackrealign.  */
+  if (ix86_force_align_arg_pointer == -1)
+    ix86_force_align_arg_pointer = STACK_REALIGN_DEFAULT;
+
+  /* Validate -mincoming-stack-boundary= value or default it to
+     ABI_STACK_BOUNDARY/PREFERRED_STACK_BOUNDARY.  */
+  if (ix86_force_align_arg_pointer)
+    ix86_default_incoming_stack_boundary = ABI_STACK_BOUNDARY;
+  else
+    ix86_default_incoming_stack_boundary = PREFERRED_STACK_BOUNDARY;
+  ix86_incoming_stack_boundary = ix86_default_incoming_stack_boundary;
+  if (ix86_incoming_stack_boundary_string)
+    {
+      i = atoi (ix86_incoming_stack_boundary_string);
+      if (i < (TARGET_64BIT ? 4 : 2) || i > 12)
+	error ("-mincoming-stack-boundary=%d is not between %d and 12",
+	       i, TARGET_64BIT ? 4 : 2);
+      else
+	{
+	  ix86_user_incoming_stack_boundary = (1 << i) * BITS_PER_UNIT;
+	  ix86_incoming_stack_boundary
+	    = ix86_user_incoming_stack_boundary;
+	}
+    }
+
   /* Accept -msseregparm only if at least SSE support is enabled.  */
   if (TARGET_SSEREGPARM
       && ! TARGET_SSE)
@@ -3063,11 +3097,6 @@
       && ix86_function_regparm (TREE_TYPE (decl), NULL) >= 3)
     return false;
 
-  /* If we forced aligned the stack, then sibcalling would unalign the
-     stack, which may break the called function.  */
-  if (cfun->machine->force_align_arg_pointer)
-    return false;
-
   /* Otherwise okay.  That also includes certain types of indirect calls.  */
   return true;
 }
@@ -3118,15 +3147,6 @@
 	  *no_add_attrs = true;
 	}
 
-      if (!TARGET_64BIT
-	  && lookup_attribute (ix86_force_align_arg_pointer_string,
-			       TYPE_ATTRIBUTES (*node))
-	  && compare_tree_int (cst, REGPARM_MAX-1))
-	{
-	  error ("%s functions limited to %d register parameters",
-		 ix86_force_align_arg_pointer_string, REGPARM_MAX-1);
-	}
-
       return NULL_TREE;
     }
 
@@ -3263,8 +3283,7 @@
 	  /* We can't use regparm(3) for nested functions as these use
 	     static chain pointer in third argument.  */
 	  if (local_regparm == 3
-	      && (decl_function_context (decl)
-                  || ix86_force_align_arg_pointer)
+	      && decl_function_context (decl)
 	      && !DECL_NO_STATIC_CHAIN (decl))
 	    local_regparm = 2;
 
@@ -3273,13 +3292,11 @@
 	     the callee DECL_STRUCT_FUNCTION is gone, so we fall back to
 	     scanning the attributes for the self-realigning property.  */
 	  f = DECL_STRUCT_FUNCTION (decl);
-	  if (local_regparm == 3
-	      && (f ? !!f->machine->force_align_arg_pointer
-		  : !!lookup_attribute (ix86_force_align_arg_pointer_string,
-					TYPE_ATTRIBUTES (TREE_TYPE (decl)))))
-	    local_regparm = 2;
+          /* The current internal arg pointer won't conflict with
+	     parameter passing regs, so there is no need to change stack
+	     realignment or adjust the regparm number.
 
-	  /* Each fixed register usage increases register pressure,
+	     Each fixed register usage increases register pressure,
 	     so less registers should be used for argument passing.
 	     This functionality can be overriden by an explicit
 	     regparm value.  */
@@ -4991,15 +5008,7 @@
 
   /* Indicate to allocate space on the stack for varargs save area.  */
   ix86_save_varrargs_registers = 1;
-  /* We need 16-byte stack alignment to save SSE registers.  If user
-     asked for lower preferred_stack_boundary, lets just hope that he knows
-     what he is doing and won't varargs SSE values.
 
-     We also may end up assuming that only 64bit values are stored in SSE
-     register let some floating point program work.  */
-  if (ix86_preferred_stack_boundary >= 128)
-    cfun->stack_alignment_needed = 128;
-
   save_area = frame_pointer_rtx;
   set = get_varargs_alias_set ();
 
@@ -5166,7 +5175,7 @@
 
   /* Find the overflow area.  */
   type = TREE_TYPE (ovf);
-  t = make_tree (type, virtual_incoming_args_rtx);
+  t = make_tree (type, current_function_internal_arg_pointer);
   if (words != 0)
     t = build2 (POINTER_PLUS_EXPR, type, t,
 	        size_int (words * UNITS_PER_WORD));
@@ -5963,8 +5972,8 @@
 	}
     }
 
-  if (cfun->machine->force_align_arg_pointer
-      && regno == REGNO (cfun->machine->force_align_arg_pointer))
+  if (cfun->drap_reg
+      && regno == REGNO (cfun->drap_reg))
     return 1;
 
   return (df_regs_ever_live_p (regno)
@@ -6030,6 +6039,9 @@
   stack_alignment_needed = cfun->stack_alignment_needed / BITS_PER_UNIT;
   preferred_alignment = cfun->preferred_stack_boundary / BITS_PER_UNIT;
 
+  gcc_assert (!size || stack_alignment_needed);
+  gcc_assert (preferred_alignment >= STACK_BOUNDARY / BITS_PER_UNIT);
+
   /* During reload iteration the amount of registers saved can change.
      Recompute the value as needed.  Do not recompute when amount of registers
      didn't change as reload does multiple calls to the function and does not
@@ -6072,19 +6084,10 @@
 
   frame->hard_frame_pointer_offset = offset;
 
-  /* Do some sanity checking of stack_alignment_needed and
-     preferred_alignment, since i386 port is the only using those features
-     that may break easily.  */
+  /* Align offset, because the realigned frame starts from here.  */
+  if (stack_realign_fp)
+    offset = (offset + stack_alignment_needed - 1) & -stack_alignment_needed;
 
-  gcc_assert (!size || stack_alignment_needed);
-  gcc_assert (preferred_alignment >= STACK_BOUNDARY / BITS_PER_UNIT);
-  gcc_assert (preferred_alignment <= PREFERRED_STACK_BOUNDARY / BITS_PER_UNIT);
-  gcc_assert (stack_alignment_needed
-	      <= PREFERRED_STACK_BOUNDARY / BITS_PER_UNIT);
-
-  if (stack_alignment_needed < STACK_BOUNDARY / BITS_PER_UNIT)
-    stack_alignment_needed = STACK_BOUNDARY / BITS_PER_UNIT;
-
   /* Register save area */
   offset += frame->nregs * UNITS_PER_WORD;
 
@@ -6249,35 +6252,129 @@
     RTX_FRAME_RELATED_P (insn) = 1;
 }
 
+/* Find an available register to be used as dynamic realign argument
+   pointer register.  Such a register will be written in prologue and
+   used at the beginning of the body, so it must not be
+	1. parameter passing register.
+	2. GOT pointer.
+   For i386, we use CX if it is not used to pass parameter. Otherwise
+   we just pick DI.
+   For x86_64, we just pick R13 directly.
+
+   Return: the regno of chosen register.  */
+
+static unsigned int 
+find_drap_reg (void)
+{
+  int param_reg_num;
+
+  if (TARGET_64BIT)
+    return R13_REG;
+
+  /* Use DI for nested function or function need static chain.  */
+  if (decl_function_context (cfun->decl)
+      && !DECL_NO_STATIC_CHAIN (cfun->decl))
+    return DI_REG;
+
+  if (cfun->tail_call_emit)
+    return DI_REG;
+
+  param_reg_num = ix86_function_regparm (TREE_TYPE (cfun->decl),
+					 cfun->decl);
+
+  if (param_reg_num <= 2
+      && !lookup_attribute ("fastcall",
+			    TYPE_ATTRIBUTES (TREE_TYPE (cfun->decl))))
+    return CX_REG;
+
+  return DI_REG;
+}
+
 /* Handle the TARGET_INTERNAL_ARG_POINTER hook.  */
 
 static rtx
 ix86_internal_arg_pointer (void)
 {
-  bool has_force_align_arg_pointer =
-    (0 != lookup_attribute (ix86_force_align_arg_pointer_string,
-			    TYPE_ATTRIBUTES (TREE_TYPE (current_function_decl))));
-  if ((FORCE_PREFERRED_STACK_BOUNDARY_IN_MAIN
-       && DECL_NAME (current_function_decl)
-       && MAIN_NAME_P (DECL_NAME (current_function_decl))
-       && DECL_FILE_SCOPE_P (current_function_decl))
-      || ix86_force_align_arg_pointer
-      || has_force_align_arg_pointer)
+  /* If called in "expand" pass, currently_expanding_to_rtl will
+     be true */
+  if (currently_expanding_to_rtl) 
+    return virtual_incoming_args_rtx;
+
+  /* Prefer the one specified at command line. */
+  ix86_incoming_stack_boundary 
+    = (ix86_user_incoming_stack_boundary
+       ? ix86_user_incoming_stack_boundary
+       : ix86_default_incoming_stack_boundary);
+
+  /* Current stack realign doesn't support eh_return.  Assume a
+     function that calls eh_return is aligned.  There will be a sanity
+     check if stack realign happens together with eh_return later.  */
+  if (current_function_calls_eh_return)
+    ix86_incoming_stack_boundary = PREFERRED_STACK_BOUNDARY;
+
+  /* Incoming stack alignment can be changed on individual functions
+     via force_align_arg_pointer attribute.  We use the smallest
+     incoming stack boundary.  */
+  if (ix86_incoming_stack_boundary > ABI_STACK_BOUNDARY
+      && lookup_attribute (ix86_force_align_arg_pointer_string,
+			   TYPE_ATTRIBUTES (TREE_TYPE (current_function_decl))))
+    ix86_incoming_stack_boundary = ABI_STACK_BOUNDARY;
+
+  /* Stack at entrance of main is aligned by runtime.  We use the
+     smallest incoming stack boundary. */
+  if (ix86_incoming_stack_boundary > MAIN_STACK_BOUNDARY
+      && DECL_NAME (current_function_decl)
+      && MAIN_NAME_P (DECL_NAME (current_function_decl))
+      && DECL_FILE_SCOPE_P (current_function_decl))
+    ix86_incoming_stack_boundary = MAIN_STACK_BOUNDARY;
+
+  gcc_assert (cfun->stack_alignment_needed 
+              <= cfun->stack_alignment_estimated);
+
+  /* x86_64 vararg needs 16byte stack alignment for register save
+     area.  */
+  if (TARGET_64BIT
+      && current_function_stdarg
+      && cfun->stack_alignment_estimated < 128)
+    cfun->stack_alignment_estimated = 128;
+
+  /* Update cfun->stack_alignment_estimated and use it later to align
+     stack.  FIXME: How to optimize for leaf function?  */
+  if (PREFERRED_STACK_BOUNDARY > cfun->stack_alignment_estimated)
+    cfun->stack_alignment_estimated = PREFERRED_STACK_BOUNDARY;
+  if (PREFERRED_STACK_BOUNDARY > cfun->stack_alignment_needed)
+    cfun->stack_alignment_needed = PREFERRED_STACK_BOUNDARY;
+
+  cfun->stack_realign_needed
+    = ix86_incoming_stack_boundary < cfun->stack_alignment_estimated;
+
+  cfun->stack_realign_processed = true;
+
+  if (ix86_force_drap
+      || !ACCUMULATE_OUTGOING_ARGS)
+    cfun->need_drap = true;
+
+  if (stack_realign_drap)
     {
-      /* Nested functions can't realign the stack due to a register
-	 conflict.  */
-      if (DECL_CONTEXT (current_function_decl)
-	  && TREE_CODE (DECL_CONTEXT (current_function_decl)) == FUNCTION_DECL)
-	{
-	  if (ix86_force_align_arg_pointer)
-	    warning (0, "-mstackrealign ignored for nested functions");
-	  if (has_force_align_arg_pointer)
-	    error ("%s not supported for nested functions",
-		   ix86_force_align_arg_pointer_string);
-	  return virtual_incoming_args_rtx;
-	}
-      cfun->machine->force_align_arg_pointer = gen_rtx_REG (Pmode, CX_REG);
-      return copy_to_reg (cfun->machine->force_align_arg_pointer);
+      /* Assign DRAP to vDRAP and return vDRAP.  */
+      unsigned int regno = find_drap_reg ();
+      rtx drap_vreg;
+      rtx arg_ptr;
+      rtx seq;
+
+      if (regno != CX_REG)
+	cfun->save_param_ptr_reg = true;
+
+      arg_ptr = gen_rtx_REG (Pmode, regno);
+      cfun->drap_reg = arg_ptr;
+
+      start_sequence ();
+      drap_vreg = copy_to_reg (arg_ptr);
+      seq = get_insns ();
+      end_sequence ();
+      
+      emit_insn_before (seq, NEXT_INSN (entry_of_function ()));
+      return drap_vreg;
     }
   else
     return virtual_incoming_args_rtx;
@@ -6316,53 +6413,62 @@
   bool pic_reg_used;
   struct ix86_frame frame;
   HOST_WIDE_INT allocate;
+  rtx (*gen_andsp) (rtx, rtx, rtx);
 
+  /* DRAP should not coexist with stack_realign_fp */
+  gcc_assert (!(cfun->drap_reg && stack_realign_fp));
+
+  /* Check if stack realign is really needed after reload, and
+     store the result in cfun.  */
+  cfun->stack_realign_really = ix86_incoming_stack_boundary 
+                               < cfun->stack_alignment_needed;
+
+  cfun->stack_realign_finalized = true;
+
   ix86_compute_frame_layout (&frame);
 
-  if (cfun->machine->force_align_arg_pointer)
+  /* Emit prologue code to adjust stack alignment and set up DRAP, in case
+     DRAP is needed and stack realignment is really needed after reload.  */
+  if (cfun->drap_reg && cfun->stack_realign_really)
     {
       rtx x, y;
+      int align_bytes = cfun->stack_alignment_needed / BITS_PER_UNIT;
+      int param_ptr_offset = (cfun->save_param_ptr_reg
+			      ?  STACK_BOUNDARY / BITS_PER_UNIT : 0);
 
+      gcc_assert (stack_realign_drap);
+
       /* Grab the argument pointer.  */
-      x = plus_constant (stack_pointer_rtx, 4);
-      y = cfun->machine->force_align_arg_pointer;
+      x = plus_constant (stack_pointer_rtx, 
+                         (STACK_BOUNDARY / BITS_PER_UNIT 
+			  + param_ptr_offset));
+      y = cfun->drap_reg;
+
+      /* Only need to push parameter pointer reg if it is caller
+	 saved reg */
+      if (cfun->save_param_ptr_reg)
+	{
+	  /* Push arg pointer reg */
+	  insn = emit_insn (gen_push (y));
+	  RTX_FRAME_RELATED_P (insn) = 1;
+	}
+
       insn = emit_insn (gen_rtx_SET (VOIDmode, y, x));
-      RTX_FRAME_RELATED_P (insn) = 1;
+      RTX_FRAME_RELATED_P (insn) = 1; 
 
-      /* The unwind info consists of two parts: install the fafp as the cfa,
-	 and record the fafp as the "save register" of the stack pointer.
-	 The later is there in order that the unwinder can see where it
-	 should restore the stack pointer across the and insn.  */
-      x = gen_rtx_UNSPEC (VOIDmode, gen_rtvec (1, const0_rtx), UNSPEC_DEF_CFA);
-      x = gen_rtx_SET (VOIDmode, y, x);
-      RTX_FRAME_RELATED_P (x) = 1;
-      y = gen_rtx_UNSPEC (VOIDmode, gen_rtvec (1, stack_pointer_rtx),
-			  UNSPEC_REG_SAVE);
-      y = gen_rtx_SET (VOIDmode, cfun->machine->force_align_arg_pointer, y);
-      RTX_FRAME_RELATED_P (y) = 1;
-      x = gen_rtx_PARALLEL (VOIDmode, gen_rtvec (2, x, y));
-      x = gen_rtx_EXPR_LIST (REG_FRAME_RELATED_EXPR, x, NULL);
-      REG_NOTES (insn) = x;
-
+      gen_andsp = TARGET_64BIT ? gen_anddi3 : gen_andsi3;
       /* Align the stack.  */
-      emit_insn (gen_andsi3 (stack_pointer_rtx, stack_pointer_rtx,
-			     GEN_INT (-16)));
+      insn = emit_insn ((*gen_andsp) (stack_pointer_rtx,
+				  stack_pointer_rtx,
+				  GEN_INT (-align_bytes)));
+      RTX_FRAME_RELATED_P (insn) = 1;
 
-      /* And here we cheat like madmen with the unwind info.  We force the
-	 cfa register back to sp+4, which is exactly what it was at the
-	 start of the function.  Re-pushing the return address results in
-	 the return at the same spot relative to the cfa, and thus is
-	 correct wrt the unwind info.  */
-      x = cfun->machine->force_align_arg_pointer;
-      x = gen_frame_mem (Pmode, plus_constant (x, -4));
+      x = cfun->drap_reg;
+      x = gen_frame_mem (Pmode,
+                         plus_constant (x,
+					-(STACK_BOUNDARY / BITS_PER_UNIT)));
       insn = emit_insn (gen_push (x));
       RTX_FRAME_RELATED_P (insn) = 1;
-
-      x = GEN_INT (4);
-      x = gen_rtx_UNSPEC (VOIDmode, gen_rtvec (1, x), UNSPEC_DEF_CFA);
-      x = gen_rtx_SET (VOIDmode, stack_pointer_rtx, x);
-      x = gen_rtx_EXPR_LIST (REG_FRAME_RELATED_EXPR, x, NULL);
-      REG_NOTES (insn) = x;
     }
 
   /* Note: AT&T enter does NOT have reversed args.  Enter is probably
@@ -6377,6 +6483,19 @@
       RTX_FRAME_RELATED_P (insn) = 1;
     }
 
+  if (stack_realign_fp && cfun->stack_realign_really)
+    {
+      int align_bytes = cfun->stack_alignment_needed / BITS_PER_UNIT;
+      gcc_assert (align_bytes > STACK_BOUNDARY / BITS_PER_UNIT);
+
+      gen_andsp = TARGET_64BIT ? gen_anddi3 : gen_andsi3;
+      /* Align the stack.  */
+      insn = emit_insn ((*gen_andsp) (stack_pointer_rtx,
+				      stack_pointer_rtx,
+				      GEN_INT (-align_bytes)));
+      RTX_FRAME_RELATED_P (insn) = 1;
+    }
+
   allocate = frame.to_allocate;
 
   if (!frame.save_regs_using_mov)
@@ -6391,7 +6510,9 @@
      a red zone location */
   if (TARGET_RED_ZONE && frame.save_regs_using_mov
       && (! TARGET_STACK_PROBE || allocate < CHECK_STACK_LIMIT))
-    ix86_emit_save_regs_using_mov (frame_pointer_needed ? hard_frame_pointer_rtx
+    ix86_emit_save_regs_using_mov ((frame_pointer_needed
+				     && !cfun->stack_realign_really) 
+                                   ? hard_frame_pointer_rtx
 				   : stack_pointer_rtx,
 				   -frame.nregs * UNITS_PER_WORD);
 
@@ -6450,8 +6571,11 @@
       && !(TARGET_RED_ZONE
          && (! TARGET_STACK_PROBE || allocate < CHECK_STACK_LIMIT)))
     {
-      if (!frame_pointer_needed || !frame.to_allocate)
-        ix86_emit_save_regs_using_mov (stack_pointer_rtx, frame.to_allocate);
+      if (!frame_pointer_needed
+	  || !frame.to_allocate
+	  || cfun->stack_realign_really)
+        ix86_emit_save_regs_using_mov (stack_pointer_rtx,
+				       frame.to_allocate);
       else
         ix86_emit_save_regs_using_mov (hard_frame_pointer_rtx,
 				       -frame.nregs * UNITS_PER_WORD);
@@ -6501,6 +6625,16 @@
 	emit_insn (gen_prologue_use (pic_offset_table_rtx));
       emit_insn (gen_blockage ());
     }
+
+  if (cfun->drap_reg && !cfun->stack_realign_really)
+    {
+      /* vDRAP is set up, but after reload it turns out that stack
+         realignment isn't necessary.  Emit prologue code to set up
+         DRAP without the stack realignment adjustment.  */
+      int drap_bp_offset = STACK_BOUNDARY / BITS_PER_UNIT * 2;
+      rtx x = plus_constant (hard_frame_pointer_rtx, drap_bp_offset);
+      insn = emit_insn (gen_rtx_SET (VOIDmode, cfun->drap_reg, x));
+    }
 }
 
 /* Emit code to restore saved registers using MOV insns.  First register
@@ -6539,7 +6673,10 @@
 ix86_expand_epilogue (int style)
 {
   int regno;
-  int sp_valid = !frame_pointer_needed || current_function_sp_is_unchanging;
+  /* When stack realign may happen, SP must be valid.  */
+  int sp_valid = (!frame_pointer_needed
+		  || current_function_sp_is_unchanging
+		  || (stack_realign_fp && cfun->stack_realign_really));
   struct ix86_frame frame;
   HOST_WIDE_INT offset;
 
@@ -6576,11 +6713,16 @@
     {
       /* Restore registers.  We can use ebp or esp to address the memory
 	 locations.  If both are available, default to ebp, since offsets
-	 are known to be small.  Only exception is esp pointing directly to the
-	 end of block of saved registers, where we may simplify addressing
-	 mode.  */
+	 are known to be small.  Only exception is esp pointing directly
+	 to the end of block of saved registers, where we may simplify
+	 addressing mode.  
 
-      if (!frame_pointer_needed || (sp_valid && !frame.to_allocate))
+	 If we are realigning stack with bp and sp, regs restore can't
+	 be addressed by bp. sp must be used instead.  */
+
+      if (!frame_pointer_needed
+	  || (sp_valid && !frame.to_allocate) 
+	  || (stack_realign_fp && cfun->stack_realign_really))
 	ix86_emit_restore_regs_using_mov (stack_pointer_rtx,
 					  frame.to_allocate, style == 2);
       else
@@ -6592,6 +6734,10 @@
 	{
 	  rtx tmp, sa = EH_RETURN_STACKADJ_RTX;
 
+	  if (cfun->stack_realign_really)
+	    {
+	      error ("Stack realign has conflict with eh_return");
+	    }
 	  if (frame_pointer_needed)
 	    {
 	      tmp = gen_rtx_PLUS (Pmode, hard_frame_pointer_rtx, sa);
@@ -6635,10 +6781,16 @@
   else
     {
       /* First step is to deallocate the stack frame so that we can
-	 pop the registers.  */
+	 pop the registers.
+
+	 If we realign stack with frame pointer, then stack pointer
+         won't be able to recover via lea $offset(%bp), %sp, because
+         there is a padding area between bp and sp for realign. 
+         "add $to_allocate, %sp" must be used instead.  */
       if (!sp_valid)
 	{
 	  gcc_assert (frame_pointer_needed);
+          gcc_assert (!(stack_realign_fp && cfun->stack_realign_really));
 	  pro_epilogue_adjust_stack (stack_pointer_rtx,
 				     hard_frame_pointer_rtx,
 				     GEN_INT (offset), style);
@@ -6661,18 +6813,47 @@
 	     able to grok it fast.  */
 	  if (TARGET_USE_LEAVE)
 	    emit_insn (TARGET_64BIT ? gen_leave_rex64 () : gen_leave ());
-	  else if (TARGET_64BIT)
-	    emit_insn (gen_popdi1 (hard_frame_pointer_rtx));
-	  else
-	    emit_insn (gen_popsi1 (hard_frame_pointer_rtx));
+	  else 
+            {
+              /* If stack realignment really happens, the stack
+                 pointer must be restored from the hard frame pointer
+                 when not using leave.  */
+              if (stack_realign_fp && cfun->stack_realign_really)
+		pro_epilogue_adjust_stack (stack_pointer_rtx,
+					   hard_frame_pointer_rtx,
+					   const0_rtx, style);
+              if (TARGET_64BIT)
+                emit_insn (gen_popdi1 (hard_frame_pointer_rtx));
+              else
+                emit_insn (gen_popsi1 (hard_frame_pointer_rtx));
+            }
 	}
     }
 
-  if (cfun->machine->force_align_arg_pointer)
+  if (cfun->drap_reg && cfun->stack_realign_really)
     {
-      emit_insn (gen_addsi3 (stack_pointer_rtx,
-			     cfun->machine->force_align_arg_pointer,
-			     GEN_INT (-4)));
+      int param_ptr_offset = (cfun->save_param_ptr_reg
+			      ? STACK_BOUNDARY / BITS_PER_UNIT : 0);
+      gcc_assert (stack_realign_drap);
+      if (TARGET_64BIT)
+        {
+          emit_insn (gen_adddi3 (stack_pointer_rtx,
+				 cfun->drap_reg,
+				 GEN_INT (-(STACK_BOUNDARY / BITS_PER_UNIT
+					    + param_ptr_offset))));
+          if (cfun->save_param_ptr_reg)
+            emit_insn (gen_popdi1 (cfun->drap_reg));
+        }
+      else
+        {
+          emit_insn (gen_addsi3 (stack_pointer_rtx,
+				 cfun->drap_reg,
+				 GEN_INT (-(STACK_BOUNDARY / BITS_PER_UNIT
+					    + param_ptr_offset))));
+          if (cfun->save_param_ptr_reg)
+            emit_insn (gen_popsi1 (cfun->drap_reg));
+        }
+      
     }
 
   /* Sibcall epilogues don't want a return instruction.  */
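Review note: two pieces of alignment arithmetic carry the i386.c changes above. The frame layout rounds an offset up with (offset + align - 1) & -align, and the prologue rounds the stack pointer down with an AND against -align_bytes. A minimal standalone sketch of both follows, assuming align is a power of two. The function names align_up and realign_sp are hypothetical and do not appear in the patch.

```c
#include <stdint.h>

/* Round OFFSET up to a multiple of ALIGN (a power of two), as the frame
   layout does with (offset + align - 1) & -align.  */
static intptr_t
align_up (intptr_t offset, intptr_t align)
{
  return (offset + align - 1) & -align;
}

/* Round SP down to a multiple of ALIGN, which is what the prologue's
   AND of the stack pointer with -align achieves on a downward-growing
   stack.  */
static uintptr_t
realign_sp (uintptr_t sp, uintptr_t align)
{
  return sp & ~(align - 1);
}
```

Rounding the stack pointer down can only move it further into free stack space, which is why the epilogue has to restore it from DRAP or the frame pointer rather than by adding a fixed offset.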
Index: stmt.c
===================================================================
--- stmt.c	(revision 133266)
+++ stmt.c	(working copy)
@@ -1820,7 +1820,7 @@
 	{
 	  /* Now restore our arg pointer from the address at which it
 	     was saved in our stack frame.  */
-	  emit_move_insn (virtual_incoming_args_rtx,
+	  emit_move_insn (current_function_internal_arg_pointer,
 			  copy_to_reg (get_arg_pointer_save_area (cfun)));
 	}
     }
Index: reload1.c
===================================================================
--- reload1.c	(revision 133266)
+++ reload1.c	(working copy)
@@ -2280,7 +2280,13 @@
 	  if (offsets_at[CODE_LABEL_NUMBER (x) - first_label_num][i]
 	      != (initial_p ? reg_eliminate[i].initial_offset
 		  : reg_eliminate[i].offset))
-	    reg_eliminate[i].can_eliminate = 0;
+            {
+	      /* Must not disable reg eliminate because stack realignment
+	         must eliminate frame pointer to stack pointer.  */
+	      gcc_assert (! MAX_VECTORIZE_STACK_ALIGNMENT
+			  || ! stack_realign_fp);
+	      reg_eliminate[i].can_eliminate = 0;
+            }
 
       return;
 
@@ -2359,7 +2365,13 @@
 	 offset because we are doing a jump to a variable address.  */
       for (p = reg_eliminate; p < &reg_eliminate[NUM_ELIMINABLE_REGS]; p++)
 	if (p->offset != p->initial_offset)
-	  p->can_eliminate = 0;
+	  {
+	    /* Must not disable reg eliminate because stack realignment
+	       must eliminate frame pointer to stack pointer.  */
+	    gcc_assert (! MAX_VECTORIZE_STACK_ALIGNMENT
+			|| ! stack_realign_fp);
+	    p->can_eliminate = 0;
+	  }
       break;
 
     default:
@@ -2850,7 +2862,13 @@
       /* If we modify the source of an elimination rule, disable it.  */
       for (ep = reg_eliminate; ep < &reg_eliminate[NUM_ELIMINABLE_REGS]; ep++)
 	if (ep->from_rtx == XEXP (x, 0))
-	  ep->can_eliminate = 0;
+	  {
+	    /* Must not disable reg eliminate because stack realignment
+	       must eliminate frame pointer to stack pointer.  */
+	    gcc_assert (! MAX_VECTORIZE_STACK_ALIGNMENT
+			|| ! stack_realign_fp);
+	    ep->can_eliminate = 0;
+	  }
 
       /* If we modify the target of an elimination rule by adding a constant,
 	 update its offset.  If we modify the target in any other way, we'll
@@ -2876,7 +2894,14 @@
 		    && CONST_INT_P (XEXP (XEXP (x, 1), 1)))
 		  ep->offset -= INTVAL (XEXP (XEXP (x, 1), 1));
 		else
-		  ep->can_eliminate = 0;
+		  {
+		    /* Must not disable reg eliminate because stack
+		       realignment must eliminate frame pointer to
+		       stack pointer.  */
+		    gcc_assert (! MAX_VECTORIZE_STACK_ALIGNMENT
+				|| ! stack_realign_fp);
+		    ep->can_eliminate = 0;
+		  }
 	      }
 	  }
 
@@ -2919,7 +2944,13 @@
 	 know how this register is used.  */
       for (ep = reg_eliminate; ep < &reg_eliminate[NUM_ELIMINABLE_REGS]; ep++)
 	if (ep->from_rtx == XEXP (x, 0))
-	  ep->can_eliminate = 0;
+	  {
+	    /* Must not disable reg eliminate because stack realignment
+	       must eliminate frame pointer to stack pointer.  */
+	    gcc_assert (! MAX_VECTORIZE_STACK_ALIGNMENT
+			|| ! stack_realign_fp);
+	    ep->can_eliminate = 0;
+	  }
 
       elimination_effects (XEXP (x, 0), mem_mode);
       return;
@@ -2930,7 +2961,13 @@
 	 be performed.  Otherwise, we need not be concerned about it.  */
       for (ep = reg_eliminate; ep < &reg_eliminate[NUM_ELIMINABLE_REGS]; ep++)
 	if (ep->to_rtx == XEXP (x, 0))
-	  ep->can_eliminate = 0;
+	  {
+	    /* Must not disable reg eliminate because stack realignment
+	       must eliminate frame pointer to stack pointer.  */
+	    gcc_assert (! MAX_VECTORIZE_STACK_ALIGNMENT
+			|| ! stack_realign_fp);
+	    ep->can_eliminate = 0;
+	  }
 
       elimination_effects (XEXP (x, 0), mem_mode);
       return;
@@ -2964,7 +3001,14 @@
 		    && GET_CODE (XEXP (src, 1)) == CONST_INT)
 		  ep->offset -= INTVAL (XEXP (src, 1));
 		else
-		  ep->can_eliminate = 0;
+		  {
+		    /* Must not disable reg eliminate because stack
+		       realignment must eliminate frame pointer to
+		       stack pointer.  */
+		    gcc_assert (! MAX_VECTORIZE_STACK_ALIGNMENT
+				|| ! stack_realign_fp);
+		    ep->can_eliminate = 0;
+		  }
 	      }
 	}
 
@@ -3293,7 +3337,14 @@
 	      for (ep = reg_eliminate; ep < &reg_eliminate[NUM_ELIMINABLE_REGS];
 		   ep++)
 		if (ep->from_rtx == orig_operand[i])
-		  ep->can_eliminate = 0;
+		  {
+		    /* Must not disable reg eliminate because stack
+		       realignment must eliminate frame pointer to
+		       stack pointer.  */
+		    gcc_assert (! MAX_VECTORIZE_STACK_ALIGNMENT
+				|| ! stack_realign_fp);
+		    ep->can_eliminate = 0;
+		  }
 	    }
 
 	  /* Companion to the above plus substitution, we can allow
@@ -3423,7 +3474,13 @@
   for (ep = reg_eliminate; ep < &reg_eliminate[NUM_ELIMINABLE_REGS]; ep++)
     {
       if (ep->previous_offset != ep->offset && ep->ref_outside_mem)
-	ep->can_eliminate = 0;
+	{
+	  /* Must not disable reg eliminate because stack realignment
+	     must eliminate frame pointer to stack pointer.  */
+	  gcc_assert (! MAX_VECTORIZE_STACK_ALIGNMENT
+		      || ! stack_realign_fp);
+	  ep->can_eliminate = 0;
+	}
 
       ep->ref_outside_mem = 0;
 
@@ -3499,6 +3556,11 @@
 	    || XEXP (SET_SRC (x), 0) != dest
 	    || GET_CODE (XEXP (SET_SRC (x), 1)) != CONST_INT))
       {
+	/* Must not disable reg eliminate because stack realignment
+	   must eliminate frame pointer to stack pointer.  */
+	gcc_assert (! MAX_VECTORIZE_STACK_ALIGNMENT
+		    || ! stack_realign_fp);
+
 	reg_eliminate[i].can_eliminate_previous
 	  = reg_eliminate[i].can_eliminate = 0;
 	num_eliminable--;
@@ -3669,8 +3731,11 @@
   frame_pointer_needed = 1;
   for (ep = reg_eliminate; ep < &reg_eliminate[NUM_ELIMINABLE_REGS]; ep++)
     {
-      if (ep->can_eliminate && ep->from == FRAME_POINTER_REGNUM
-	  && ep->to != HARD_FRAME_POINTER_REGNUM)
+      if (ep->can_eliminate
+	  && ep->from == FRAME_POINTER_REGNUM
+	  && ep->to != HARD_FRAME_POINTER_REGNUM
+	  && (! MAX_VECTORIZE_STACK_ALIGNMENT
+	      || ! cfun->stack_realign_needed))
 	frame_pointer_needed = 0;
 
       if (! ep->can_eliminate && ep->can_eliminate_previous)
@@ -3714,19 +3779,9 @@
   if (!reg_eliminate)
     reg_eliminate = xcalloc (sizeof (struct elim_table), NUM_ELIMINABLE_REGS);
 
-  /* Does this function require a frame pointer?  */
+  /* frame_pointer_needed should have been set.  */
+  gcc_assert (cfun->need_frame_pointer_set);
 
-  frame_pointer_needed = (! flag_omit_frame_pointer
-			  /* ?? If EXIT_IGNORE_STACK is set, we will not save
-			     and restore sp for alloca.  So we can't eliminate
-			     the frame pointer in that case.  At some point,
-			     we should improve this by emitting the
-			     sp-adjusting insns for this case.  */
-			  || (current_function_calls_alloca
-			      && EXIT_IGNORE_STACK)
-			  || current_function_accesses_prior_frames
-			  || FRAME_POINTER_REQUIRED);
-
   num_eliminable = 0;
 
 #ifdef ELIMINABLE_REGS
@@ -3737,7 +3792,10 @@
       ep->to = ep1->to;
       ep->can_eliminate = ep->can_eliminate_previous
 	= (CAN_ELIMINATE (ep->from, ep->to)
-	   && ! (ep->to == STACK_POINTER_REGNUM && frame_pointer_needed));
+	   && ! (ep->to == STACK_POINTER_REGNUM
+		 && frame_pointer_needed 
+		 && (! MAX_VECTORIZE_STACK_ALIGNMENT
+		     || ! stack_realign_fp)));
     }
 #else
   reg_eliminate[0].from = reg_eliminate_1[0].from;
Index: doc/extend.texi
===================================================================
--- doc/extend.texi	(.../fsf/trunk/gcc/doc)	(revision 1884)
+++ doc/extend.texi	(.../branches/stack-frame/gcc/doc)	(revision 1884)
@@ -2701,17 +2701,13 @@
 
 @item force_align_arg_pointer
 @cindex @code{force_align_arg_pointer} attribute
-On the Intel x86, the @code{force_align_arg_pointer} attribute may be
-applied to individual function definitions, generating an alternate
-prologue and epilogue that realigns the runtime stack.  This supports
-mixing legacy codes that run with a 4-byte aligned stack with modern
-codes that keep a 16-byte stack for SSE compatibility.  The alternate
-prologue and epilogue are slower and bigger than the regular ones, and
-the alternate prologue requires a scratch register; this lowers the
-number of registers available if used in conjunction with the
-@code{regparm} attribute.  The @code{force_align_arg_pointer}
-attribute is incompatible with nested functions; this is considered a
-hard error.
+The @code{force_align_arg_pointer} attribute may be applied to
+individual function definitions, assuming that the runtime stack is
+aligned according to the psABI and generating an alternate
+prologue/epilogue that realigns the runtime stack if necessary.
+On the Intel x86, this supports mixing codes that keep a 4-byte aligned
+stack, as specified by the i386 psABI, with codes that need a 16-byte
+aligned stack, as required by SSE instructions.
 
 @item returns_twice
 @cindex @code{returns_twice} attribute
Index: doc/invoke.texi
===================================================================
--- doc/invoke.texi	(.../fsf/trunk/gcc/doc)	(revision 1884)
+++ doc/invoke.texi	(.../branches/stack-frame/gcc/doc)	(revision 1884)
@@ -552,7 +552,9 @@
 -masm=@var{dialect}  -mno-fancy-math-387 @gol
 -mno-fp-ret-in-387  -msoft-float @gol
 -mno-wide-multiply  -mrtd  -malign-double @gol
--mpreferred-stack-boundary=@var{num} -mcx16 -msahf -mrecip @gol
+-mpreferred-stack-boundary=@var{num} @gol
+-mincoming-stack-boundary=@var{num} @gol
+-mcx16 -msahf -mrecip @gol
 -mmmx  -msse  -msse2 -msse3 -mssse3 -msse4.1 -msse4.2 -msse4 @gol
 -msse4a -m3dnow -mpopcnt -mabm -msse5 @gol
 -mthreads  -mno-align-stringops  -minline-all-stringops @gol
@@ -10702,18 +10704,14 @@
 
 @item -mstackrealign
 @opindex mstackrealign
-Realign the stack at entry.  On the Intel x86, the
-@option{-mstackrealign} option will generate an alternate prologue and
-epilogue that realigns the runtime stack.  This supports mixing legacy
-codes that keep a 4-byte aligned stack with modern codes that keep a
-16-byte stack for SSE compatibility.  The alternate prologue and
-epilogue are slower and bigger than the regular ones, and the
-alternate prologue requires an extra scratch register; this lowers the
-number of registers available if used in conjunction with the
-@code{regparm} attribute.  The @option{-mstackrealign} option is
-incompatible with the nested function prologue; this is considered a
-hard error.  See also the attribute @code{force_align_arg_pointer},
-applicable to individual functions.
+Realign the stack at entry.  The @option{-mstackrealign} option will
+assume that the runtime stack is aligned according to the psABI and
+generate an alternate prologue/epilogue that realigns the runtime stack
+if necessary.  On the Intel x86, this supports mixing codes that keep a
+4-byte aligned stack, as specified by the i386 psABI, with codes that need
+a 16-byte aligned stack, as required by SSE instructions.  See also the
+attribute @code{force_align_arg_pointer}, applicable to individual
+functions.
 
 @item -mpreferred-stack-boundary=@var{num}
 @opindex mpreferred-stack-boundary
@@ -10721,6 +10719,12 @@
 byte boundary.  If @option{-mpreferred-stack-boundary} is not specified,
 the default is 4 (16 bytes or 128 bits).
 
+@item -mincoming-stack-boundary=@var{num}
+@opindex mincoming-stack-boundary
+Assume the incoming stack is aligned to a 2 raised to @var{num} byte
+boundary.  If @option{-mincoming-stack-boundary} is not specified,
+the one specified by @option{-mpreferred-stack-boundary} will be used.
+
 On Pentium and PentiumPro, @code{double} and @code{long double} values
 should be aligned to an 8 byte boundary (see @option{-malign-double}) or
 suffer significant run time performance penalties.  On Pentium III, the

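For readers of the invoke.texi hunks above: both -mpreferred-stack-boundary
and -mincoming-stack-boundary take an exponent, so NUM=4 means 16 bytes and
NUM=2 means 4 bytes.  The sketch below only illustrates that arithmetic and
the resulting "is realignment needed" comparison.  The helper names
stack_boundary_bytes and needs_realign are hypothetical and are not part of
the patch; the real decision in the patch is made inside the compiler and
involves more inputs than these two numbers.

```c
#include <assert.h>
#include <stdbool.h>

/* Convert a -m*-stack-boundary=NUM value into a byte alignment,
   i.e. 2 raised to NUM.  Hypothetical helper, for illustration only.  */
static unsigned int
stack_boundary_bytes (unsigned int num)
{
  assert (num < 32);	/* Keep the shift well defined.  */
  return 1u << num;
}

/* Assumption for this sketch: realignment is needed only when the
   alignment a function requires exceeds what the caller guarantees.  */
static bool
needs_realign (unsigned int incoming_num, unsigned int required_num)
{
  return stack_boundary_bytes (required_num)
	 > stack_boundary_bytes (incoming_num);
}
```

Under these assumptions, a caller built for a 4-byte stack (NUM=2) that
calls into code needing a 16-byte stack (NUM=4) gives needs_realign (2, 4)
a true result, while needs_realign (4, 4) is false.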
