[PATCH] i386: Avoid stack realignment if possible

H.J. Lu hjl.tools@gmail.com
Sat Jul 8 00:57:00 GMT 2017


On Fri, Jul 07, 2017 at 09:58:42AM -0700, H.J. Lu wrote:
> On Fri, Dec 20, 2013 at 8:06 AM, Jakub Jelinek <jakub@redhat.com> wrote:
> > Hi!
> >
> > Honza recently changed the i?86 backend, so that it often doesn't
> > do -maccumulate-outgoing-args by default on x86_64.
> > Unfortunately, on some of the here included testcases this regressed
> > quite a bit the generated code.  As AVX vectors are used, the dynamic
> > realignment code needs to assume e.g. that some of them will need to be
> > spilled, and for -mno-accumulate-outgoing-args the code needs to set
> > need_drap early as well.  But in when emitting the prologue/epilogue,
> > if need_drap is set, we don't perform the optimization for leaf functions
> > which have zero size stack frame, thus we end up with uselessly doing
> > dynamic stack realignment, setting up DRAP that nothing uses and later on
> > restore everything back.
> >
> > This patch improves it, if the DRAP register isn't live at the start of
> > entry bb successor and we aren't going to realign the stack, we don't
> > need DRAP at all, and even if we need DRAP register, that can't be the sole
> > reason for doing stack realignment, the prologue code is able to set up DRAP
> > even without dynamic stack realignment.
> >
> > Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
> >
> > 2013-12-20  Jakub Jelinek  <jakub@redhat.com>
> >
> >         PR target/59501
> >         * config/i386/i386.c (ix86_save_reg): Don't return true for drap_reg
> >         if !crtl->stack_realign_needed.
> >         (ix86_finalize_stack_realign_flags): If drap_reg isn't live on entry
> >         and stack_realign_needed will be false, clear drap_reg and need_drap.
> >         Optimize leaf functions that don't need stack frame even if
> >         crtl->need_drap.
> >
> >         * gcc.target/i386/pr59501-1.c: New test.
> >         * gcc.target/i386/pr59501-1a.c: New test.
> >         * gcc.target/i386/pr59501-2.c: New test.
> >         * gcc.target/i386/pr59501-2a.c: New test.
> >         * gcc.target/i386/pr59501-3.c: New test.
> >         * gcc.target/i386/pr59501-3a.c: New test.
> >         * gcc.target/i386/pr59501-4.c: New test.
> >         * gcc.target/i386/pr59501-4a.c: New test.
> >         * gcc.target/i386/pr59501-5.c: New test.
> >         * gcc.target/i386/pr59501-6.c: New test.
> >
> >
> > --- gcc/testsuite/gcc.target/i386/pr59501-4a.c.jj       2013-12-20 12:19:20.603212859 +0100
> > +++ gcc/testsuite/gcc.target/i386/pr59501-4a.c  2013-12-20 12:23:33.647881672 +0100
> > @@ -0,0 +1,8 @@
> > +/* PR target/59501 */
> > +/* { dg-do compile { target { ! ia32 } } } */
> > +/* { dg-options "-O2 -mavx -maccumulate-outgoing-args" } */
> > +
> > +#include "pr59501-3a.c"
> > +
> > +/* Verify no dynamic realignment is performed.  */
> > +/* { dg-final { scan-assembler-not "and\[^\n\r]*sp" { xfail *-*-* } } } */
> >
> 
> Since DRAP isn't used with -maccumulate-outgoing-args, pr59501-4a.c was
> xfailed due to stack frame access via frame pointer instead of DARP.
> This patch finds the maximum stack alignment from the stack frame access
> instructions and avoids stack realignment if stack alignment needed is
> less than incoming stack boundary.
> 
> I am testing this patch.  OK for trunk if there is no regression?
> 
> 

We need to keep the preferred stack alignment as the minimum stack
alignment. Here is the updated patch.  Tested on x86-64.  OK for
trunk?

Thanks.

H.J.
---
Since DRAP isn't used with -maccumulate-outgoing-args, pr59501-4a.c was
xfailed due to stack frame access via frame pointer instead of DARP.
This patch finds the maximum stack alignment from the stack frame access
instructions and avoids stack realignment if stack alignment needed is
less than incoming stack boundary.

gcc/

	PR target/59501
	* config/i386/i386.c (ix86_finalize_stack_realign_flags): Don't
	realign stack if stack alignment needed is less than incoming
	stack boundary.

gcc/testsuite/

	PR target/59501
	* gcc.target/i386/pr59501-4a.c: Remove xfail.
---
 gcc/config/i386/i386.c                     | 84 +++++++++++++++++++-----------
 gcc/testsuite/gcc.target/i386/pr59501-4a.c |  2 +-
 2 files changed, 56 insertions(+), 30 deletions(-)

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index b041524..28febd0 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -14161,6 +14161,11 @@ ix86_finalize_stack_realign_flags (void)
       add_to_hard_reg_set (&set_up_by_prologue, Pmode, ARG_POINTER_REGNUM);
       add_to_hard_reg_set (&set_up_by_prologue, Pmode,
 			   HARD_FRAME_POINTER_REGNUM);
+
+      /* The preferred stack alignment is the minimum stack alignment.  */
+      unsigned int stack_alignment = crtl->preferred_stack_boundary;
+      bool require_stack_frame = false;
+
       FOR_EACH_BB_FN (bb, cfun)
         {
           rtx_insn *insn;
@@ -14169,43 +14174,64 @@ ix86_finalize_stack_realign_flags (void)
 		&& requires_stack_frame_p (insn, prologue_used,
 					   set_up_by_prologue))
 	      {
-		if (crtl->stack_realign_needed != stack_realign)
-		  recompute_frame_layout_p = true;
-		crtl->stack_realign_needed = stack_realign;
-		crtl->stack_realign_finalized = true;
-		if (recompute_frame_layout_p)
-		  ix86_compute_frame_layout ();
-		return;
+		require_stack_frame = true;
+
+		/* Find the maximum stack alignment.  */
+		subrtx_iterator::array_type array;
+		FOR_EACH_SUBRTX (iter, array, PATTERN (insn), ALL)
+		  if ((MEM_P (*iter)))
+		    {
+		      unsigned int alignment = MEM_ALIGN (*iter);
+		      if (alignment > stack_alignment)
+			stack_alignment = alignment;
+		    }
 	      }
 	}
 
-      /* If drap has been set, but it actually isn't live at the start
-	 of the function, there is no reason to set it up.  */
-      if (crtl->drap_reg)
+      if (require_stack_frame)
 	{
-	  basic_block bb = ENTRY_BLOCK_PTR_FOR_FN (cfun)->next_bb;
-	  if (! REGNO_REG_SET_P (DF_LR_IN (bb), REGNO (crtl->drap_reg)))
+	  /* Stack frame is required.  If stack alignment needed is less
+	     than incoming stack boundary, don't realign stack.  */
+	  stack_realign = incoming_stack_boundary < stack_alignment;
+	  if (!stack_realign)
 	    {
-	      crtl->drap_reg = NULL_RTX;
-	      crtl->need_drap = false;
+	      crtl->max_used_stack_slot_alignment
+		= incoming_stack_boundary;
+	      crtl->stack_alignment_needed
+		= incoming_stack_boundary;
 	    }
 	}
       else
-	cfun->machine->no_drap_save_restore = true;
-
-      frame_pointer_needed = false;
-      stack_realign = false;
-      crtl->max_used_stack_slot_alignment = incoming_stack_boundary;
-      crtl->stack_alignment_needed = incoming_stack_boundary;
-      crtl->stack_alignment_estimated = incoming_stack_boundary;
-      if (crtl->preferred_stack_boundary > incoming_stack_boundary)
-	crtl->preferred_stack_boundary = incoming_stack_boundary;
-      df_finish_pass (true);
-      df_scan_alloc (NULL);
-      df_scan_blocks ();
-      df_compute_regs_ever_live (true);
-      df_analyze ();
-      recompute_frame_layout_p = true;
+	{
+	  /* If drap has been set, but it actually isn't live at the
+	     start of the function, there is no reason to set it up.  */
+	  if (crtl->drap_reg)
+	    {
+	      basic_block bb = ENTRY_BLOCK_PTR_FOR_FN (cfun)->next_bb;
+	      if (! REGNO_REG_SET_P (DF_LR_IN (bb),
+				     REGNO (crtl->drap_reg)))
+		{
+		  crtl->drap_reg = NULL_RTX;
+		  crtl->need_drap = false;
+		}
+	    }
+	  else
+	    cfun->machine->no_drap_save_restore = true;
+
+	  frame_pointer_needed = false;
+	  stack_realign = false;
+	  crtl->max_used_stack_slot_alignment = incoming_stack_boundary;
+	  crtl->stack_alignment_needed = incoming_stack_boundary;
+	  crtl->stack_alignment_estimated = incoming_stack_boundary;
+	  if (crtl->preferred_stack_boundary > incoming_stack_boundary)
+	    crtl->preferred_stack_boundary = incoming_stack_boundary;
+	  df_finish_pass (true);
+	  df_scan_alloc (NULL);
+	  df_scan_blocks ();
+	  df_compute_regs_ever_live (true);
+	  df_analyze ();
+	  recompute_frame_layout_p = true;
+	}
     }
 
   if (crtl->stack_realign_needed != stack_realign)
diff --git a/gcc/testsuite/gcc.target/i386/pr59501-4a.c b/gcc/testsuite/gcc.target/i386/pr59501-4a.c
index 5c3cb68..908c7f4 100644
--- a/gcc/testsuite/gcc.target/i386/pr59501-4a.c
+++ b/gcc/testsuite/gcc.target/i386/pr59501-4a.c
@@ -5,4 +5,4 @@
 #include "pr59501-3a.c"
 
 /* Verify no dynamic realignment is performed.  */
-/* { dg-final { scan-assembler-not "and\[^\n\r]*sp" { xfail *-*-* } } } */
+/* { dg-final { scan-assembler-not "and\[^\n\r]*sp" } } */
-- 
2.9.4



More information about the Gcc-patches mailing list