This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [PATCH 4/6]: Ping: Merge from Stack Branch - i386 backend changes


Hi,

Index: config/i386/i386.h
===================================================================
--- config/i386/i386.h	(revision 134716)
+++ config/i386/i386.h	(working copy)
@@ -806,16 +806,32 @@ enum target_cpu_default
 /* Boundary (in *bits*) on which stack pointer should be aligned.  */
 #define STACK_BOUNDARY BITS_PER_WORD
 
+/* Stack boundary of the main function guaranteed by OS.  */
+#define MAIN_STACK_BOUNDARY (TARGET_64BIT ? 128 : 32)
+
+/* Stack boundary guaranteed by ABI.  */
+#define ABI_STACK_BOUNDARY (TARGET_64BIT ? 128 : 32)
+
 /* Boundary (in *bits*) on which the stack pointer prefers to be
    aligned; the compiler cannot rely on having this alignment.  */
 #define PREFERRED_STACK_BOUNDARY ix86_preferred_stack_boundary
 
-/* As of July 2001, many runtimes do not align the stack properly when
-   entering main.  This causes expand_main_function to forcibly align
-   the stack, which results in aligned frames for functions called from
-   main, though it does nothing for the alignment of main itself.  */
-#define FORCE_PREFERRED_STACK_BOUNDARY_IN_MAIN \
-  (ix86_preferred_stack_boundary > STACK_BOUNDARY && !TARGET_64BIT)
+/* It should be ABI_STACK_BOUNDARY.  But we set it to 128 bits for
+   both 32bit and 64bit, to support codes that need 128 bit stack
+   alignment for SSE instructions, but can't realign the stack.  */
+#define PREFERRED_STACK_BOUNDARY_DEFAULT 128

So basically you are dropping the main function codegen hack, but
keeping preferred stack boundary for 128 and force realignment by
default, right?

I guess we either want to keep PREFERRED_STACK_BOUNDARY bumped up so
code is compatible with what older GCC generate, but then we want to
keep the FORCE_PREFERRED_STACK_BOUNDARY_IN_MAIN.  Or we want to drop
preffered stack boundary by default to get smaller and faster code when
alginment is not needed.
+
+/* 1 if -mstackrealign should be turned on by default.  It will
+   generate an alternate prologue and epilogue that realigns the
+   runtime stack if nessary.  This supports mixing codes that keep a
+   4-byte aligned stack, as specified by i386 psABI, with codes that
+   need a 16-byte aligned stack, as required by SSE instructions.  If
+   STACK_REALIGN_DEFAULT is 1 and PREFERRED_STACK_BOUNDARY_DEFAULT is
+   128, stacks for all functions may be realigned.  */

I don't see why you wnat to realign all functions here?

 
 #define CAN_ELIMINATE(FROM, TO) \
-  ((TO) == STACK_POINTER_REGNUM ? !frame_pointer_needed : 1)
+  (stack_realign_fp \
+  ? ((FROM) == ARG_POINTER_REGNUM && (TO) == HARD_FRAME_POINTER_REGNUM) \
+    || ((FROM) == FRAME_POINTER_REGNUM && (TO) == STACK_POINTER_REGNUM) \
+  : ((TO) == STACK_POINTER_REGNUM ? !frame_pointer_needed : 1))


This can be moved offline..
+/* Find an available register to be used as dynamic realign argument
+   pointer regsiter.  Such a register will be written in prologue and
+   used in begin of body, so it must not be
+	1. parameter passing register.
+	2. GOT pointer.
+   For i386, we use CX if it is not used to pass parameter. Otherwise
+   we just pick DI.
+   For x86_64, we just pick R13 directly.

For x86_64, using lower reg should save code size.  I am confused how it
is used by the body.  I tought it is live only across the prologue/begin
of function boundary, otherwise that drap is a pseudo.
Fixing CX/DI for whole body wuld definitly lead to issues with asm.
+
+  /* Update crtl->stack_alignment_estimated and use it later to align
+     stack.  FIXME: How to optimize for leaf function?  */

What is the deal here?  Can't we simply check if the function is leaf
and void bumping the alignments up?
Note that cgraph_rtl_info is tracking stack alignments needed by
functions already compiled, so we ought to be able to do better here and
only upgrade it to the largest alignment needed by function called.
This is how the original code was working BTW.
+  if (ix86_force_drap || !ACCUMULATE_OUTGOING_ARGS)
+    crtl->need_drap = true;

We now have DRAP for all non-accumulate outgoing args subtargets?

+/* Finalize stack_realign_really flag, which will guide prologue/epilogue
+   to be generated in correct form.  */
+static void 
+ix86_finalize_stack_realign_flags (void)
+{
+  /* Check if stack realign is really needed after reload, and 
+     stores result in cfun */
+  unsigned int stack_realign = (ix86_incoming_stack_boundary
+				< (current_function_is_leaf
+				   ? crtl->stack_alignment_used
+				   : crtl->stack_alignment_needed));
+
+  if (crtl->stack_realign_finalized)
+    {
+      /* After stack_realign_really is finalized, we can't no longer
+	 change it.  */
+      gcc_assert (crtl->stack_realign_really == stack_realign);
+    }
+  else
+    {
+      crtl->stack_realign_really = stack_realign;
+      crtl->stack_realign_finalized = true;
+    }

We probably want to track the called function alignment here.
However, stack_realign_really/stack_realign_finalized seems to be only
used by i386 bits.  If so, it ought to go to the machine specific cfun
(or static variable)

Honza


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]