This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
Re: [PATCH 4/6]: Ping: Merge from Stack Branch - i386 backend changes
Hi Jan,
Thanks for reviewing it. Joey and Xuepeng are on vacation until next Monday.
Their replies may be delayed.
On Wed, Apr 30, 2008 at 8:20 AM, Jan Hubicka <jh@suse.cz> wrote:
>
> Hi,
>
> Index: config/i386/i386.h
> ===================================================================
> --- config/i386/i386.h (revision 134716)
> +++ config/i386/i386.h (working copy)
> @@ -806,16 +806,32 @@ enum target_cpu_default
> /* Boundary (in *bits*) on which stack pointer should be aligned. */
> #define STACK_BOUNDARY BITS_PER_WORD
>
> +/* Stack boundary of the main function guaranteed by OS. */
> +#define MAIN_STACK_BOUNDARY (TARGET_64BIT ? 128 : 32)
> +
> +/* Stack boundary guaranteed by ABI. */
> +#define ABI_STACK_BOUNDARY (TARGET_64BIT ? 128 : 32)
> +
> /* Boundary (in *bits*) on which the stack pointer prefers to be
> aligned; the compiler cannot rely on having this alignment. */
> #define PREFERRED_STACK_BOUNDARY ix86_preferred_stack_boundary
>
> -/* As of July 2001, many runtimes do not align the stack properly when
> - entering main. This causes expand_main_function to forcibly align
> - the stack, which results in aligned frames for functions called from
> - main, though it does nothing for the alignment of main itself. */
> -#define FORCE_PREFERRED_STACK_BOUNDARY_IN_MAIN \
> - (ix86_preferred_stack_boundary > STACK_BOUNDARY && !TARGET_64BIT)
> +/* It should be ABI_STACK_BOUNDARY. But we set it to 128 bits for
> + both 32bit and 64bit, to support codes that need 128 bit stack
> + alignment for SSE instructions, but can't realign the stack. */
> +#define PREFERRED_STACK_BOUNDARY_DEFAULT 128
>
> So basically you are dropping the main function codegen hack, but
> keeping preferred stack boundary for 128 and force realignment by
> default, right?
Yes. We align the stack for main like any other functions. The
difference is for main, the incoming stack boundary is determined
by MAIN_STACK_BOUNDARY and for other functions, the incoming
stack boundary is set by PREFERRED_STACK_BOUNDARY_DEFAULT.
On ia32, since by default, MAIN_STACK_BOUNDARY is less than
PREFERRED_STACK_BOUNDARY_DEFAULT, we will align
stack for main automatically.
>
> I guess we either want to keep PREFERRED_STACK_BOUNDARY bumped up so
> code is compatible with what older GCC generate, but then we want to
> keep the FORCE_PREFERRED_STACK_BOUNDARY_IN_MAIN. Or we want to drop
> preffered stack boundary by default to get smaller and faster code when
> alginment is not needed.
FORCE_PREFERRED_STACK_BOUNDARY_IN_MAIN isn't needed
since stack alignment in main is handled properly according its
incoming and outgoing stack boundaries. as well as alignments
of variables on stack. We added 2 testcases:
gcc.target/i386/align-main-1.c
gcc.target/i386/align-main-2.c
to make sure that we don't align stack for main twice when
there are variables on stack with alignment requirement.
> +
> +/* 1 if -mstackrealign should be turned on by default. It will
> + generate an alternate prologue and epilogue that realigns the
> + runtime stack if nessary. This supports mixing codes that keep a
> + 4-byte aligned stack, as specified by i386 psABI, with codes that
> + need a 16-byte aligned stack, as required by SSE instructions. If
> + STACK_REALIGN_DEFAULT is 1 and PREFERRED_STACK_BOUNDARY_DEFAULT is
> + 128, stacks for all functions may be realigned. */
>
> I don't see why you wnat to realign all functions here?
-mstackrealign is added by Apple. We replaced the old
implementation, which has many limitations, with ours.
>
> #define CAN_ELIMINATE(FROM, TO) \
> - ((TO) == STACK_POINTER_REGNUM ? !frame_pointer_needed : 1)
> + (stack_realign_fp \
> + ? ((FROM) == ARG_POINTER_REGNUM && (TO) == HARD_FRAME_POINTER_REGNUM) \
> + || ((FROM) == FRAME_POINTER_REGNUM && (TO) == STACK_POINTER_REGNUM) \
> + : ((TO) == STACK_POINTER_REGNUM ? !frame_pointer_needed : 1))
>
>
> This can be moved offline..
What do you mean?
> +/* Find an available register to be used as dynamic realign argument
> + pointer regsiter. Such a register will be written in prologue and
> + used in begin of body, so it must not be
> + 1. parameter passing register.
> + 2. GOT pointer.
> + For i386, we use CX if it is not used to pass parameter. Otherwise
> + we just pick DI.
> + For x86_64, we just pick R13 directly.
>
> For x86_64, using lower reg should save code size. I am confused how it
We want to use a temporary register to avoid RA problems. Lower
registers aren't free to use.
> is used by the body. I tought it is live only across the prologue/begin
> of function boundary, otherwise that drap is a pseudo.
Joey, can you comment on this?
> Fixing CX/DI for whole body wuld definitly lead to issues with asm.
Do you have a testcase? I believe our approach should work with
asm statements.
> +
> + /* Update crtl->stack_alignment_estimated and use it later to align
> + stack. FIXME: How to optimize for leaf function? */
>
> What is the deal here? Can't we simply check if the function is leaf
> and void bumping the alignments up?
Joey, correct me if I am wrong. I don't think leaf function information
is available when this function is called.
> Note that cgraph_rtl_info is tracking stack alignments needed by
> functions already compiled, so we ought to be able to do better here and
> only upgrade it to the largest alignment needed by function called.
Joey, Xuepeng, can you look into it?
> This is how the original code was working BTW.
> + if (ix86_force_drap || !ACCUMULATE_OUTGOING_ARGS)
> + crtl->need_drap = true;
>
> We now have DRAP for all non-accumulate outgoing args subtargets?
Yes, only if stack alignment is needed.
>
> +/* Finalize stack_realign_really flag, which will guide prologue/epilogue
> + to be generated in correct form. */
> +static void
> +ix86_finalize_stack_realign_flags (void)
> +{
> + /* Check if stack realign is really needed after reload, and
> + stores result in cfun */
> + unsigned int stack_realign = (ix86_incoming_stack_boundary
> + < (current_function_is_leaf
> + ? crtl->stack_alignment_used
> + : crtl->stack_alignment_needed));
> +
> + if (crtl->stack_realign_finalized)
> + {
> + /* After stack_realign_really is finalized, we can't no longer
> + change it. */
> + gcc_assert (crtl->stack_realign_really == stack_realign);
> + }
> + else
> + {
> + crtl->stack_realign_really = stack_realign;
> + crtl->stack_realign_finalized = true;
> + }
>
> We probably want to track the called function alignment here.
Joey, Xuepeng, can you look into it?
> However, stack_realign_really/stack_realign_finalized seems to be only
> used by i386 bits. If so, it ought to go to the machine specific cfun
> (or static variable)
>
[hjl@gnu-6 gcc]$ grep stack_realign_finalized *.c
function.c: gcc_assert (!crtl->stack_realign_finalized);
function.c: gcc_assert (!crtl->stack_realign_finalized
stack_realign_really is an optimization to avoid stack alignment
when it isn't needed. We can move it to the machine specific
cfun. But we feel it is useful for other targets when they
implement stack realignment.
Thanks.
H.J.