This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
RE: [PATCH 4/6]: Ping: Merge from Stack Branch - i386 backend changes
- From: "Ye, Joey" <joey dot ye at intel dot com>
- To: "Ye, Joey" <joey dot ye at intel dot com>, "H.J. Lu" <hjl dot tools at gmail dot com>, "Jan Hubicka" <jh at suse dot cz>
- Cc: "GCC Patches" <gcc-patches at gcc dot gnu dot org>, "Jan Hubicka" <hubicka at ucw dot cz>, "Guo, Xuepeng" <xuepeng dot guo at intel dot com>
- Date: Sun, 4 May 2008 19:38:29 +0800
- Subject: RE: [PATCH 4/6]: Ping: Merge from Stack Branch - i386 backend changes
- References: <BB577BF501703042AD7E08EADD8E514FF5DC8F@pdsmsx415.ccr.corp.intel.com> <20080430152008.GZ932@kam.mff.cuni.cz> <6dc9ffc80804300907t610671b6wf2952e4ac7c964f8@mail.gmail.com>
Jan,
Further comments wrt?
> Note that cgraph_rtl_info is tracking stack alignments needed by
> functions already compiled, so we ought to be able to do better here
and
> only upgrade it to the largest alignment needed by function called.
We ignore cgraph_rtl_info based on one of the principle of this scheme:
caller's stack alignment doesn't depense on callee's. So no matter what
the alignment required of compiled callee function is, caller just
ignore it.
Using following example to explain this principle:
foo () { bar (); }
Case 0 (our principle): Foo aligns its own stack with it's own
requirement, so does bar . Foo needn't consider bar's requirement,
because bar will take care of itself.
Case 1: Both foo and bar consider callee's alignment requirement: waste
of instructions
Case 2: Foo take care of bar's alignment requirement, and bar doesn't
realign with the assumption that foo has done for it: WRONG! Bar can't
guarantee that only foo can call it. If another function zoo calls bar,
it may result in less than required alignment.
Case 0 is the most reasonable and efficient solution we are using now.
Thanks - Joey
-----Original Message-----
From: Ye, Joey
Sent: Thursday, May 01, 2008 3:49 AM
To: 'H.J. Lu'; 'Jan Hubicka'
Cc: 'GCC Patches'; 'Jan Hubicka'; Guo, Xuepeng
Subject: RE: [PATCH 4/6]: Ping: Merge from Stack Branch - i386 backend
changes
Jan, Thanks much for you comments. See my additional replies:
>> +
>> +/* 1 if -mstackrealign should be turned on by default. It will
>> + generate an alternate prologue and epilogue that realigns the
>> + runtime stack if nessary. This supports mixing codes that keep a
>> + 4-byte aligned stack, as specified by i386 psABI, with codes that
>> + need a 16-byte aligned stack, as required by SSE instructions.
If
>> + STACK_REALIGN_DEFAULT is 1 and PREFERRED_STACK_BOUNDARY_DEFAULT
is
>> + 128, stacks for all functions may be realigned. */
>>
>> I don't see why you wnat to realign all functions here?
>-mstackrealign is added by Apple. We replaced the old
>implementation, which has many limitations, with ours.
-mstackrealign can be used a test purpose. It can also force all
function to assume incoming boundary is psABI default. After all we are
setting PREFERRED_STACK_BOUNDARY to 128 bits by default. Refer to this
thread: http://gcc.gnu.org/ml/gcc/2007-12/msg00515.html
>>
>> #define CAN_ELIMINATE(FROM, TO) \
>> - ((TO) == STACK_POINTER_REGNUM ? !frame_pointer_needed : 1)
>> + (stack_realign_fp \
>> + ? ((FROM) == ARG_POINTER_REGNUM && (TO) ==
HARD_FRAME_POINTER_REGNUM) \
>> + || ((FROM) == FRAME_POINTER_REGNUM && (TO) ==
STACK_POINTER_REGNUM) \
>> + : ((TO) == STACK_POINTER_REGNUM ? !frame_pointer_needed : 1))
>>
>>
>> This can be moved offline..
Do you mean move into a function?
> +/* Find an available register to be used as dynamic realign argument
> + pointer regsiter. Such a register will be written in prologue
and
> + used in begin of body, so it must not be
> + 1. parameter passing register.
> + 2. GOT pointer.
> + For i386, we use CX if it is not used to pass parameter.
Otherwise
> + we just pick DI.
> + For x86_64, we just pick R13 directly.
>
> For x86_64, using lower reg should save code size.
There is no much choice left from RAX to RSI. RAX for number of SSE
parameter in vararg. RBX for base pointer. RCX to RSI for parameter 1 to
6. Simply picking R13 is easy to implement. I agree that is room to
improve for example use RBX in case no base pointer etc. That can be
left for future improvement. After all, code size will increase a little
only when both more than 16 bytes alignment and DRAP are needed.
> I am confused how it
> is used by the body. I tought it is live only across the
prologue/begin
> of function boundary, otherwise that drap is a pseudo.
Yes it is live only across the prologue. But setting GOT pointer also
happens in prologue, also parameters are alive in prologue. So DRAP
register can't conflict with them.
> Fixing CX/DI for whole body wuld definitly lead to issues with asm.
I don't have an idea where where issue will be. Can you give more hints?
From Jan:
>> +
>> + /* Update crtl->stack_alignment_estimated and use it later to
align
>> + stack. FIXME: How to optimize for leaf function? */
>>
>> What is the deal here? Can't we simply check if the function is
leaf
>> and void bumping the alignments up?
From HJ
> Joey, correct me if I am wrong. I don't think leaf function
information
> is available when this function is called.
Correct. Flag current_function_is_leaf is set in pass_local_alloc. So we
don't have that information in expand.
From Jan:
> Note that cgraph_rtl_info is tracking stack alignments needed by
> functions already compiled, so we ought to be able to do better here
and
> only upgrade it to the largest alignment needed by function called.
Let me try if I can use it. Though it doesn't guarantee to have valid
information in case function hasn't been compiled.
From Jan:
> + if (crtl->stack_realign_finalized)
> + {
> + /* After stack_realign_really is finalized, we can't no longer
> + change it. */
> + gcc_assert (crtl->stack_realign_really == stack_realign);
> + }
> + else
> + {
> + crtl->stack_realign_really = stack_realign;
> + crtl->stack_realign_finalized = true;
> + }
>
> We probably want to track the called function alignment here.
Not sure I understand your point. You mean print the alignment to log
file?
Thanks - Joey
-----Original Message-----
From: H.J. Lu [mailto:hjl.tools@gmail.com]
Sent: Thursday, May 01, 2008 12:08 AM
To: Jan Hubicka
Cc: Ye, Joey; GCC Patches; Jan Hubicka; Guo, Xuepeng
Subject: Re: [PATCH 4/6]: Ping: Merge from Stack Branch - i386 backend
changes
Hi Jan,
Thanks for reviewing it. Joey and Xuepeng are on vacation until next
Monday.
Their replies may be delayed.
On Wed, Apr 30, 2008 at 8:20 AM, Jan Hubicka <jh@suse.cz> wrote:
>
> Hi,
>
> Index: config/i386/i386.h
> ===================================================================
> --- config/i386/i386.h (revision 134716)
> +++ config/i386/i386.h (working copy)
> @@ -806,16 +806,32 @@ enum target_cpu_default
> /* Boundary (in *bits*) on which stack pointer should be aligned.
*/
> #define STACK_BOUNDARY BITS_PER_WORD
>
> +/* Stack boundary of the main function guaranteed by OS. */
> +#define MAIN_STACK_BOUNDARY (TARGET_64BIT ? 128 : 32)
> +
> +/* Stack boundary guaranteed by ABI. */
> +#define ABI_STACK_BOUNDARY (TARGET_64BIT ? 128 : 32)
> +
> /* Boundary (in *bits*) on which the stack pointer prefers to be
> aligned; the compiler cannot rely on having this alignment. */
> #define PREFERRED_STACK_BOUNDARY ix86_preferred_stack_boundary
>
> -/* As of July 2001, many runtimes do not align the stack properly
when
> - entering main. This causes expand_main_function to forcibly
align
> - the stack, which results in aligned frames for functions called
from
> - main, though it does nothing for the alignment of main itself.
*/
> -#define FORCE_PREFERRED_STACK_BOUNDARY_IN_MAIN \
> - (ix86_preferred_stack_boundary > STACK_BOUNDARY && !TARGET_64BIT)
> +/* It should be ABI_STACK_BOUNDARY. But we set it to 128 bits for
> + both 32bit and 64bit, to support codes that need 128 bit stack
> + alignment for SSE instructions, but can't realign the stack. */
> +#define PREFERRED_STACK_BOUNDARY_DEFAULT 128
>
> So basically you are dropping the main function codegen hack, but
> keeping preferred stack boundary for 128 and force realignment by
> default, right?
Yes. We align the stack for main like any other functions. The
difference is for main, the incoming stack boundary is determined
by MAIN_STACK_BOUNDARY and for other functions, the incoming
stack boundary is set by PREFERRED_STACK_BOUNDARY_DEFAULT.
On ia32, since by default, MAIN_STACK_BOUNDARY is less than
PREFERRED_STACK_BOUNDARY_DEFAULT, we will align
stack for main automatically.
>
> I guess we either want to keep PREFERRED_STACK_BOUNDARY bumped up so
> code is compatible with what older GCC generate, but then we want to
> keep the FORCE_PREFERRED_STACK_BOUNDARY_IN_MAIN. Or we want to drop
> preffered stack boundary by default to get smaller and faster code
when
> alginment is not needed.
FORCE_PREFERRED_STACK_BOUNDARY_IN_MAIN isn't needed
since stack alignment in main is handled properly according its
incoming and outgoing stack boundaries. as well as alignments
of variables on stack. We added 2 testcases:
gcc.target/i386/align-main-1.c
gcc.target/i386/align-main-2.c
to make sure that we don't align stack for main twice when
there are variables on stack with alignment requirement.
> +
> +/* 1 if -mstackrealign should be turned on by default. It will
> + generate an alternate prologue and epilogue that realigns the
> + runtime stack if nessary. This supports mixing codes that keep a
> + 4-byte aligned stack, as specified by i386 psABI, with codes that
> + need a 16-byte aligned stack, as required by SSE instructions.
If
> + STACK_REALIGN_DEFAULT is 1 and PREFERRED_STACK_BOUNDARY_DEFAULT
is
> + 128, stacks for all functions may be realigned. */
>
> I don't see why you wnat to realign all functions here?
-mstackrealign is added by Apple. We replaced the old
implementation, which has many limitations, with ours.
>
> #define CAN_ELIMINATE(FROM, TO) \
> - ((TO) == STACK_POINTER_REGNUM ? !frame_pointer_needed : 1)
> + (stack_realign_fp \
> + ? ((FROM) == ARG_POINTER_REGNUM && (TO) ==
HARD_FRAME_POINTER_REGNUM) \
> + || ((FROM) == FRAME_POINTER_REGNUM && (TO) ==
STACK_POINTER_REGNUM) \
> + : ((TO) == STACK_POINTER_REGNUM ? !frame_pointer_needed : 1))
>
>
> This can be moved offline..
What do you mean?
> +/* Find an available register to be used as dynamic realign argument
> + pointer regsiter. Such a register will be written in prologue
and
> + used in begin of body, so it must not be
> + 1. parameter passing register.
> + 2. GOT pointer.
> + For i386, we use CX if it is not used to pass parameter.
Otherwise
> + we just pick DI.
> + For x86_64, we just pick R13 directly.
>
> For x86_64, using lower reg should save code size. I am confused how
it
We want to use a temporary register to avoid RA problems. Lower
registers aren't free to use.
> is used by the body. I tought it is live only across the
prologue/begin
> of function boundary, otherwise that drap is a pseudo.
Joey, can you comment on this?
> Fixing CX/DI for whole body wuld definitly lead to issues with asm.
Do you have a testcase? I believe our approach should work with
asm statements.
> +
> + /* Update crtl->stack_alignment_estimated and use it later to
align
> + stack. FIXME: How to optimize for leaf function? */
>
> What is the deal here? Can't we simply check if the function is leaf
> and void bumping the alignments up?
Joey, correct me if I am wrong. I don't think leaf function information
is available when this function is called.
> Note that cgraph_rtl_info is tracking stack alignments needed by
> functions already compiled, so we ought to be able to do better here
and
> only upgrade it to the largest alignment needed by function called.
Joey, Xuepeng, can you look into it?
> This is how the original code was working BTW.
> + if (ix86_force_drap || !ACCUMULATE_OUTGOING_ARGS)
> + crtl->need_drap = true;
>
> We now have DRAP for all non-accumulate outgoing args subtargets?
Yes, only if stack alignment is needed.
>
> +/* Finalize stack_realign_really flag, which will guide
prologue/epilogue
> + to be generated in correct form. */
> +static void
> +ix86_finalize_stack_realign_flags (void)
> +{
> + /* Check if stack realign is really needed after reload, and
> + stores result in cfun */
> + unsigned int stack_realign = (ix86_incoming_stack_boundary
> + < (current_function_is_leaf
> + ? crtl->stack_alignment_used
> + : crtl->stack_alignment_needed));
> +
> + if (crtl->stack_realign_finalized)
> + {
> + /* After stack_realign_really is finalized, we can't no longer
> + change it. */
> + gcc_assert (crtl->stack_realign_really == stack_realign);
> + }
> + else
> + {
> + crtl->stack_realign_really = stack_realign;
> + crtl->stack_realign_finalized = true;
> + }
>
> We probably want to track the called function alignment here.
Joey, Xuepeng, can you look into it?
> However, stack_realign_really/stack_realign_finalized seems to be
only
> used by i386 bits. If so, it ought to go to the machine specific
cfun
> (or static variable)
>
[hjl@gnu-6 gcc]$ grep stack_realign_finalized *.c
function.c: gcc_assert (!crtl->stack_realign_finalized);
function.c: gcc_assert (!crtl->stack_realign_finalized
stack_realign_really is an optimization to avoid stack alignment
when it isn't needed. We can move it to the machine specific
cfun. But we feel it is useful for other targets when they
implement stack realignment.
Thanks.
H.J.