This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.



RE: [RFA]: Merge stack alignment branch


Andrew,

I reviewed your patch. If I understand it correctly, it generates additional
instructions for each stack variable that needs a larger alignment. That may
not be optimal when such variables are accessed frequently.

The stack alignment patch needs as little as one additional instruction to
align the frame of the current function, so it is more efficient in most
cases. As for the changes to target-specific code, I think the benefit
justifies the effort.

Thanks - Joey 

-----Original Message-----
From: Andrew Pinski [mailto:pinskia@gmail.com] 
Sent: Friday, April 04, 2008 2:34 PM
To: Ye, Joey
Cc: GCC Patches; Lu, Hongjiu; Guo, Xuepeng; <ubizjak@gmail.com>
Subject: Re: [RFA]: Merge stack alignment branch



Sent from my iPhone

On Apr 3, 2008, at 23:23, "Ye, Joey" <joey.ye@intel.com> wrote:

> The STACK branch has been around for a while, and a bunch of patches that
> implement stack alignment for i386/x86_64 have been checked in. The
> branch can now not only align all stack variables at their required
> boundary efficiently, but also introduces zero regressions against
> current trunk. Here is the background information and the patch.
> Comments and feedback are highly appreciated.


Why not align just the variables that need the extra alignment? This seems
simpler and gets rid of the need for target-specific changes. I
already posted a patch to do it that way. It was created to support
the Cell processor. We really need variables with an alignment of
128 bytes, as the DMA will only work on memory that is 128-byte aligned
if the size is greater than or equal to 128. Also, needing the extra
alignment is not the common case.
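
A minimal sketch of the per-variable idea, written by hand in C rather than
taken from the posted patch (whose details are not shown in this message):
over-allocate the local and round its address up to the next 128-byte
boundary. The names DMA_ALIGN, align_up_addr and aligned_local_demo are
hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DMA_ALIGN 128u   /* hypothetical name; 128 bytes as quoted for Cell DMA */

/* Round addr up to the next multiple of align (align must be a power of 2). */
static uintptr_t align_up_addr(uintptr_t addr, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (addr + (align - 1)) & ~(uintptr_t)(align - 1);
}

/* Carve a 256-byte, DMA_ALIGN-aligned buffer out of an over-sized local
   array. Returns nonzero if the result is aligned and stays in bounds. */
static int aligned_local_demo(void)
{
    unsigned char raw[256 + DMA_ALIGN - 1];   /* payload + worst-case padding */
    uintptr_t start = (uintptr_t)raw;
    uintptr_t buf = align_up_addr(start, DMA_ALIGN);

    return buf % DMA_ALIGN == 0
        && buf >= start
        && buf + 256 <= start + sizeof raw;
}
```

The address computation has to be done for each such variable, which is the
per-variable cost discussed in the reply at the top of this message.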

>
>
> -- BACKGROUND --
> Here, we propose a new design to fully support stack alignment while
> overcoming the above problems. The new design will
> *  Support arbitrary alignment values, including 4, 8, 16, 32...
> *  Adjust function stack alignment only when necessary
> *  Initial development will be on i386 and x86_64, but can be extended
>    to other platforms
> *  Emit efficient prologue/epilogue code for stack alignment
> *  Coexist with special features like dynamic stack allocation (alloca),
>    nested functions, register parameter passing, PIC code and tail call
>    optimization, etc.
> *  Be able to debug and unwind the stack
>
> 2.1 Support arbitrary alignment value
> Different source code and optimizations require different stack
> alignment, as in the following table:
> Feature         Alignment (bytes)
> i386_ABI        4
> x86_64_ABI      16
> char            1
> short           2
> int             4
> long            4/8*
> long long       8
> __m64           8
> __m128          16
> float           4
> double          8
> long double     16
> user specified  any power of 2
>
> *Note: 4 for i386, 8 for x86_64
> The new design will support any alignment value in this table.
>
> 2.2 Adjust function stack alignment only when necessary
>
> Current GCC defines the following macros related to stack alignment:
> i. STACK_BOUNDARY in bits, which is preferred by hardware: 32 for i386
> and 64 for x86_64. It is the minimum stack boundary. It is fixed.
> ii. PREFERRED_STACK_BOUNDARY. It sets the stack alignment when calling a
> function. It may be set on the command line and has no impact on stack
> alignment at function entry. This proposal requires PREFERRED >= STACK,
> and by default sets it to ABI_STACK_BOUNDARY.
>
> This design will define a few more macros, or concepts not explicitly
> defined in code:
> iii. ABI_STACK_BOUNDARY in bits, which is the stack boundary specified by
> the psABI: 32 for i386 and 128 for x86_64. ABI_STACK_BOUNDARY >=
> STACK_BOUNDARY. It is fixed for a given psABI.
> iv. LOCAL_STACK_BOUNDARY in bits. Each function stack has its own stack
> alignment requirement, which depends on the alignment of its stack
> variables: LOCAL_STACK_BOUNDARY = MAX (alignment of each effective stack
> variable).
> v. INCOMING_STACK_BOUNDARY in bits, which is the stack boundary at
> function entry. If a function is marked with
> __attribute__ ((force_align_arg_pointer)) or the -mstackrealign option is
> provided, INCOMING == STACK_BOUNDARY. Otherwise,
> INCOMING == PREFERRED_STACK_BOUNDARY, because a function is typically
> called locally with the same PREFERRED_STACK_BOUNDARY. For those functions
> whose PREFERRED is larger than ABI, it is the caller's responsibility to
> invoke them with the appropriate PREFERRED.
> vi. REQUIRED_STACK_ALIGNMENT in bits, which is the stack alignment
> required by local variables and by calling other functions.
> REQUIRED_STACK_ALIGNMENT ==
> MAX(LOCAL_STACK_BOUNDARY,PREFERRED_STACK_BOUNDARY) in the case of a
> non-leaf function. For a leaf function, REQUIRED_STACK_ALIGNMENT ==
> MAX(LOCAL_STACK_BOUNDARY,STACK_BOUNDARY).
>
> This proposal won't adjust the stack when INCOMING_STACK_BOUNDARY >=
> REQUIRED_STACK_ALIGNMENT. Only when INCOMING_STACK_BOUNDARY <
> REQUIRED_STACK_ALIGNMENT, or the PREFERRED_STACK_BOUNDARY of the entry
> function is less than ABI_STACK_BOUNDARY, will it adjust the stack to
> REQUIRED_STACK_ALIGNMENT in the prologue.
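
The decision rule in 2.2 can be sketched in C roughly as follows. This is an
illustration only, not code from the patch: the struct and function names are
hypothetical, and the special case for an entry function whose PREFERRED is
below ABI_STACK_BOUNDARY is left out.

```c
#include <assert.h>

/* All boundaries are in bits. The struct and function names are
   hypothetical; only the macro names in the comments come from the
   proposal. */
struct stack_bounds {
    unsigned stack;      /* STACK_BOUNDARY */
    unsigned preferred;  /* PREFERRED_STACK_BOUNDARY */
    unsigned incoming;   /* INCOMING_STACK_BOUNDARY */
    unsigned local;      /* LOCAL_STACK_BOUNDARY */
    int is_leaf;         /* nonzero if the function calls no other function */
};

static unsigned max_u(unsigned a, unsigned b)
{
    return a > b ? a : b;
}

/* REQUIRED_STACK_ALIGNMENT: MAX(LOCAL, PREFERRED) for a non-leaf
   function, MAX(LOCAL, STACK) for a leaf function. */
static unsigned required_stack_alignment(const struct stack_bounds *b)
{
    assert(b->preferred >= b->stack);   /* the proposal requires this */
    return max_u(b->local, b->is_leaf ? b->stack : b->preferred);
}

/* Realign in the prologue only when the incoming boundary is too small. */
static int needs_realign(const struct stack_bounds *b)
{
    return b->incoming < required_stack_alignment(b);
}
```

For instance, an i386 function entered with a 32-bit boundary that holds an
__m128 local (128 bits) needs realignment, while a leaf function with only
8-bit locals does not.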
>
> 2.3 Initial development on i386 and x86_64
> We initially support i386 and x86_64. In this document we focus more on
> i386 because it is harder to implement, given the restriction of having
> a small register file. But all that we discuss can be easily applied
> to x86_64.
>
> 2.4 Emit more efficient prologue/epilogue
> When a function needs to adjust stack alignment and has no dynamic stack
> allocation, this design will generate the following example
> prologue/epilogue code:
> IA32 example Prologue:
>        pushl     %ebp
>        movl      %esp, %ebp
>        andl      $-16, %esp
>        subl      $4, %esp ; is $-4 the local stack size?
> Epilogue:
>        movl      %ebp, %esp
>        popl      %ebp
>        ret
> Locals will be addressed as esp + offset and parameters as ebp + offset.
>
> Add x86_64 example here.
>
> Thus BP points to parameter frame and SP points to local frame.
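
The effect of the `andl $-16, %esp` step in the prologue above can be
modelled in C. This is only an arithmetic illustration with a hypothetical
helper name (align_down); the addresses in the usage note are made up.

```c
#include <assert.h>
#include <stdint.h>

/* Round addr down to a multiple of align (a power of 2). With align == 16
   this computes the same value as `andl $-16, %esp` does for esp. */
static uint32_t align_down(uint32_t addr, uint32_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return addr & (uint32_t)~(align - 1u);   /* mask is 0xFFFFFFF0 for 16 */
}

/* Number of bytes the stack pointer moves down because of the masking. */
static uint32_t realign_padding(uint32_t addr, uint32_t align)
{
    return addr - align_down(addr, align);
}
```

For an esp of 0xbffff7ac the aligned value is 0xbffff7a0, so 12 bytes are
skipped; for an already aligned esp nothing is skipped. Because this padding
is only known at run time, parameters cannot be reached at a fixed offset
from the realigned esp, which is why they are addressed through ebp.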
>
> 2.5 Coexist with special features
> Stack alignment adjustment will coexist with various GCC features
> that have special calling conventions and frame layout, such as dynamic
> stack allocation (alloca), nested functions and parameter passing via
> registers to local functions.
>
> i386 hard register usage is the major problem in making the proposal
> friendly to various GCC features. This design requires an additional hard
> register in the prologue/epilogue in the case of dynamic stack allocation.
> The register is called the Dynamic Realigned Argument Pointer, or DRAP.
> Because i386 PIC requires BX as the GOT pointer and i386 may use AX, DX
> and CX as parameter passing registers, and because it also has to work
> with setjmp/longjmp, there are limited candidates to choose from. The
> current proposal uses CX as DRAP if CX is not used to pass a parameter.
> If CX is not available, DI will be used, because it is preserved across
> setjmp/longjmp since it is callee-saved.
>
> x86_64 is much easier. This proposal just chooses R12 as DRAP, which is
> also preserved across setjmp/longjmp since it is callee-saved.
>
> DRAP will be assigned to a virtual register, or VDRAP, in the prologue so
> that the DRAP hard register itself is free for the register allocator in
> the function body. Usually VDRAP will be allocated to the same DRAP
> register, so the additional register move instruction is often removed.
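
The DRAP choice described in 2.5 can be summarised in a few lines of C. This
is a sketch of the stated rule only, not the find_drap_reg code from the
patch. The enum and function names are hypothetical, and it assumes
regparm-style argument passing, where the third register argument is the one
that lands in CX.

```c
#include <assert.h>

enum drap_reg { DRAP_CX, DRAP_DI, DRAP_R12 };   /* hypothetical names */

/* Pick the DRAP hard register.
   is_x86_64: nonzero when compiling for x86_64.
   regparm:   number of integer arguments passed in registers on i386
              (0..3, assigned to AX, DX, CX in that order; an assumption). */
static enum drap_reg choose_drap(int is_x86_64, int regparm)
{
    assert(regparm >= 0 && regparm <= 3);

    if (is_x86_64)
        return DRAP_R12;          /* callee-saved, survives setjmp/longjmp */

    /* On i386, CX is free unless it carries the third register argument. */
    return regparm == 3 ? DRAP_DI : DRAP_CX;
}
```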
>
> 2.5.1 When stack alignment adjustment comes together with alloca, the
> following example prologue/epilogue will be emitted:
> Prologue:
>       pushl     %edi                     // Save callee-saved reg edi
>       leal      8(%esp), %edi            // Save address of parameter frame
>       andl      $-16, %esp               // Align local stack
>
> //  Reserve two stack slots and save return address
> //  and previous frame pointer into them. By
> //  pointing new ebp to them, we build a pseudo
> //  stack for unwinding.
>       pushl     -4(%edi)                 // Save return address
>       pushl     %ebp                     // Save old ebp
>       movl      %esp, %ebp               // Point ebp to pseudo frame start
>
>       subl      $24, %esp                // Adjust local frame size
>       movl      %edi, vreg1
>
> Epilogue:
>       movl      vreg1, %edi
>       movl      %ebp, %esp               // Restore esp to pseudo frame start
>       popl      %ebp
>       leal      -8(%edi), %esp           // Restore esp to real frame start
>       popl      %edi                     // Restore edi
>       ret
>
> Locals will be addressed as ebp - offset, parameters as vreg1 + offset.
>
> Here DI (the DRAP) is used to set up the virtual parameter frame pointer,
> BP points to the local frame and SP points to the dynamic allocation frame.
>
> 2.5.2 Nested functions will automatically work because they use CX as the
> static chain pointer, which won't conflict with any registers used by
> stack alignment adjustment, even when nested functions are called via a
> function pointer and a function stub on the stack.
>
> 2.5.3 GCC may optimize to use registers to pass parameters. At most AX,
> DX and CX will be used. Such optimization won't conflict with stack
> alignment adjustment, thus it should automatically work.
>
> 2.5.4 i386 PIC uses an available register or EBX as the GOT pointer. This
> design works well under i386 PIC. When picking a register for PIC, we
> will avoid using the DRAP register:
>
> For example:
> i686 Prologue:
>        pushl     %edi
>        leal      8(%esp), %edi
>        andl      $-16, %esp
>        pushl     -4(%edi)
>        pushl     %ebp
>        movl      %esp, %ebp
>        subl      $24,  %esp
>        call      .L1
> .L1:
>        popl      %ebx
>        movl      %edi, vreg1
>
> Body:  // code for alloca
>        movl      (vreg1), %eax
>        subl      %eax, %esp
>        andl      $-16, %esp
>        movl      %esp, %eax
>
> i686 Epilogue:
>        movl      %ebp, %esp
>        popl      %ebp
>        leal      -8(%edi), %esp
>        popl      %edi
>        ret
>
> Locals will be addressed as ebp - offset, parameters as vreg1 + offset,
> and ebx has the GOT pointer.
>
> 2.6 Debug and unwind will work since DWARF2 has the flexibility to define
> different frame pointers.
>
> 2.7 Some intrinsics rely on stack layout and need to be handled
> accordingly. They are __builtin_return_address and
> __builtin_frame_address. This proposal will set up a pseudo frame slot to
> help the unwinder find the return address and parent frame address by
> emitting the following prologue code after adjusting alignment:
>        pushl     -4(%edi)
>        pushl     %ebp
>
> ChangeLog:
> 2008-04-04  Uros Bizjak  <ubizjak@gmail.com>
>        H.J. Lu  <hongjiu.lu@intel.com>
>
>    PR target/12329
>    * config/i386/i386.c (ix86_function_regparm): Limit the number of
>    register passing arguments to 2 for nested functions.
>
> 2008-04-04  Joey Ye  <joey.ye@intel.com>
>        H.J. Lu  <hongjiu.lu@intel.com>
>        Xuepeng Guo  <xuepeng.guo@intel.com>
>
>    * builtins.c (expand_builtin_setjmp_receiver): Replace
>    virtual_incoming_args_rtx with
>    current_function_internal_arg_pointer.
>    (expand_builtin_apply_args_1): Likewise.
>
>    * calls.c (expand_call): Don't calculate preferred stack
>    boundary according to incoming stack boundary. Replace
>    virtual_incoming_args_rtx with
>    current_function_internal_arg_pointer.
>
>    * cfgexpand.c (get_decl_align_unit): Estimate stack variable
>    alignment and store to stack_alignment_estimated and
>    stack_alignment_used.
>    (expand_one_var): Likewise.
>    (gate_stack_realign): Gate new pass pass_collect_stackrealign_info
>    and pass_handle_drap.
>    (collect_stackrealign_info): Execute new pass
>    pass_collect_stackrealign_info.
>    (pass_collect_stackrealign_info): Define new pass.
>    (handle_drap): Execute new pass pass_handle_drap.
>    (pass_handle_drap): Define new pass.
>
>    * defaults.h (MAX_VECTORIZE_STACK_ALIGNMENT): New.
>
>    * dojump.c (clear_pending_stack_adjust): Leave a FIXME in
>    comments in case pending stack adjustment is discarded when stack
>    realign is needed.
>
>    * flags.h (frame_pointer_needed): Removed.
>    * final.c (frame_pointer_needed): Likewise.
>
>    * function.c (assign_stack_local_1): Estimate stack variable
>    alignment and store to stack_alignment_estimated.
>    (instantiate_new_reg): Instantiate virtual incoming args rtx to
>    vDRAP if stack realignment and DRAP is needed.
>    (assign_parms): Collect parameter/return type alignment and
>    contribute to stack_alignment_estimated.
>    (locate_and_pad_parm): Likewise.
>    (allocate_struct_function): Init stack_alignment_estimated and
>    stack_alignment_used.
>    (get_arg_pointer_save_area): Replace virtual_incoming_args_rtx
>    with current_function_internal_arg_pointer.
>
>    * function.h (function): Add drap_reg, stack_alignment_estimated,
>    need_frame_pointer, need_frame_pointer_set, stack_realign_needed,
>    stack_realign_really, need_drap, save_param_ptr_reg,
>    stack_realign_processed, stack_realign_finalized and
>    stack_realign_used.
>    (frame_pointer_needed): New.
>    (stack_realign_fp): Likewise.
>    (stack_realign_drap): Likewise.
>
>    * global.c (compute_regsets): Set frame_pointer_needed cannot_elim
>    wrt stack_realign_needed.
>
>    * stmt.c (expand_nl_goto_receiver): Replace
>    virtual_incoming_args_rtx with
>    current_function_internal_arg_pointer.
>
>    * passes.c (pass_collect_stackrealign_info): Insert this new pass
>    immediately before expand.
>    (pass_handle_drap): Insert this new pass immediately after expand.
>
>    * tree-inline.c (expand_call_inline): Estimate stack variable
>    alignment and store to stack_alignment_estimated.
>
>    * tree-pass.h (pass_handle_drap): New.
>    (pass_collect_stackrealign_info): Likewise.
>
>    * tree-vectorizer.c (vect_can_force_dr_alignment_p): Estimate
>    stack variable alignment and store to stack_alignment_estimated.
>
>    * reload1.c (set_label_offsets): Assert that frame pointer must be
>    eliminated to stack pointer in case stack realignment is estimated
>    to happen without DRAP.
>    (elimination_effects): Likewise.
>    (eliminate_regs_in_insn): Likewise.
>    (mark_not_eliminable): Likewise.
>    (update_eliminables): Frame pointer is needed in case of stack
>    realignment needed.
>    (init_elim_table): Don't set frame_pointer_needed here.
>
>    * dwarf2out.c (CUR_FDE): New.
>    (reg_save_with_expression): Likewise.
>    (dw_fde_struct): Add drap_regnum, stack_realignment,
>    is_stack_realign, is_drap and is_drap_reg_saved.
>    (add_cfi): If stack is realigned, call reg_save_with_expression
>    to represent the location of stored vars.
>    (dwarf2out_frame_debug_expr): Add rules 16-19 to handle stack
>    realign.
>    (output_cfa_loc): Handle DW_CFA_expression.
>    (based_loc_descr): Update assert for stack realign.
>
>    * config/i386/i386.c (ix86_force_align_arg_pointer_string): Break
>    long line.
>    (ix86_user_incoming_stack_boundary): New.
>    (ix86_default_incoming_stack_boundary): Likewise.
>    (ix86_incoming_stack_boundary): Likewise.
>    (find_drap_reg): Likewise.
>    (override_options): Override option value for new options.
>    (ix86_function_ok_for_sibcall): Sibcall is OK even if stack needs
>    realigning.
>    (ix86_handle_cconv_attribute): Stack realign no longer impacts
>    number of regparm.
>    (ix86_function_regparm): Likewise.
>    (setup_incoming_varargs_64): Remove the logic to set
>    stack_alignment_needed here.
>    (ix86_va_start): Replace virtual_incoming_args_rtx with
>    current_function_internal_arg_pointer.
>    (ix86_save_reg): Replace force_align_arg_pointer with drap_reg.
>    (ix86_compute_frame_layout): Compute frame layout wrt stack
>    realignment.
>    (ix86_internal_arg_pointer): Estimate if stack realignment is
>    needed and returns appropriate arg pointer rtx accordingly.
>    (ix86_expand_prologue): Finally decide if stack realignment
>    is needed and generate prologue code accordingly.
>    (ix86_expand_epilogue): Generate epilogue code wrt stack
>    realignment is really needed or not.
>    * config/i386/i386.c (ix86_select_alt_pic_regnum): Check
>    DRAP register.
>
>    * config/i386/i386.h (MAIN_STACK_BOUNDARY): New.
>    (ABI_STACK_BOUNDARY): Likewise.
>    (PREFERRED_STACK_BOUNDARY_DEFAULT): Likewise.
>    (STACK_REALIGN_DEFAULT): Likewise.
>    (INCOMING_STACK_BOUNDARY): Likewise.
>    (MAX_VECTORIZE_STACK_ALIGNMENT): Likewise.
>    (ix86_incoming_stack_boundary): Likewise.
>    (REAL_PIC_OFFSET_TABLE_REGNUM): Updated to use BX_REG.
>    (CAN_ELIMINATE): Redefine the macro to eliminate frame pointer to
>    stack pointer and arg pointer to hard frame pointer
> <merge-stack-0404.patch>

