This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.



[RFA]: Merge stack alignment branch


The STACK branch has existed for a while, and a number of patches that
implement stack alignment for i386/x86_64 have been checked in. The
branch now supports aligning all stack variables at their required
boundary effectively, and it introduces no regressions against current
trunk. Here are the background information and the patch. Comments and
feedback are highly appreciated.

-- BACKGROUND --
Here, we propose a new design to fully support stack alignment while
overcoming the above problems. The new design will
*  Support arbitrary alignment values, including 4, 8, 16, 32...
*  Adjust function stack alignment only when necessary
*  Be developed initially on i386 and x86_64, but be extensible to
other platforms
*  Emit efficient prologue/epilogue code for stack alignment
*  Coexist with special features like dynamic stack allocation (alloca),
nested functions, register parameter passing, PIC code and tail call
optimization, etc.
*  Support debugging and stack unwinding

2.1 Support arbitrary alignment value
Different source code and optimizations require different stack
alignments, as in the following table:
Feature         Alignment (bytes)
i386_ABI        4
x86_64_ABI      16
char            1
short           2
int             4
long            4/8*
long long       8
__m64           8
__m128          16
float           4
double          8
long double     16
user specified  any power of 2

*Note: 4 for i386, 8 for x86_64
The new design will support any alignment value in this table.
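
As a rough illustration of the "user specified" row, the following C
sketch checks at run time whether a local variable declared with GCC's
aligned attribute really lands on a 32-byte boundary. The helper names
is_aligned and local_is_aligned_32 are made up for this example and are
not part of the patch; whether the check passes depends on the compiler
honouring the requested alignment for stack variables, which is what
this proposal is about.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: returns 1 if ptr is a multiple of align bytes.
   align must be a power of 2, as in the table above.  */
int is_aligned(const void *ptr, uintptr_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return ((uintptr_t)ptr & (align - 1)) == 0;
}

/* Hypothetical test function: declares a local with a user-specified
   32-byte alignment and reports whether its address honours it.  */
int local_is_aligned_32(void)
{
    volatile char buf[64] __attribute__((aligned(32)));

    buf[0] = 1;  /* touch the buffer so it is kept on the stack */
    return is_aligned((const void *)buf, 32);
}
```

With a compiler that realigns the stack when needed,
local_is_aligned_32 () is expected to return 1 regardless of the
alignment of the incoming stack.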

2.2 Adjust function stack alignment only when necessary

Current GCC defines the following macros related to stack alignment:
i. STACK_BOUNDARY, in bits, which is the boundary preferred by the
hardware: 32 for i386 and 64 for x86_64. It is the minimum stack
boundary and is fixed.
ii. PREFERRED_STACK_BOUNDARY. It sets the stack alignment when calling a
function. It may be set on the command line and has no impact on stack
alignment at function entry. This proposal requires PREFERRED >= STACK,
and by default it is set to ABI_STACK_BOUNDARY.

This design will define a few more macros, or concepts not explicitly
defined in code:
iii. ABI_STACK_BOUNDARY, in bits, which is the stack boundary specified
by the psABI: 32 for i386 and 128 for x86_64. ABI_STACK_BOUNDARY >=
STACK_BOUNDARY. It is fixed for a given psABI.
iv. LOCAL_STACK_BOUNDARY, in bits. Each function stack has its own stack
alignment requirement, which depends on the alignment of its stack
variables:
LOCAL_STACK_BOUNDARY = MAX (alignment of each effective stack variable).
v. INCOMING_STACK_BOUNDARY, in bits, which is the stack boundary at
function entry. If a function is marked with
__attribute__ ((force_align_arg_pointer)) or the -mstackrealign option
is provided, INCOMING == STACK_BOUNDARY. Otherwise,
INCOMING == PREFERRED_STACK_BOUNDARY, because a function is typically
called locally with the same PREFERRED_STACK_BOUNDARY. For those
functions whose PREFERRED is larger than ABI, it is the caller's
responsibility to invoke them with the appropriate PREFERRED.
vi. REQUIRED_STACK_ALIGNMENT, in bits, which is the stack alignment
required by local variables and by calls to other functions.
REQUIRED_STACK_ALIGNMENT ==
MAX (LOCAL_STACK_BOUNDARY, PREFERRED_STACK_BOUNDARY) for a non-leaf
function. For a leaf function, REQUIRED_STACK_ALIGNMENT ==
MAX (LOCAL_STACK_BOUNDARY, STACK_BOUNDARY).

This proposal won't adjust the stack when INCOMING_STACK_BOUNDARY >=
REQUIRED_STACK_ALIGNMENT. Only when INCOMING_STACK_BOUNDARY <
REQUIRED_STACK_ALIGNMENT, or when the PREFERRED_STACK_BOUNDARY of the
entry function is less than ABI_STACK_BOUNDARY, will it adjust the stack
to REQUIRED_STACK_ALIGNMENT in the prologue.
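
The rules in 2.2 can be sketched in C roughly as follows. This is only
an illustration of the arithmetic, not code from the patch: the struct
stack_bounds type and the functions required_stack_alignment and
needs_realign are hypothetical names, all values are in bits, and the
extra condition about the PREFERRED_STACK_BOUNDARY of the entry function
is left out for brevity.

```c
#include <assert.h>

#define MAX(a, b) ((a) > (b) ? (a) : (b))

/* Hypothetical container for the boundaries defined in 2.2 (in bits).  */
struct stack_bounds {
    unsigned stack;      /* STACK_BOUNDARY */
    unsigned preferred;  /* PREFERRED_STACK_BOUNDARY */
    unsigned incoming;   /* INCOMING_STACK_BOUNDARY */
    unsigned local;      /* LOCAL_STACK_BOUNDARY */
    int is_leaf;         /* nonzero if the function calls nothing */
};

/* REQUIRED_STACK_ALIGNMENT: a leaf function only has to satisfy its own
   locals and the hardware minimum; a non-leaf function must also
   provide PREFERRED_STACK_BOUNDARY to its callees.  */
unsigned required_stack_alignment(const struct stack_bounds *b)
{
    unsigned callee = b->is_leaf ? b->stack : b->preferred;
    return MAX(b->local, callee);
}

/* The prologue realigns only when the incoming stack is not aligned
   enough for what the function requires.  */
int needs_realign(const struct stack_bounds *b)
{
    assert(b->preferred >= b->stack);  /* required by this proposal */
    return b->incoming < required_stack_alignment(b);
}
```

For example, a non-leaf function with STACK = 32, PREFERRED = 128,
INCOMING = 128 and a 256-bit local requires 256 bits and is realigned,
while a leaf function with INCOMING = 32 whose locals need only 32 bits
is left alone.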

2.3 Initial development on i386 and x86_64
We initially support i386 and x86_64. In this document we focus more on
i386, because its small register file makes it the harder target to
implement. But everything we discuss can easily be applied to x86_64.

2.4 Emit more efficient prologue/epilogue
When a function needs to adjust stack alignment and has no dynamic stack
allocation, this design will generate the following example
prologue/epilogue code:
IA32 example Prologue:
        pushl     %ebp
        movl      %esp, %ebp
        andl      $-16, %esp
        subl      $4, %esp ; is $-4 the local stack size?
Epilogue:
        movl      %ebp, %esp
        popl      %ebp
        ret
Locals will be addressed as esp + offset and parameters as ebp + offset.

Add x86_64 example here.

Thus BP points to parameter frame and SP points to local frame.
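
The andl $-16, %esp instruction in the prologue above rounds the stack
pointer down to a 16-byte boundary by clearing its low four bits. A
minimal C sketch of the same arithmetic follows; align_down is a
hypothetical helper written for this illustration, not a function from
the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Round sp down to a multiple of align, which must be a power of 2.
   For align == 16 the mask ~(align - 1) has the same bit pattern as
   the immediate -16 used by the andl instruction.  */
uintptr_t align_down(uintptr_t sp, uintptr_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return sp & ~(align - 1);
}
```

Since the stack grows downwards on x86, rounding down never moves the
stack pointer back into the caller's frame; it only wastes up to
align - 1 bytes of padding.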

2.5 Coexist with special features
Stack alignment adjustment will coexist with various GCC features
that have special calling conventions and frame layout, such as dynamic
stack allocation (alloca), nested functions and parameter passing via
registers to local functions.

I386 hard register usage is the major obstacle to making the proposal
friendly to various GCC features. This design requires an additional
hard register in the prologue/epilogue in case of dynamic stack
allocation. The register is called the Dynamic Realigned Argument
Pointer, or DRAP. Because I386 PIC requires BX as the GOT pointer, I386
may use AX, DX and CX as parameter passing registers, and the register
also has to work with setjmp/longjmp, there are limited candidates to
choose from. The current proposal uses CX as DRAP if CX is not used to
pass a parameter. If CX is not available, DI will be used, because it is
preserved across setjmp/longjmp since it is callee-saved.

X86_64 is much easier. This proposal just chooses R12 as DRAP, which is
also preserved across setjmp/longjmp since it is callee-saved.
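
The DRAP choice described above can be summarised by the following C
sketch. It is a simplified model for illustration only: the enum
hard_reg type and the pick_drap function are hypothetical and do not
correspond to find_drap_reg in the patch, which may take more
conditions into account.

```c
#include <assert.h>

/* Hypothetical register names used only in this sketch.  */
enum hard_reg { REG_CX, REG_DI, REG_R12 };

/* Pick the DRAP register following the rules in 2.5:
   - x86_64: always R12 (callee-saved);
   - i386: CX unless CX already carries an incoming parameter,
     otherwise DI (callee-saved).  */
enum hard_reg pick_drap(int is_64bit, int cx_passes_param)
{
    if (is_64bit)
        return REG_R12;
    return cx_passes_param ? REG_DI : REG_CX;
}
```

BX never appears as a candidate on i386 because it is reserved for the
GOT pointer under PIC.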

DRAP will be assigned to a virtual register, or VDRAP, in the prologue
so that the DRAP hard register itself is free for the register allocator
in the function body. Usually VDRAP will be allocated to the same DRAP
register, so the additional register move instruction is often removed.

2.5.1 When stack alignment adjustment comes together with alloca, the
following example prologue/epilogue will be emitted:
Prologue:
       pushl     %edi                     // Save callee-saved reg edi
       leal      8(%esp), %edi            // Save address of parameter frame
       andl      $-16, %esp               // Align local stack

//  Reserve two stack slots and save return address
//  and previous frame pointer into them. By
//  pointing new ebp to them, we build a pseudo
//  stack for unwinding.
       pushl     -4(%edi)                 // Save return address
       pushl     %ebp                     // Save old ebp
       movl      %esp, %ebp               // Point ebp to pseudo frame start

       subl      $24, %esp                // Adjust local frame size
       movl      %edi, vreg1

Epilogue:
       movl      vreg1, %edi
       movl      %ebp, %esp               // Restore esp to pseudo frame start
       popl      %ebp
       leal      -8(%edi), %esp           // Restore esp to real frame start
       popl      %edi                     // Restore edi
       ret

Locals will be addressed as ebp - offset, parameters as vreg1 + offset.

Here DI is used to set up the virtual parameter frame pointer, BP points
to the local frame and SP points to the dynamic allocation frame.

2.5.2 Nested functions will work automatically because they use CX as
the static chain pointer, which won't conflict with any registers used
by stack alignment adjustment, even when nested functions are called via
a function pointer and a function stub on the stack.

2.5.3 GCC may optimize to use registers to pass parameters. At most AX,
DX and CX will be used. Such optimization won't conflict with stack
alignment adjustment, so it should work automatically.

2.5.4 I386 PIC uses an available register or EBX as the GOT pointer.
This design works well under i386 PIC. When picking a register for PIC,
we will avoid using the DRAP register.

For example:
i686 Prologue:
        pushl     %edi
        leal      8(%esp), %edi
        andl      $-16, %esp
        pushl     -4(%edi)
        pushl     %ebp
        movl      %esp, %ebp
        subl      $24,  %esp
        call      .L1
.L1:
        popl      %ebx
        movl      %edi, vreg1

Body:  // code for alloca
        movl      (vreg1), %eax
        subl      %eax, %esp
        andl      $-16, %esp
        movl      %esp, %eax

i686 Epilogue:
        movl      %ebp, %esp
        popl      %ebp
        leal      -8(%edi), %esp
        popl      %edi
        ret

Locals will be addressed as ebp - offset, parameters as vreg1 + offset,
ebx has the GOT pointer.

2.6 Debug and unwind will work since DWARF2 has the flexibility to
define different frame pointers.

2.7 Some intrinsics rely on the stack layout and need to be handled
accordingly. They are __builtin_return_address and
__builtin_frame_address. This proposal will set up a pseudo frame slot
to help the unwinder find the return address and the parent frame
address by emitting the following prologue code after adjusting
alignment:
        pushl     -4(%edi)
        pushl     %ebp
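
For reference, the two intrinsics can be exercised from C as in the
sketch below. The function names are invented for this example; the
second function also declares a 32-byte aligned local, so that on a
compiler implementing this design the return address would have to be
found through a realigned frame.

```c
#include <assert.h>
#include <stddef.h>

/* Returns the address this function will return to.  */
__attribute__((noinline))
void *current_return_address(void)
{
    return __builtin_return_address(0);
}

/* Returns the frame address of this function.  */
__attribute__((noinline))
void *current_frame_address(void)
{
    return __builtin_frame_address(0);
}

/* Same query, but in a function whose local requires more alignment
   than the ABI guarantees (hypothetical test case).  */
__attribute__((noinline))
void *return_address_with_aligned_local(void)
{
    volatile char buf[32] __attribute__((aligned(32)));

    buf[0] = 0;  /* touch the buffer so it is kept on the stack */
    return __builtin_return_address(0);
}
```

Only level 0 is used here; GCC documents that nonzero levels may be
unreliable on some targets.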

ChangeLog:
2008-04-04  Uros Bizjak  <ubizjak@gmail.com>
	    H.J. Lu  <hongjiu.lu@intel.com>

	PR target/12329
	* config/i386/i386.c (ix86_function_regparm): Limit the number of
	register passing arguments to 2 for nested functions.

2008-04-04  Joey Ye  <joey.ye@intel.com>
	    H.J. Lu  <hongjiu.lu@intel.com>
	    Xuepeng Guo  <xuepeng.guo@intel.com>

	* builtins.c (expand_builtin_setjmp_receiver): Replace
	virtual_incoming_args_rtx with
	current_function_internal_arg_pointer.
	(expand_builtin_apply_args_1): Likewise.

	* calls.c (expand_call): Don't calculate preferred stack
	boundary according to incoming stack boundary. Replace 
	virtual_incoming_args_rtx with
	current_function_internal_arg_pointer.

	* cfgexpand.c (get_decl_align_unit): Estimate stack variable
	alignment and store to stack_alignment_estimated and
	stack_alignment_used.
	(expand_one_var): Likewise.
	(gate_stack_realign): Gate new pass pass_collect_stackrealign_info
	and pass_handle_drap.
	(collect_stackrealign_info): Execute new pass
	pass_collect_stackrealign_info.
	(pass_collect_stackrealign_info): Define new pass.
	(handle_drap): Execute new pass pass_handle_drap.
	(pass_handle_drap): Define new pass.

	* defaults.h (MAX_VECTORIZE_STACK_ALIGNMENT): New.

	* dojump.c (clear_pending_stack_adjust): Leave a FIXME in
	comments in case pending stack adjustment is discarded when stack
	realign is needed.

	* flags.h (frame_pointer_needed): Removed.
	* final.c (frame_pointer_needed): Likewise.

	* function.c (assign_stack_local_1): Estimate stack variable 
	alignment and store to stack_alignment_estimated.
	(instantiate_new_reg): Instantiate virtual incoming args rtx to
	vDRAP if stack realignment and DRAP is needed.
	(assign_parms): Collect parameter/return type alignment and 
	contribute to stack_alignment_estimated.
	(locate_and_pad_parm): Likewise.
	(allocate_struct_function): Init stack_alignment_estimated and
	stack_alignment_used.
	(get_arg_pointer_save_area): Replace virtual_incoming_args_rtx
	with current_function_internal_arg_pointer.

	* function.h (function): Add drap_reg, stack_alignment_estimated,
	need_frame_pointer, need_frame_pointer_set, stack_realign_needed,
	stack_realign_really, need_drap, save_param_ptr_reg,
	stack_realign_processed, stack_realign_finalized and
	stack_realign_used.
	(frame_pointer_needed): New.
	(stack_realign_fp): Likewise.
	(stack_realign_drap): Likewise.

	* global.c (compute_regsets): Set frame_pointer_needed cannot_elim
	wrt stack_realign_needed.

	* stmt.c (expand_nl_goto_receiver): Replace 
	virtual_incoming_args_rtx with
	current_function_internal_arg_pointer.

	* passes.c (pass_collect_stackrealign_info): Insert this new pass
	immediately before expand.
	(pass_handle_drap): Insert this new pass immediately after expand.

	* tree-inline.c (expand_call_inline): Estimate stack variable
	alignment and store to stack_alignment_estimated.

	* tree-pass.h (pass_handle_drap): New.
	(pass_collect_stackrealign_info): Likewise.

	* tree-vectorizer.c (vect_can_force_dr_alignment_p): Estimate
	stack variable alignment and store to stack_alignment_estimated.

	* reload1.c (set_label_offsets): Assert that frame pointer must be
	eliminated to stack pointer in case stack realignment is estimated
	to happen without DRAP.
	(elimination_effects): Likewise.
	(eliminate_regs_in_insn): Likewise.
	(mark_not_eliminable): Likewise.
	(update_eliminables): Frame pointer is needed in case of stack
	realignment needed.
	(init_elim_table): Don't set frame_pointer_needed here.

	* dwarf2out.c (CUR_FDE): New.
	(reg_save_with_expression): Likewise.
	(dw_fde_struct): Add drap_regnum, stack_realignment,
	is_stack_realign, is_drap and is_drap_reg_saved.
	(add_cfi): If stack is realigned, call reg_save_with_expression
	to represent the location of stored vars.
	(dwarf2out_frame_debug_expr): Add rules 16-19 to handle stack
	realign.
	(output_cfa_loc): Handle DW_CFA_expression.
	(based_loc_descr): Update assert for stack realign.

	* config/i386/i386.c (ix86_force_align_arg_pointer_string): Break
	long line.
	(ix86_user_incoming_stack_boundary): New.
	(ix86_default_incoming_stack_boundary): Likewise.
	(ix86_incoming_stack_boundary): Likewise.
	(find_drap_reg): Likewise.
	(override_options): Override option value for new options.
	(ix86_function_ok_for_sibcall): Sibcall is OK even if stack needs
	realigning.
	(ix86_handle_cconv_attribute): Stack realign no longer impacts
	number of regparm.
	(ix86_function_regparm): Likewise.
	(setup_incoming_varargs_64): Remove the logic to set
	stack_alignment_needed here.
	(ix86_va_start): Replace virtual_incoming_args_rtx with
	current_function_internal_arg_pointer.
	(ix86_save_reg): Replace force_align_arg_pointer with drap_reg.
	(ix86_compute_frame_layout): Compute frame layout wrt stack
	realignment.
	(ix86_internal_arg_pointer): Estimate if stack realignment is
	needed and return appropriate arg pointer rtx accordingly.
	(ix86_expand_prologue): Finally decide if stack realignment
	is needed and generate prologue code accordingly.
	(ix86_expand_epilogue): Generate epilogue code depending on
	whether stack realignment is really needed.
	* config/i386/i386.c (ix86_select_alt_pic_regnum): Check
	DRAP register.
	
	* config/i386/i386.h (MAIN_STACK_BOUNDARY): New.
	(ABI_STACK_BOUNDARY): Likewise.
	(PREFERRED_STACK_BOUNDARY_DEFAULT): Likewise.
	(STACK_REALIGN_DEFAULT): Likewise.
	(INCOMING_STACK_BOUNDARY): Likewise.
	(MAX_VECTORIZE_STACK_ALIGNMENT): Likewise.
	(ix86_incoming_stack_boundary): Likewise.
	(REAL_PIC_OFFSET_TABLE_REGNUM): Updated to use BX_REG.
	(CAN_ELIMINATE): Redefine the macro to eliminate frame pointer to
	stack pointer and arg pointer to hard frame pointer in case of
	stack realignment without DRAP.
	(machine_function): Remove force_align_arg_pointer.

	* config/i386/i386.md (BX_REG): New.
	(R13_REG): Likewise.

	* config/i386/i386.opt (mforce_drap): New.
	(mincoming-stack-boundary): Likewise.
	(mstackrealign): Updated.

	* doc/extend.texi: Update force_align_arg_pointer.
	* doc/invoke.texi: Document -mincoming-stack-boundary.  Update
	-mstackrealign.

Attachment: merge-stack-0404.patch
Description: merge-stack-0404.patch

