[PATCH 3/4] split-stack for powerpc64
David Edelsohn
dje.gcc@gmail.com
Tue May 19 14:37:00 GMT 2015
On Sun, May 17, 2015 at 10:54 PM, Alan Modra <amodra@gmail.com> wrote:
> This patch adds -fsplit-stack support for PowerPC64 Linux. I haven't
> made any real attempt to support ppc32 at this stage, but that should
> mostly be a matter of writing __morestack for ppc32.
>
> The idea of split-stack is to allocate just enough stack to execute a
> function, with checks added before function entry and on alloca to
> ensure the stack is large enough. If the stack size is insufficient, a
> new stack segment is allocated for the function. The new stack and
> old stack are not necessarily contiguous. For powerpc64, function
> arguments on the old stack are accessed by using an arg_pointer
> register rather than accessing them relative to the stack pointer or
> frame pointer as is usually done. (x86 copies function arguments from
> the old stack to the new, but needs an arg pointer for variable
> argument lists.) Unwinding is handled by a personality routine that
> knows how to find stack segments.
>
> Split-stack prologue on function entry (local entry point for ELFv2)
> is as follows. This goes before the usual function prologue.
>
> entry:
> ld %r0,-0x7000-64(%r13) # tcbhead_t.__private_ss
> addis %r12,%r1,-allocate@ha
> addi %r12,%r12,-allocate@l
> cmpld %cr7,%r12,%r0
> bge+ %cr7,enough
> mflr %r0
> std %r0,16(%r1)
> bl __morestack
> ld %r0,16(%r1)
> mtlr %r0
> blr
> enough:
> # usual function prologue, modified a little at the end to set up the
> # arg_pointer in %r12, starts here. The arg_pointer is initialized,
> # if it is used, with
> addi %r12,%r1,frame_size
> bge %cr7,.+8
> mr %r12,%r29
>
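The check and the arg_pointer selection in the prologue above can be modelled in C. This is only an illustrative sketch, not code from the patch: the function names and parameters (stack_is_enough, arg_pointer, sp, limit, old_stack) are hypothetical stand-ins for %r1, the tcbhead_t.__private_ss value, and %r29.

```c
#include <stdbool.h>
#include <stdint.h>

/* Models "addis/addi %r12,%r1,-allocate; cmpld %cr7,%r12,%r0; bge+".
   The comparison is unsigned, like cmpld.  As in the hardware, the
   subtraction wraps modulo 2^64 if allocate exceeds sp. */
bool stack_is_enough(uint64_t sp, uint64_t allocate, uint64_t limit)
{
    uint64_t wanted = sp - allocate;   /* value left in %r12 */
    return wanted >= limit;            /* cr7 "ge" => branch to enough */
}

/* Models "addi %r12,%r1,frame_size; bge %cr7,.+8; mr %r12,%r29".
   new_sp is %r1 after the normal prologue has allocated the frame;
   old_stack is the value __morestack left in %r29 (assumed here to be
   passed in by the caller of this model). */
uint64_t arg_pointer(bool enough, uint64_t new_sp, uint64_t frame_size,
                     uint64_t old_stack)
{
    if (enough)
        return new_sp + frame_size;    /* args sit just above our frame */
    return old_stack;                  /* args stayed on the old segment */
}
```

With sp = 0x10000 and limit = 0x8000, an allocation of 0x1000 passes the check while 0x9000 does not; in the second case the arg pointer comes from old_stack rather than from the new frame.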
> Notes:
> 1) A function that does not allocate a stack frame does not have a
> split-stack prologue.
>
> 2) __morestack must be local. __morestack has a non-standard calling
> convention, with the desired stack being passed in %r12. It saves arg
> passing regs, calls __generic_morestack to allocate a new stack
> segment, restores the arg passing regs and sets r29 to point at the
> old stack, then calls its return address + 12 to execute the function.
> After the function returns, __morestack saves return regs, calls
> __generic_releasestack, and returns to the split-stack prologue, which
> immediately returns. This scheme keeps hardware return prediction
> valid. __morestack must also ensure cr7 is correctly set.
>
> 3) Basic-block reordering (enabled with -O2) will move the six
> instructions after the "bge+" out of line.
>
> 4) When the stack allocation is less than 32k these two instructions
> addis %r12,%r1,-allocate@ha
> addi %r12,%r12,-allocate@l
> are rewritten as
> addi %r12,%r1,-allocate
> nop
> The addi may also be rewritten as a nop in the rare case that the
> stack allocation is exactly a multiple of 64k.
>
> 5) When the linker detects a call from split-stack to non-split-stack
> code, it adds 16k (or more) to the value found in "allocate"
> instructions. So non-split-stack code gets a larger stack. The
> amount is tunable by a linker option. The edit means powerpc64 does
> not need to implement __morestack_non_split, necessary on x86 because
> insufficient space is available there to edit the stack comparison
> code. This feature is only implemented in the GNU gold linker.
>
> 6) We won't handle >2G stack initially and perhaps never. Supporting
> multiple threads each requiring more than 2G of stack is probably not
> that important, and likely to OOM at run time. (It would be possible
> to easily handle up to 4G by rounding the allocation up to a multiple
> of 64k and using two addis instructions in the split-stack prologue.)
>
> 7) If __morestack is called, then there are two stack frames between
> the function and its caller. Immediately above is a small 32-byte
> frame on the new stack, there so that a back-chain is always present
> no matter the value of r1. This could be reduced to 16 bytes but I
> thought it better to waste a few bytes for 32-byte alignment in case
> powerpc64 goes to 32-byte aligned stacks. Above that frame is the
> __morestack frame on the old stack.
>
> 8) If the normal function prologue uses r12 as a frame pointer, as it
> always does when the frame size is larger than 32k, then the arg
> pointer is set up with
> addi %r12,%r12,to_top_of_frame
> bge %cr7,.+8
> mr %r12,%r29
> omitting the addi if to_top_of_frame is zero.
>
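The arithmetic behind note 4 can be sketched in C. The @ha/@l operators split a value into a high part that addis shifts left by 16 and a low part that addi sign-extends; the high part is adjusted so the two sum back to the original. The helper names split_lo and split_ha are hypothetical, chosen only for this illustration.

```c
#include <stdint.h>

/* @l: low 16 bits of v, sign-extended the way addi treats its
   immediate.  The xor/subtract pair sign-extends without relying on
   implementation-defined conversions. */
int32_t split_lo(int64_t v)
{
    return (int32_t)(((uint64_t)v & 0xffffu) ^ 0x8000u) - 0x8000;
}

/* @ha: high-adjusted part, chosen so that
   split_ha(v) * 65536 + split_lo(v) == v.
   v - split_lo(v) is always an exact multiple of 65536. */
int32_t split_ha(int64_t v)
{
    return (int32_t)((v - split_lo(v)) / 65536);
}
```

For an allocation of 0x1000 the offset is -0x1000, which gives a high part of 0 and a low part of -4096, so a single addi is sufficient. For an allocation of 0x20000 (a multiple of 64k) the low part is 0, which is the case where the addi can become a nop.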
> gcc/
> * common/config/rs6000/rs6000-common.c (TARGET_SUPPORTS_SPLIT_STACK):
> Define.
> (rs6000_supports_split_stack): New function.
> * config/rs6000/rs6000.c (machine_function): Add
> split_stack_arg_pointer.
> (TARGET_EXTRA_LIVE_ON_ENTRY, TARGET_INTERNAL_ARG_POINTER): Define.
> (setup_incoming_varargs): Use crtl->args.internal_arg_pointer
> rather than virtual_incoming_args_rtx.
> (rs6000_va_start): Likewise.
> (split_stack_arg_pointer_used_p): New function.
> (rs6000_emit_prologue): Set up arg pointer for -fsplit-stack.
> (morestack_ref): New var.
> (gen_add3_const, rs6000_expand_split_stack_prologue,
> rs6000_internal_arg_pointer, rs6000_live_on_entry,
> rs6000_split_stack_space_check): New functions.
> (rs6000_elf_file_end): Call file_end_indicate_split_stack.
> * config/rs6000/rs6000.md (UNSPEC_STACK_CHECK): Define.
> (UNSPECV_SPLIT_STACK_RETURN): Define.
> (split_stack_prologue, load_split_stack_limit,
> load_split_stack_limit_di, load_split_stack_limit_si,
> split_stack_return, split_stack_space_check): New expands and insns.
> * config/rs6000/rs6000-protos.h
> (rs6000_expand_split_stack_prologue): Declare.
> (rs6000_split_stack_space_check): Declare.
> libgcc/
> * config/rs6000/morestack.S: New.
> * config/rs6000/t-stack-rs6000: New.
> * config.host (powerpc*-*-linux*): Add t-stack and t-stack-rs6000
> to tmake_file.
> * generic-morestack.c: Don't build for powerpc 32-bit.
This patch is okay.
I'll let you and Lynn discuss the meaning of options.
Thanks, David