[PATCH 3/4] split-stack for powerpc64

David Edelsohn dje.gcc@gmail.com
Tue May 19 14:37:00 GMT 2015


On Sun, May 17, 2015 at 10:54 PM, Alan Modra <amodra@gmail.com> wrote:
> This patch adds -fsplit-stack support for PowerPC64 Linux.  I haven't
> made any real attempt to support ppc32 at this stage, but that should
> mostly be a matter of writing __morestack for ppc32.
>
> The idea of split-stack is to allocate just enough stack to execute a
> function, with checks added before function entry and on alloca to
> ensure the stack is large enough.  If stack size is insufficient, a
> new stack segment is allocated for the function.  The new stack and
> old stack are not necessarily contiguous.  For powerpc64, function
> arguments on the old stack are accessed by using an arg_pointer
> register rather than accessing them relative to the stack pointer or
> frame pointer as is usually done.  (x86 copies function arguments from
> the old stack to the new, but needs an arg pointer for variable
> argument lists.)  Unwinding is handled by a personality routine that
> knows how to find stack segments.
>
> Split-stack prologue on function entry (local entry point for ELFv2)
> is as follows.  This goes before the usual function prologue.
>
> entry:
>         ld %r0,-0x7000-64(%r13)  # tcbhead_t.__private_ss
>         addis %r12,%r1,-allocate@ha
>         addi %r12,%r12,-allocate@l
>         cmpld %cr7,%r12,%r0
>         bge+ %cr7,enough
>         mflr %r0
>         std %r0,16(%r1)
>         bl __morestack
>         ld %r0,16(%r1)
>         mtlr %r0
>         blr
> enough:
> # usual function prologue, modified a little at the end to set up the
> # arg_pointer in %r12, starts here.  The arg_pointer is initialized,
> # if it is used, with
>         addi %r12,%r1,frame_size
>         bge %cr7,.+8
>         mr %r12,%r29
>
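
For readers who find the assembly above terse, here is a minimal C
sketch of what the entry check and the arg_pointer setup compute.  It
is only a model under stated assumptions, not code from the patch: the
function and parameter names (stack_is_enough, arg_pointer, sp, limit)
are made up for illustration, and the arg_pointer model assumes %r1 has
already been decremented by frame_size when the addi executes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the entry check; not code from the patch.
   sp stands for %r1, limit for the value loaded from the TCB into %r0,
   and allocate for the amount of stack the function needs.  */
bool stack_is_enough(uint64_t sp, uint64_t allocate, uint64_t limit)
{
    assert(allocate <= sp);           /* keep the model free of wraparound */
    uint64_t wanted = sp - allocate;  /* addis/addi: %r12 = %r1 - allocate */
    return wanted >= limit;           /* cmpld/bge: unsigned comparison */
}

/* Model of the arg_pointer setup at the end of the normal prologue.
   If the check passed, %r12 is set to %r1 + frame_size; otherwise the
   value __morestack left in %r29 (pointing at the old stack) is used.  */
uint64_t arg_pointer(bool enough, uint64_t r1, uint64_t frame_size,
                     uint64_t r29)
{
    return enough ? r1 + frame_size : r29;
}
```

With sp = 0x20000, allocate = 0x8000 and limit = 0x10000 the check
passes, so the "enough" path is taken and __morestack is not called.
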
> Notes:
> 1) A function that does not allocate a stack frame does not have a
> split-stack prologue.
>
> 2) __morestack must be local.  __morestack has a non-standard calling
> convention, with the desired stack being passed in %r12.  It saves arg
> passing regs, calls __generic_morestack to allocate a new stack
> segment, restores the arg passing regs and sets r29 to point at the
> old stack, then calls its return address + 12 to execute the function.
> After the function returns __morestack saves return regs, calls
> __generic_releasestack, and returns to the split-stack prologue, which
> immediately returns.  This scheme keeps hardware return prediction
> valid.  __morestack must also ensure cr7 is correctly set.
>
> 3) Basic-block reordering (enabled with -O2) will move the six
> instructions after the "bge+" out of line.
>
> 4) When the stack allocation is less than 32k these two instructions
>         addis %r12,%r1,-allocate@ha
>         addi %r12,%r12,-allocate@l
> are rewritten as
>         addi %r12,%r1,-allocate
>         nop
> The addi may also be rewritten as a nop in the rare case that the
> stack allocation is exactly a multiple of 64k.
>
> 5) When the linker detects a call from split-stack to non-split-stack
> code, it adds 16k (or more) to the value found in "allocate"
> instructions.  So non-split-stack code gets a larger stack.  The
> amount is tunable by a linker option.  The edit means powerpc64 does
> not need to implement __morestack_non_split, necessary on x86 because
> insufficient space is available there to edit the stack comparison
> code.  This feature is only implemented in the GNU gold linker.
>
> 6) We won't handle >2G stack initially and perhaps never.  Supporting
> multiple threads each requiring more than 2G of stack is probably not
> that important, and likely to OOM at run time.  (It would be possible
> to easily handle up to 4G by rounding the allocation up to a multiple
> of 64k and using two addis instructions in the split-stack prologue.)
>
> 7) If __morestack is called, then there are two stack frames between
> the function and its caller.  Immediately above is a small 32-byte
> frame on the new stack, there so that a back-chain is always present
> no matter the value of r1.  This could be reduced to 16 bytes but I
> thought it better to waste a few bytes for 32-byte alignment in case
> powerpc64 goes to 32-byte aligned stacks.  Above that frame is the
> __morestack frame on the old stack.
>
> 8) If the normal function prologue uses r12 as a frame pointer, as it
> always does when the frame size is larger than 32k, then the arg
> pointer is set up with
>         addi %r12,%r12,to_top_of_frame
>         bge %cr7,.+8
>         mr %r12,%r29
> omitting the addi if to_top_of_frame is zero.
>
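
The rewrites in note 4 follow from how a constant is split into @ha and
@l halves.  The sketch below works through that arithmetic in C.  It is
an illustration only, not code from the patch or the linker: lo16,
ha16, fits_single_addi and addi_is_nop are hypothetical names, and the
model simply assumes addis and addi both take sign-extended 16-bit
immediates.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* @l part: low 16 bits of v, sign-extended the way addi treats its
   immediate.  */
int64_t lo16(int64_t v)
{
    int64_t l = (int64_t)((uint64_t)v & 0xffff);
    return l >= 0x8000 ? l - 0x10000 : l;
}

/* @ha part: high 16 bits, adjusted so that ha * 64k + lo == v.  */
int64_t ha16(int64_t v)
{
    int64_t h = (v - lo16(v)) / 0x10000;  /* exact: a multiple of 64k */
    assert(h >= -0x8000 && h <= 0x7fff);  /* must fit the addis immediate */
    return h;
}

/* True when the addis would add zero, so a single addi is enough.  */
bool fits_single_addi(int64_t allocate)
{
    return ha16(-allocate) == 0;
}

/* True when the addi would add zero and could be replaced by a nop.  */
bool addi_is_nop(int64_t allocate)
{
    return lo16(-allocate) == 0;
}
```

For allocate = 0x7000 the @ha half is zero, so one addi does the job.
For allocate = 0x20000 the @l half is zero, which is the multiple-of-64k
case where the addi can become a nop.  The range check on the addis
immediate is consistent with the 2G limit in note 6.
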
> gcc/
>         * common/config/rs6000/rs6000-common.c (TARGET_SUPPORTS_SPLIT_STACK):
>         Define.
>         (rs6000_supports_split_stack): New function.
>         * config/rs6000/rs6000.c (machine_function): Add
>         split_stack_arg_pointer.
>         (TARGET_EXTRA_LIVE_ON_ENTRY, TARGET_INTERNAL_ARG_POINTER): Define.
>         (setup_incoming_varargs): Use crtl->args.internal_arg_pointer
>         rather than virtual_incoming_args_rtx.
>         (rs6000_va_start): Likewise.
>         (split_stack_arg_pointer_used_p): New function.
>         (rs6000_emit_prologue): Set up arg pointer for -fsplit-stack.
>         (morestack_ref): New var.
>         (gen_add3_const, rs6000_expand_split_stack_prologue,
>         rs6000_internal_arg_pointer, rs6000_live_on_entry,
>         rs6000_split_stack_space_check): New functions.
>         (rs6000_elf_file_end): Call file_end_indicate_split_stack.
>         * config/rs6000/rs6000.md (UNSPEC_STACK_CHECK): Define.
>         (UNSPECV_SPLIT_STACK_RETURN): Define.
>         (split_stack_prologue, load_split_stack_limit,
>         load_split_stack_limit_di, load_split_stack_limit_si,
>         split_stack_return, split_stack_space_check): New expands and insns.
>         * config/rs6000/rs6000-protos.h
>         (rs6000_expand_split_stack_prologue): Declare.
>         (rs6000_split_stack_space_check): Declare.
> libgcc/
>         * config/rs6000/morestack.S: New.
>         * config/rs6000/t-stack-rs6000: New.
>         * config.host (powerpc*-*-linux*): Add t-stack and t-stack-rs6000
>         to tmake_file.
>         * generic-morestack.c: Don't build for powerpc 32-bit.

This patch is okay.

I'll let you and Lynn discuss the meaning of options.

Thanks, David


