This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.



[PATCH 3/4] split-stack for powerpc64


This patch adds -fsplit-stack support for PowerPC64 Linux.  I haven't
made any real attempt to support ppc32 at this stage, but that should
mostly be a matter of writing __morestack for ppc32.

The idea of split-stack is to allocate just enough stack to execute a
function, with checks added before function entry and on alloca to
ensure the stack is large enough.  If the stack size is insufficient, a
new stack segment is allocated for the function.  The new stack and
old stack are not necessarily contiguous.  For powerpc64, function
arguments on the old stack are accessed by using an arg_pointer
register rather than accessing them relative to the stack pointer or
frame pointer as is usually done.  (x86 copies function arguments from
the old stack to the new, but needs an arg pointer for variable
argument lists.)  Unwinding is handled by a personality routine that
knows how to find stack segments.
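
As a rough illustration of the check described above, here is a small
C model.  It is only a sketch: the function and parameter names are
made up for this example and do not appear in the patch.  "sp" stands
in for %r1, "limit" for tcbhead_t.__private_ss, and "allocate" for the
stack the function needs.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the split-stack space check.  The stack
   grows down, so the lowest address the function would touch is
   sp - allocate.  The comparison is unsigned, as with cmpld.  */
bool
stack_is_sufficient (uint64_t sp, uint64_t limit, uint64_t allocate)
{
  uint64_t requested = sp - allocate;
  return requested >= limit;
}
```

For example, stack_is_sufficient (0x10000, 0x8000, 0x9000) is false,
which corresponds to the case where a new stack segment is needed.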

The split-stack prologue on function entry (local entry point for
ELFv2) is as follows.  It goes before the usual function prologue.

entry:
	ld %r0,-0x7000-64(%r13)  # tcbhead_t.__private_ss
	addis %r12,%r1,-allocate@ha
	addi %r12,%r12,-allocate@l
	cmpld %cr7,%r12,%r0
	bge+ %cr7,enough
	mflr %r0
	std %r0,16(%r1)
	bl __morestack
	ld %r0,16(%r1)
	mtlr %r0
	blr
enough:
# usual function prologue, modified a little at the end to set up the
# arg_pointer in %r12, starts here.  The arg_pointer is initialized,
# if it is used, with
	addi %r12,%r1,frame_size
	bge %cr7,.+8
	mr %r12,%r29
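
The three arg_pointer instructions at the end can be modelled in C
roughly as follows.  This is a sketch under stated assumptions: the
struct and its field names are hypothetical, and "cr7_ge" stands for
the state of cr7 that makes "bge %cr7" branch.  On the normal path
cr7 still holds the result of the prologue's cmpld.  If __morestack
ran, it has set cr7 so that the bge is not taken, and left the old
stack pointer in r29.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical snapshot of the registers the arg_pointer setup reads.  */
struct entry_state
{
  uint64_t r1;   /* stack pointer after the frame has been pushed */
  uint64_t r29;  /* old stack pointer, meaningful only after __morestack */
  bool cr7_ge;   /* true if "bge %cr7,.+8" would branch */
};

uint64_t
select_arg_pointer (const struct entry_state *s, uint64_t frame_size)
{
  uint64_t r12 = s->r1 + frame_size;  /* addi %r12,%r1,frame_size */
  if (!s->cr7_ge)                     /* bge %cr7,.+8 falls through */
    r12 = s->r29;                     /* mr %r12,%r29 */
  return r12;
}
```

With cr7_ge true the result is the top of the current frame, and
with cr7_ge false it is the value __morestack left in r29.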

Notes:
1) A function that does not allocate a stack frame does not have a
split-stack prologue.

2) __morestack must be local.  __morestack has a non-standard calling
convention, with the desired stack being passed in %r12.  It saves arg
passing regs, calls __generic_morestack to allocate a new stack
segment, restores the arg passing regs and sets r29 to point at the
old stack, then calls the address 12 bytes past its return address
(skipping the ld, mtlr and blr that follow the bl) to execute the
function.  After the function returns, __morestack saves return regs,
calls __generic_releasestack, and returns to the split-stack prologue,
which immediately returns.  This scheme keeps hardware return
prediction valid.  __morestack must also ensure cr7 is correctly set.

3) Basic-block reordering (enabled with -O2) will move the six
instructions after the "bge+" out of line.

4) When the stack allocation is less than 32k these two instructions
	addis %r12,%r1,-allocate@ha
	addi %r12,%r12,-allocate@l
are rewritten as
	addi %r12,%r1,-allocate
	nop
The addi may also be rewritten as a nop in the rare case that the
stack allocation is exactly a multiple of 64k.

5) When the linker detects a call from split-stack to non-split-stack
code, it adds 16k (or more) to the value found in "allocate"
instructions.  So non-split-stack code gets a larger stack.  The
amount is tunable by a linker option.  This linker edit means
powerpc64 does not need to implement __morestack_non_split, which is
necessary on x86 because there is insufficient space there to edit
the stack comparison code.  This feature is only implemented in the
GNU gold linker.

6) We won't handle >2G stack initially and perhaps never.  Supporting
multiple threads each requiring more than 2G of stack is probably not
that important, and such programs are likely to OOM at run time.  (It
would be easy to handle up to 4G by rounding the allocation up to a
multiple of 64k and using two addis instructions in the split-stack
prologue.)

7) If __morestack is called, then there are two stack frames between
the function and its caller.  Immediately above is a small 32-byte
frame on the new stack, there so that a back-chain is always present
no matter the value of r1.  This could be reduced to 16 bytes but I
thought it better to waste a few bytes for 32-byte alignment in case
powerpc64 goes to 32-byte aligned stacks.  Above that frame is the
__morestack frame on the old stack.

8) If the normal function prologue uses r12 as a frame pointer, as it
always does when the frame size is larger than 32k, then the arg
pointer is set up with
	addi %r12,%r12,to_top_of_frame
	bge %cr7,.+8
	mr %r12,%r29
omitting the addi if to_top_of_frame is zero.
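
The @ha/@l split of -allocate in note 4 can be checked with a small
C sketch.  The arithmetic mirrors the alloc_hi/alloc_lo calculation
in rs6000_expand_split_stack_prologue, but the struct and function
names here are hypothetical, and the code assumes two's complement
integers and allocate no larger than 2G.

```c
#include <stdint.h>

/* Hypothetical holder for the two immediates.  */
struct alloc_split
{
  int64_t hi;  /* addis part: a multiple of 64k */
  int64_t lo;  /* addi part: fits in a signed 16-bit field */
};

struct alloc_split
split_allocate (uint64_t allocate)
{
  int64_t neg = -(int64_t) allocate;
  struct alloc_split s;
  /* Round to the nearest multiple of 64k so the remainder is in
     the range [-32768, 32767].  */
  s.hi = (neg + 0x8000) & ~(int64_t) 0xffff;
  s.lo = neg - s.hi;
  return s;
}
```

For example, split_allocate (0x100) gives hi = 0 and lo = -0x100, the
single addi plus nop case, while split_allocate (0x10000) gives
hi = -0x10000 and lo = 0, the case where the addi becomes a nop.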

gcc/
	* common/config/rs6000/rs6000-common.c (TARGET_SUPPORTS_SPLIT_STACK):
	Define.
	(rs6000_supports_split_stack): New function.
	* config/rs6000/rs6000.c (machine_function): Add
	split_stack_arg_pointer.
	(TARGET_EXTRA_LIVE_ON_ENTRY, TARGET_INTERNAL_ARG_POINTER): Define.
	(setup_incoming_varargs): Use crtl->args.internal_arg_pointer
	rather than virtual_incoming_args_rtx.
	(rs6000_va_start): Likewise.
	(split_stack_arg_pointer_used_p): New function.
	(rs6000_emit_prologue): Set up arg pointer for -fsplit-stack.
	(morestack_ref): New var.
	(gen_add3_const, rs6000_expand_split_stack_prologue,
	rs6000_internal_arg_pointer, rs6000_live_on_entry,
	rs6000_split_stack_space_check): New functions.
	(rs6000_elf_file_end): Call file_end_indicate_split_stack.
	* config/rs6000/rs6000.md (UNSPEC_STACK_CHECK): Define.
	(UNSPECV_SPLIT_STACK_RETURN): Define.
	(split_stack_prologue, load_split_stack_limit,
	load_split_stack_limit_di, load_split_stack_limit_si,
	split_stack_return, split_stack_space_check): New expands and insns.
	* config/rs6000/rs6000-protos.h
	(rs6000_expand_split_stack_prologue): Declare.
	(rs6000_split_stack_space_check): Declare.
libgcc/
	* config/rs6000/morestack.S: New.
	* config/rs6000/t-stack-rs6000: New.
	* config.host (powerpc*-*-linux*): Add t-stack and t-stack-rs6000
	to tmake_file.
	* generic-morestack.c: Don't build for powerpc 32-bit.

diff -urpN gcc-stack-info2/gcc/common/config/rs6000/rs6000-common.c gcc-split-stack1/gcc/common/config/rs6000/rs6000-common.c
--- gcc-stack-info2/gcc/common/config/rs6000/rs6000-common.c	2015-05-15 14:15:38.145244889 +0930
+++ gcc-split-stack1/gcc/common/config/rs6000/rs6000-common.c	2015-05-15 01:57:37.417258829 +0930
@@ -288,6 +288,29 @@ rs6000_handle_option (struct gcc_options
   return true;
 }
 
+/* -fsplit-stack uses a field in the TCB, available with glibc-2.18.  */
+
+static bool
+rs6000_supports_split_stack (bool report,
+			     struct gcc_options *opts ATTRIBUTE_UNUSED)
+{
+#ifndef TARGET_GLIBC_MAJOR
+#define TARGET_GLIBC_MAJOR 0
+#endif
+#ifndef TARGET_GLIBC_MINOR
+#define TARGET_GLIBC_MINOR 0
+#endif
+  /* Note: Can't test DEFAULT_ABI here, it isn't set until later.  */
+  if (TARGET_GLIBC_MAJOR * 1000 + TARGET_GLIBC_MINOR >= 2018
+      && TARGET_64BIT
+      && TARGET_ELF)
+    return true;
+
+  if (report)
+    error ("%<-fsplit-stack%> currently only supported on PowerPC64 GNU/Linux with glibc-2.18 or later");
+  return false;
+}
+
 #undef TARGET_HANDLE_OPTION
 #define TARGET_HANDLE_OPTION rs6000_handle_option
 
@@ -300,4 +323,7 @@ rs6000_handle_option (struct gcc_options
 #undef TARGET_OPTION_OPTIMIZATION_TABLE
 #define TARGET_OPTION_OPTIMIZATION_TABLE rs6000_option_optimization_table
 
+#undef TARGET_SUPPORTS_SPLIT_STACK
+#define TARGET_SUPPORTS_SPLIT_STACK rs6000_supports_split_stack
+
 struct gcc_targetm_common targetm_common = TARGETM_COMMON_INITIALIZER;
diff -urpN gcc-stack-info2/gcc/config/rs6000/rs6000.c gcc-split-stack1/gcc/config/rs6000/rs6000.c
--- gcc-stack-info2/gcc/config/rs6000/rs6000.c	2015-05-16 13:33:37.170406399 +0930
+++ gcc-split-stack1/gcc/config/rs6000/rs6000.c	2015-05-16 14:54:55.483454632 +0930
@@ -187,6 +187,8 @@ typedef struct GTY(()) machine_function
      64-bits wide and is allocated early enough so that the offset
      does not overflow the 16-bit load/store offset field.  */
   rtx sdmode_stack_slot;
+  /* Alternative internal arg pointer for -fsplit-stack.  */
+  rtx split_stack_arg_pointer;
   /* Flag if r2 setup is needed with ELFv2 ABI.  */
   bool r2_setup_needed;
 } machine_function;
@@ -1190,6 +1192,7 @@ static bool rs6000_debug_cannot_change_m
 						   machine_mode,
 						   enum reg_class);
 static bool rs6000_save_toc_in_prologue_p (void);
+static rtx rs6000_internal_arg_pointer (void);
 
 rtx (*rs6000_legitimize_reload_address_ptr) (rtx, machine_mode, int, int,
 					     int, int *)
@@ -1411,6 +1414,12 @@ static const struct attribute_spec rs600
 #undef TARGET_SET_UP_BY_PROLOGUE
 #define TARGET_SET_UP_BY_PROLOGUE rs6000_set_up_by_prologue
 
+#undef TARGET_EXTRA_LIVE_ON_ENTRY
+#define TARGET_EXTRA_LIVE_ON_ENTRY rs6000_live_on_entry
+
+#undef TARGET_INTERNAL_ARG_POINTER
+#define TARGET_INTERNAL_ARG_POINTER rs6000_internal_arg_pointer
+
 #undef TARGET_HAVE_TLS
 #define TARGET_HAVE_TLS HAVE_AS_TLS
 
@@ -11150,7 +11159,7 @@ setup_incoming_varargs (cumulative_args_
   else
     {
       first_reg_offset = next_cum.words;
-      save_area = virtual_incoming_args_rtx;
+      save_area = crtl->args.internal_arg_pointer;
 
       if (targetm.calls.must_pass_in_stack (mode, type))
 	first_reg_offset += rs6000_arg_size (TYPE_MODE (type), type);
@@ -11344,7 +11353,7 @@ rs6000_va_start (tree valist, rtx nextar
     }
 
   /* Find the overflow area.  */
-  t = make_tree (TREE_TYPE (ovf), virtual_incoming_args_rtx);
+  t = make_tree (TREE_TYPE (ovf), crtl->args.internal_arg_pointer);
   if (words != 0)
     t = fold_build_pointer_plus_hwi (t, words * MIN_UNITS_PER_WORD);
   t = build2 (MODIFY_EXPR, TREE_TYPE (ovf), ovf, t);
@@ -23425,6 +23434,48 @@ rs6000_reg_live_or_pic_offset_p (int reg
                   || (DEFAULT_ABI == ABI_DARWIN && flag_pic))));
 }
 
+/* Return whether the split-stack arg pointer (r12) is used.  */
+
+static bool
+split_stack_arg_pointer_used_p (void)
+{
+  /* If the pseudo holding the arg pointer is no longer a pseudo,
+     then the arg pointer is used.  */
+  if (cfun->machine->split_stack_arg_pointer != NULL_RTX
+      && (!REG_P (cfun->machine->split_stack_arg_pointer)
+	  || (REGNO (cfun->machine->split_stack_arg_pointer)
+	      < FIRST_PSEUDO_REGISTER)))
+    return true;
+
+  /* Unfortunately we also need to do some code scanning, since
+     r12 may have been substituted for the pseudo.  */
+  rtx_insn *insn;
+  basic_block bb = ENTRY_BLOCK_PTR_FOR_FN (cfun);
+  FOR_BB_INSNS (bb, insn)
+    if (NONDEBUG_INSN_P (insn))
+      {
+	/* A call destroys r12.  */
+	if (CALL_P (insn))
+	  return false;
+
+	df_ref use;
+	FOR_EACH_INSN_USE (use, insn)
+	  {
+	    rtx x = DF_REF_REG (use);
+	    if (REG_P (x) && REGNO (x) == 12)
+	      return true;
+	  }
+	df_ref def;
+	FOR_EACH_INSN_DEF (def, insn)
+	  {
+	    rtx x = DF_REF_REG (def);
+	    if (REG_P (x) && REGNO (x) == 12)
+	      return false;
+	  }
+      }
+  return bitmap_bit_p (DF_LR_OUT (bb), 12);
+}
+
 /* Emit function prologue as insns.  */
 
 void
@@ -24376,6 +24427,40 @@ rs6000_emit_prologue (void)
       rtx reg = gen_rtx_REG (reg_mode, TOC_REGNUM);
       emit_insn (gen_frame_store (reg, sp_reg_rtx, RS6000_TOC_SAVE_SLOT));
     }
+
+  if (flag_split_stack && split_stack_arg_pointer_used_p ())
+    {
+      /* Set up the arg pointer (r12) for -fsplit-stack code.  If
+	 __morestack was called, it left the arg pointer to the old
+	 stack in r29.  Otherwise, the arg pointer is the top of the
+	 current frame.  */
+      if (frame_off != 0 || REGNO (frame_reg_rtx) != 12)
+	{
+	  rtx r12 = gen_rtx_REG (Pmode, 12);
+	  if (frame_off == 0)
+	    emit_move_insn (r12, frame_reg_rtx);
+	  else
+	    emit_insn (gen_add3_insn (r12, frame_reg_rtx, GEN_INT (frame_off)));
+	}
+      if (info->push_p)
+	{
+	  rtx r12 = gen_rtx_REG (Pmode, 12);
+	  rtx r29 = gen_rtx_REG (Pmode, 29);
+	  rtx cr7 = gen_rtx_REG (CCUNSmode, CR7_REGNO);
+	  rtx not_more = gen_label_rtx ();
+	  rtx jump;
+
+	  jump = gen_rtx_IF_THEN_ELSE (VOIDmode,
+				       gen_rtx_GEU (VOIDmode, cr7, const0_rtx),
+				       gen_rtx_LABEL_REF (VOIDmode, not_more),
+				       pc_rtx);
+	  jump = emit_jump_insn (gen_rtx_SET (pc_rtx, jump));
+	  JUMP_LABEL (jump) = not_more;
+	  LABEL_NUSES (not_more) += 1;
+	  emit_move_insn (r12, r29);
+	  emit_label (not_more);
+	}
+    }
 }
 
 /* Output .extern statements for the save/restore routines we use.  */
@@ -25803,6 +25888,178 @@ rs6000_output_function_epilogue (FILE *f
       fputs ("\t.align 2\n", file);
     }
 }
+
+/* -fsplit-stack support.  */
+
+/* A SYMBOL_REF for __morestack.  */
+static GTY(()) rtx morestack_ref;
+
+static rtx
+gen_add3_const (rtx rt, rtx ra, long c)
+{
+  if (TARGET_64BIT)
+    return gen_adddi3 (rt, ra, GEN_INT (c));
+ else
+    return gen_addsi3 (rt, ra, GEN_INT (c));
+}
+
+/* Emit -fsplit-stack prologue, which goes before the regular function
+   prologue (at local entry point in the case of ELFv2).  */
+
+void
+rs6000_expand_split_stack_prologue (void)
+{
+  rs6000_stack_t *info = rs6000_stack_info ();
+  unsigned HOST_WIDE_INT allocate;
+  long alloc_hi, alloc_lo;
+  rtx r0, r1, r12, lr, ok_label, compare, jump, call_fusage;
+  rtx_insn *insn;
+
+  gcc_assert (flag_split_stack && reload_completed);
+
+  if (!info->push_p)
+    return;
+
+  allocate = info->total_size;
+  if (allocate > (unsigned HOST_WIDE_INT) 1 << 31)
+    {
+      sorry ("Stack frame larger than 2G is not supported for -fsplit-stack");
+      return;
+    }
+  if (morestack_ref == NULL_RTX)
+    {
+      morestack_ref = gen_rtx_SYMBOL_REF (Pmode, "__morestack");
+      SYMBOL_REF_FLAGS (morestack_ref) |= (SYMBOL_FLAG_LOCAL
+					   | SYMBOL_FLAG_FUNCTION);
+    }
+
+  r0 = gen_rtx_REG (Pmode, 0);
+  r1 = gen_rtx_REG (Pmode, STACK_POINTER_REGNUM);
+  r12 = gen_rtx_REG (Pmode, 12);
+  emit_insn (gen_load_split_stack_limit (r0));
+  /* Always emit two insns here to calculate the requested stack,
+     so that the linker can edit them when adjusting size for calling
+     non-split-stack code.  */
+  alloc_hi = (-allocate + 0x8000) & ~0xffffL;
+  alloc_lo = -allocate - alloc_hi;
+  if (alloc_hi != 0)
+    {
+      emit_insn (gen_add3_const (r12, r1, alloc_hi));
+      if (alloc_lo != 0)
+	emit_insn (gen_add3_const (r12, r12, alloc_lo));
+      else
+	emit_insn (gen_nop ());
+    }
+  else
+    {
+      emit_insn (gen_add3_const (r12, r1, alloc_lo));
+      emit_insn (gen_nop ());
+    }
+
+  compare = gen_rtx_REG (CCUNSmode, CR7_REGNO);
+  emit_insn (gen_rtx_SET (compare, gen_rtx_COMPARE (CCUNSmode, r12, r0)));
+  ok_label = gen_label_rtx ();
+  jump = gen_rtx_IF_THEN_ELSE (VOIDmode,
+			       gen_rtx_GEU (VOIDmode, compare, const0_rtx),
+			       gen_rtx_LABEL_REF (VOIDmode, ok_label),
+			       pc_rtx);
+  jump = emit_jump_insn (gen_rtx_SET (pc_rtx, jump));
+  JUMP_LABEL (jump) = ok_label;
+  /* Mark the jump as very likely to be taken.  */
+  add_int_reg_note (jump, REG_BR_PROB,
+		    REG_BR_PROB_BASE - REG_BR_PROB_BASE / 100);
+
+  lr = gen_rtx_REG (Pmode, LR_REGNO);
+  insn = emit_move_insn (r0, lr);
+  RTX_FRAME_RELATED_P (insn) = 1;
+  insn = emit_insn (gen_frame_store (r0, r1, info->lr_save_offset));
+  RTX_FRAME_RELATED_P (insn) = 1;
+
+  insn = emit_call_insn (gen_call (gen_rtx_MEM (SImode, morestack_ref),
+				   const0_rtx, const0_rtx));
+  call_fusage = NULL_RTX;
+  use_reg (&call_fusage, r12);
+  add_function_usage_to (insn, call_fusage);
+  emit_insn (gen_frame_load (r0, r1, info->lr_save_offset));
+  insn = emit_move_insn (lr, r0);
+  add_reg_note (insn, REG_CFA_RESTORE, lr);
+  RTX_FRAME_RELATED_P (insn) = 1;
+  emit_insn (gen_split_stack_return ());
+
+  emit_label (ok_label);
+  LABEL_NUSES (ok_label) = 1;
+}
+
+/* Return the internal arg pointer used for function incoming
+   arguments.  When -fsplit-stack, the arg pointer is r12 so we need
+   to copy it to a pseudo in order for it to be preserved over calls
+   and suchlike.  We'd really like to use a pseudo here for the
+   internal arg pointer but data-flow analysis is not prepared to
+   accept pseudos as live at the beginning of a function.  */
+
+static rtx
+rs6000_internal_arg_pointer (void)
+{
+  if (flag_split_stack)
+    {
+      if (cfun->machine->split_stack_arg_pointer == NULL_RTX)
+	{
+	  rtx pat;
+
+	  cfun->machine->split_stack_arg_pointer = gen_reg_rtx (Pmode);
+	  REG_POINTER (cfun->machine->split_stack_arg_pointer) = 1;
+
+	  /* Put the pseudo initialization right after the note at the
+	     beginning of the function.  */
+	  pat = gen_rtx_SET (cfun->machine->split_stack_arg_pointer,
+			     gen_rtx_REG (Pmode, 12));
+	  push_topmost_sequence ();
+	  emit_insn_after (pat, get_insns ());
+	  pop_topmost_sequence ();
+	}
+      return plus_constant (Pmode, cfun->machine->split_stack_arg_pointer,
+			    FIRST_PARM_OFFSET (current_function_decl));
+    }
+  return virtual_incoming_args_rtx;
+}
+
+/* We may have to tell the dataflow pass that the split stack prologue
+   is initializing a register.  */
+
+static void
+rs6000_live_on_entry (bitmap regs)
+{
+  if (flag_split_stack)
+    bitmap_set_bit (regs, 12);
+}
+
+/* Emit -fsplit-stack dynamic stack allocation space check.  */
+
+void
+rs6000_split_stack_space_check (rtx size, rtx label)
+{
+  rtx sp = gen_rtx_REG (Pmode, STACK_POINTER_REGNUM);
+  rtx limit = gen_reg_rtx (Pmode);
+  rtx requested = gen_reg_rtx (Pmode);
+  rtx cmp = gen_reg_rtx (CCUNSmode);
+  rtx jump;
+
+  emit_insn (gen_load_split_stack_limit (limit));
+  if (CONST_INT_P (size))
+    emit_insn (gen_add3_insn (requested, sp, GEN_INT (-INTVAL (size))));
+  else
+    {
+      size = force_reg (Pmode, size);
+      emit_move_insn (requested, gen_rtx_MINUS (Pmode, sp, size));
+    }
+  emit_insn (gen_rtx_SET (cmp, gen_rtx_COMPARE (CCUNSmode, requested, limit)));
+  jump = gen_rtx_IF_THEN_ELSE (VOIDmode,
+			       gen_rtx_GEU (VOIDmode, cmp, const0_rtx),
+			       gen_rtx_LABEL_REF (VOIDmode, label),
+			       pc_rtx);
+  jump = emit_jump_insn (gen_rtx_SET (pc_rtx, jump));
+  JUMP_LABEL (jump) = label;
+}
 
 /* A C compound statement that outputs the assembler code for a thunk
    function, used to implement C++ virtual function calls with
@@ -29811,6 +30068,9 @@ rs6000_elf_file_end (void)
   if (TARGET_32BIT || DEFAULT_ABI == ABI_ELFv2)
     file_end_indicate_exec_stack ();
 #endif
+
+  if (flag_split_stack)
+    file_end_indicate_split_stack ();
 }
 #endif
 
diff -urpN gcc-stack-info2/gcc/config/rs6000/rs6000.md gcc-split-stack1/gcc/config/rs6000/rs6000.md
--- gcc-stack-info2/gcc/config/rs6000/rs6000.md	2015-05-15 14:15:38.177243589 +0930
+++ gcc-split-stack1/gcc/config/rs6000/rs6000.md	2015-05-15 02:01:15.776472615 +0930
@@ -140,6 +140,7 @@
    UNSPEC_PACK_128BIT
    UNSPEC_LSQ
    UNSPEC_FUSION_GPR
+   UNSPEC_STACK_CHECK
   ])
 
 ;;
@@ -157,6 +158,7 @@
    UNSPECV_NLGR			; non-local goto receiver
    UNSPECV_MFFS			; Move from FPSCR
    UNSPECV_MTFSF		; Move to FPSCR Fields
+   UNSPECV_SPLIT_STACK_RETURN   ; A camouflaged return
   ])
 
 
@@ -12345,6 +12347,72 @@
 }"
   [(set_attr "type" "load")])
 
+;; Handle -fsplit-stack.
+
+(define_expand "split_stack_prologue"
+  [(const_int 0)]
+  ""
+{
+  rs6000_expand_split_stack_prologue ();
+  DONE;
+})
+
+(define_expand "load_split_stack_limit"
+  [(set (match_operand 0)
+	(unspec [(const_int 0)] UNSPEC_STACK_CHECK))]
+  ""
+{
+  emit_insn (gen_rtx_SET (operands[0],
+			  gen_rtx_UNSPEC (Pmode,
+					  gen_rtvec (1, const0_rtx),
+					  UNSPEC_STACK_CHECK)));
+  DONE;
+})
+
+(define_insn "load_split_stack_limit_di"
+  [(set (match_operand:DI 0 "gpc_reg_operand" "=r")
+	(unspec:DI [(const_int 0)] UNSPEC_STACK_CHECK))]
+  "TARGET_64BIT"
+  "ld %0,-0x7040(13)"
+  [(set_attr "type" "load")
+   (set_attr "update" "no")
+   (set_attr "indexed" "no")])
+
+(define_insn "load_split_stack_limit_si"
+  [(set (match_operand:SI 0 "gpc_reg_operand" "=r")
+	(unspec:SI [(const_int 0)] UNSPEC_STACK_CHECK))]
+  "!TARGET_64BIT"
+  "lwz %0,-0x7020(2)"
+  [(set_attr "type" "load")
+   (set_attr "update" "no")
+   (set_attr "indexed" "no")])
+
+;; A return instruction which the middle-end doesn't see.
+(define_insn "split_stack_return"
+  [(unspec_volatile [(const_int 0)] UNSPECV_SPLIT_STACK_RETURN)]
+  ""
+  "blr"
+  [(set_attr "type" "jmpreg")])
+
+;; If there are operand 0 bytes available on the stack, jump to
+;; operand 1.
+(define_expand "split_stack_space_check"
+  [(set (match_dup 2)
+	(unspec [(const_int 0)] UNSPEC_STACK_CHECK))
+   (set (match_dup 3)
+	(minus (reg STACK_POINTER_REGNUM)
+	       (match_operand 0)))
+   (set (match_dup 4) (compare:CCUNS (match_dup 3) (match_dup 2)))
+   (set (pc) (if_then_else
+	      (geu (match_dup 4) (const_int 0))
+	      (label_ref (match_operand 1))
+	      (pc)))]
+  ""
+{
+  rs6000_split_stack_space_check (operands[0], operands[1]);
+  DONE;
+})
+
 (define_insn "bpermd_<mode>"
   [(set (match_operand:P 0 "gpc_reg_operand" "=r")
 	(unspec:P [(match_operand:P 1 "gpc_reg_operand" "r")
diff -urpN gcc-stack-info2/gcc/config/rs6000/rs6000-protos.h gcc-split-stack1/gcc/config/rs6000/rs6000-protos.h
--- gcc-stack-info2/gcc/config/rs6000/rs6000-protos.h	2015-05-15 14:15:38.149244726 +0930
+++ gcc-split-stack1/gcc/config/rs6000/rs6000-protos.h	2015-05-15 01:57:37.417258829 +0930
@@ -191,6 +191,8 @@ extern void rs6000_emit_prologue (void);
 extern void rs6000_emit_load_toc_table (int);
 extern unsigned int rs6000_dbx_register_number (unsigned int, unsigned int);
 extern void rs6000_emit_epilogue (int);
+extern void rs6000_expand_split_stack_prologue (void);
+extern void rs6000_split_stack_space_check (rtx, rtx);
 extern void rs6000_emit_eh_reg_restore (rtx, rtx);
 extern const char * output_isel (rtx *);
 extern void rs6000_call_aix (rtx, rtx, rtx, rtx);
diff -urpN gcc-stack-info2/libgcc/config/rs6000/morestack.S gcc-split-stack1/libgcc/config/rs6000/morestack.S
--- gcc-stack-info2/libgcc/config/rs6000/morestack.S	1970-01-01 09:30:00.000000000 +0930
+++ gcc-split-stack1/libgcc/config/rs6000/morestack.S	2015-05-15 14:54:02.247603731 +0930
@@ -0,0 +1,351 @@
+#ifdef __powerpc64__
+# PowerPC64 support for -fsplit-stack.
+# Copyright (C) 2009-2015 Free Software Foundation, Inc.
+# Contributed by Alan Modra <amodra@gmail.com>.
+
+# This file is part of GCC.
+
+# GCC is free software; you can redistribute it and/or modify it under
+# the terms of the GNU General Public License as published by the Free
+# Software Foundation; either version 3, or (at your option) any later
+# version.
+
+# GCC is distributed in the hope that it will be useful, but WITHOUT ANY
+# WARRANTY; without even the implied warranty of MERCHANTABILITY or
+# FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+# for more details.
+
+# Under Section 7 of GPL version 3, you are granted additional
+# permissions described in the GCC Runtime Library Exception, version
+# 3.1, as published by the Free Software Foundation.
+
+# You should have received a copy of the GNU General Public License and
+# a copy of the GCC Runtime Library Exception along with this program;
+# see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+# <http://www.gnu.org/licenses/>.
+
+#if _CALL_ELF == 2
+	.abiversion 2
+#define PARAMS 32
+#else
+	.abiversion 1
+#define PARAMS 48
+#endif
+#define MORESTACK_FRAMESIZE	(PARAMS+96)
+#define PARAMREG_SAVE		-MORESTACK_FRAMESIZE+PARAMS+0
+#define STATIC_CHAIN_SAVE	-MORESTACK_FRAMESIZE+PARAMS+64
+#define R29_SAVE		-MORESTACK_FRAMESIZE+PARAMS+72
+#define LINKREG_SAVE		-MORESTACK_FRAMESIZE+PARAMS+80
+#define NEWSTACKSIZE_SAVE	-MORESTACK_FRAMESIZE+PARAMS+88
+
+# Excess space needed to call ld.so resolver for lazy plt
+# resolution.  Go uses sigaltstack so this doesn't need to
+# also cover signal frame size.
+#define BACKOFF 4096
+# Large excess allocated when calling non-split-stack code.
+#define NON_SPLIT_STACK 0x100000
+
+
+#if _CALL_ELF == 2
+
+#define BODY_LABEL(name) name
+
+#define ENTRY0(name)					\
+	.global name;					\
+	.hidden	name;					\
+	.type name,@function;				\
+name##:
+
+#define ENTRY(name)					\
+	ENTRY0(name);					\
+0:	addis %r2,%r12,.TOC.-0b@ha;			\
+        addi %r2,%r2,.TOC.-0b@l;			\
+	.localentry name, .-name
+
+#else
+
+#define BODY_LABEL(name) .L.##name
+
+#define ENTRY0(name)					\
+	.global name;					\
+	.hidden	name;					\
+	.type name,@function;				\
+	.pushsection ".opd","aw";			\
+	.p2align 3;					\
+name##: .quad BODY_LABEL (name), .TOC.@tocbase, 0;	\
+	.popsection;					\
+BODY_LABEL(name)##:
+
+#define ENTRY(name) ENTRY0(name)
+
+#endif
+
+#define SIZE(name) .size name, .-BODY_LABEL(name)
+
+
+	.text
+# Just like __morestack, but with larger excess allocation
+ENTRY0(__morestack_non_split)
+.LFB1:
+	.cfi_startproc
+# We use a cleanup to restore the tcbhead_t.__private_ss if
+# an exception is thrown through this code.
+#ifdef __PIC__
+	.cfi_personality 0x9b,DW.ref.__gcc_personality_v0
+	.cfi_lsda 0x1b,.LLSDA1
+#else
+	.cfi_personality 0x3,__gcc_personality_v0
+	.cfi_lsda 0x3,.LLSDA1
+#endif
+# LR is already saved by the split-stack prologue code.
+# We may as well have the unwinder skip over the call in the
+# prologue too.
+	.cfi_offset %lr,16
+
+	addis %r12,%r12,-NON_SPLIT_STACK@h
+	SIZE (__morestack_non_split)
+# Fall through into __morestack
+
+
+# This function is called with non-standard calling conventions.
+# On entry, r12 is the requested stack pointer.  One version of the
+# split-stack prologue that calls __morestack looks like
+#	ld %r0,-0x7000-64(%r13)
+#	addis %r12,%r1,-allocate@ha
+#	addi %r12,%r12,-allocate@l
+#	cmpld %cr7,%r12,%r0
+#	bge+ %cr7,enough
+#	mflr %r0
+#	std %r0,16(%r1)
+#	bl __morestack
+#	ld %r0,16(%r1)
+#	mtlr %r0
+#	blr
+# enough:
+# The normal function prologue follows here, with a small addition at
+# the end to set up the arg pointer.  The arg pointer is set up with:
+#	addi %r12,%r1,offset
+#	bge %cr7,.+8
+#	mr %r12,%r29
+#
+# Note that the lr save slot 16(%r1) has already been used.
+# r3 thru r11 possibly contain arguments and a static chain
+# pointer for the function we're calling, so must be preserved.
+# cr7 must also be preserved.
+
+ENTRY0(__morestack)
+# Save parameter passing registers, our arguments, lr, r29
+# and use r29 as a frame pointer.
+	std %r3,PARAMREG_SAVE+0(%r1)
+	sub %r3,%r1,%r12		# calculate requested stack size
+	mflr %r12
+	std %r4,PARAMREG_SAVE+8(%r1)
+	std %r5,PARAMREG_SAVE+16(%r1)
+	std %r6,PARAMREG_SAVE+24(%r1)
+	std %r7,PARAMREG_SAVE+32(%r1)
+	addi %r3,%r3,BACKOFF
+	std %r8,PARAMREG_SAVE+40(%r1)
+	std %r9,PARAMREG_SAVE+48(%r1)
+	std %r10,PARAMREG_SAVE+56(%r1)
+	std %r11,STATIC_CHAIN_SAVE(%r1)
+	std %r29,R29_SAVE(%r1)
+	std %r12,LINKREG_SAVE(%r1)
+	std %r3,NEWSTACKSIZE_SAVE(%r1)	# new stack size
+	mr %r29,%r1
+	.cfi_offset %r29,R29_SAVE
+	.cfi_def_cfa_register %r29
+	stdu %r1,-MORESTACK_FRAMESIZE(%r1)
+
+	# void __morestack_block_signals (void)
+	bl __morestack_block_signals
+
+	# void *__generic_morestack (size_t *pframe_size,
+	#			     void *old_stack,
+	#			     size_t param_size)
+	addi %r3,%r29,NEWSTACKSIZE_SAVE
+	mr %r4,%r29
+	li %r5,0			# no copying from old stack
+	bl __generic_morestack
+
+# Start using new stack
+	stdu %r29,-32(%r3)		# back-chain
+	mr %r1,%r3
+
+# Set __private_ss stack guard for the new stack.
+	ld %r12,NEWSTACKSIZE_SAVE(%r29)	# modified size
+	addi %r3,%r3,BACKOFF-32
+	sub %r3,%r3,%r12
+# Note that a signal frame has $pc pointing at the instruction
+# where the signal occurred.  For something like a timer
+# interrupt this means the instruction has already executed,
+# thus the region starts at the instruction modifying
+# __private_ss, not one instruction after.
+.LEHB0:
+	std %r3,-0x7000-64(%r13)	# tcbhead_t.__private_ss
+
+	# void __morestack_unblock_signals (void)
+	bl __morestack_unblock_signals
+
+# Set up for a call to the target function, located 3
+# instructions after __morestack's return address.
+#
+	ld %r12,LINKREG_SAVE(%r29)
+	ld %r3,PARAMREG_SAVE+0(%r29)	# restore arg regs
+	ld %r4,PARAMREG_SAVE+8(%r29)
+	ld %r5,PARAMREG_SAVE+16(%r29)
+	ld %r6,PARAMREG_SAVE+24(%r29)
+	ld %r7,PARAMREG_SAVE+32(%r29)
+	ld %r8,PARAMREG_SAVE+40(%r29)
+	ld %r9,PARAMREG_SAVE+48(%r29)
+	addi %r0,%r12,12		# add 3 instructions
+	ld %r10,PARAMREG_SAVE+56(%r29)
+	ld %r11,STATIC_CHAIN_SAVE(%r29)
+	cmpld %cr7,%r12,%r0		# indicate we were called
+	mtctr %r0
+	bctrl				# call caller!
+
+# On return, save regs possibly used to return a value, and
+# possibly trashed by calls to __morestack_block_signals,
+# __generic_releasestack and __morestack_unblock_signals.
+# Assume those calls don't use vector or floating point regs.
+	std %r3,PARAMREG_SAVE+0(%r29)
+	std %r4,PARAMREG_SAVE+8(%r29)
+	std %r5,PARAMREG_SAVE+16(%r29)
+	std %r6,PARAMREG_SAVE+24(%r29)
+#if _CALL_ELF == 2
+	std %r7,PARAMREG_SAVE+32(%r29)
+	std %r8,PARAMREG_SAVE+40(%r29)
+	std %r9,PARAMREG_SAVE+48(%r29)
+	std %r10,PARAMREG_SAVE+56(%r29)
+#endif
+
+	bl __morestack_block_signals
+
+	# void *__generic_releasestack (size_t *pavailable)
+	addi %r3,%r29,NEWSTACKSIZE_SAVE
+	bl __generic_releasestack
+
+# Reset __private_ss stack guard to value for old stack
+	ld %r12,NEWSTACKSIZE_SAVE(%r29)
+	addi %r3,%r3,BACKOFF
+	sub %r3,%r3,%r12
+.LEHE0:
+	std %r3,-0x7000-64(%r13)	# tcbhead_t.__private_ss
+
+	bl __morestack_unblock_signals
+
+# Use old stack again.
+	mr %r1,%r29
+
+# Restore return value regs, and return.
+	ld %r0,LINKREG_SAVE(%r29)
+	mtlr %r0
+	ld %r3,PARAMREG_SAVE+0(%r29)
+	ld %r4,PARAMREG_SAVE+8(%r29)
+	ld %r5,PARAMREG_SAVE+16(%r29)
+	ld %r6,PARAMREG_SAVE+24(%r29)
+#if _CALL_ELF == 2
+	ld %r7,PARAMREG_SAVE+32(%r29)
+	ld %r8,PARAMREG_SAVE+40(%r29)
+	ld %r9,PARAMREG_SAVE+48(%r29)
+	ld %r10,PARAMREG_SAVE+56(%r29)
+#endif
+	ld %r29,R29_SAVE(%r29)
+	.cfi_def_cfa_register %r1
+	blr
+
+# This is the cleanup code called by the stack unwinder when
+# unwinding through code between .LEHB0 and .LEHE0 above.
+cleanup:
+	.cfi_def_cfa_register %r29
+	std %r3,PARAMREG_SAVE(%r29)	# Save exception header
+	# size_t __generic_findstack (void *stack)
+	mr %r3,%r29
+	bl __generic_findstack
+	sub %r3,%r29,%r3
+	addi %r3,%r3,BACKOFF
+	std %r3,-0x7000-64(%r13)	# tcbhead_t.__private_ss
+	ld %r3,PARAMREG_SAVE(%r29)
+	bl _Unwind_Resume
+	nop
+	.cfi_endproc
+	SIZE (__morestack)
+
+
+	.section .gcc_except_table,"a",@progbits
+	.p2align 2
+.LLSDA1:
+	.byte	0xff	# @LPStart format (omit)
+	.byte	0xff	# @TType format (omit)
+	.byte	0x1	# call-site format (uleb128)
+	.uleb128 .LLSDACSE1-.LLSDACSB1	# Call-site table length
+.LLSDACSB1:
+	.uleb128 .LEHB0-.LFB1	# region 0 start
+	.uleb128 .LEHE0-.LEHB0	# length
+	.uleb128 cleanup-.LFB1	# landing pad
+	.uleb128 0		# no action, ie. a cleanup
+.LLSDACSE1:
+
+
+#ifdef __PIC__
+# Build a position independent reference to the personality function.
+	.hidden DW.ref.__gcc_personality_v0
+	.weak DW.ref.__gcc_personality_v0
+	.section .data.DW.ref.__gcc_personality_v0,"awG",@progbits,DW.ref.__gcc_personality_v0,comdat
+	.p2align 3
+DW.ref.__gcc_personality_v0:
+	.quad __gcc_personality_v0
+	.type DW.ref.__gcc_personality_v0, @object
+	.size DW.ref.__gcc_personality_v0, 8
+#endif
+
+
+	.text
+# Initialize the stack guard when the program starts or when a
+# new thread starts.  This is called from a constructor.
+# void __stack_split_initialize (void)
+ENTRY(__stack_split_initialize)
+	addi %r3,%r1,-0x4000		# We should have at least 16K.
+	std %r3,-0x7000-64(%r13)	# tcbhead_t.__private_ss
+	# void __generic_morestack_set_initial_sp (void *sp, size_t len)
+	mr %r3,%r1
+	li %r4, 0x4000
+	b __generic_morestack_set_initial_sp
+	SIZE (__stack_split_initialize)
+
+
+# Return current __private_ss
+# void *__morestack_get_guard (void)
+ENTRY0(__morestack_get_guard)
+	ld %r3,-0x7000-64(%r13)		# tcbhead_t.__private_ss
+	blr
+	SIZE (__morestack_get_guard)
+
+
+# Set __private_ss
+# void __morestack_set_guard (void *ptr)
+ENTRY0(__morestack_set_guard)
+	std %r3,-0x7000-64(%r13)	# tcbhead_t.__private_ss
+	blr
+	SIZE (__morestack_set_guard)
+
+
+# Return the stack guard value for given stack
+# void *__morestack_make_guard (void *stack, size_t size)
+ENTRY0(__morestack_make_guard)
+	sub %r3,%r3,%r4
+	addi %r3,%r3,BACKOFF
+	blr
+	SIZE (__morestack_make_guard)
+
+
+# Make __stack_split_initialize a high priority constructor.
+	.section .ctors.65535,"aw",@progbits
+	.p2align 3
+	.quad __stack_split_initialize
+	.quad __morestack_load_mmap
+
+	.section .note.GNU-stack,"",@progbits
+	.section .note.GNU-split-stack,"",@progbits
+	.section .note.GNU-no-split-stack,"",@progbits
+#endif /* __powerpc64__ */
diff -urpN gcc-stack-info2/libgcc/config/rs6000/t-stack-rs6000 gcc-split-stack1/libgcc/config/rs6000/t-stack-rs6000
--- gcc-stack-info2/libgcc/config/rs6000/t-stack-rs6000	1970-01-01 09:30:00.000000000 +0930
+++ gcc-split-stack1/libgcc/config/rs6000/t-stack-rs6000	2015-05-15 01:57:37.429258346 +0930
@@ -0,0 +1,2 @@
+# Makefile fragment to support -fsplit-stack for powerpc.
+LIB2ADD_ST += $(srcdir)/config/rs6000/morestack.S
diff -urpN gcc-stack-info2/libgcc/config.host gcc-split-stack1/libgcc/config.host
--- gcc-stack-info2/libgcc/config.host	2015-05-15 14:15:38.193242938 +0930
+++ gcc-split-stack1/libgcc/config.host	2015-05-15 01:57:37.429258346 +0930
@@ -1021,6 +1021,7 @@ powerpc-*-rtems*)
 	;;
 powerpc*-*-linux*)
 	tmake_file="${tmake_file} rs6000/t-ppccomm rs6000/t-savresfgpr rs6000/t-crtstuff rs6000/t-linux t-dfprules rs6000/t-ppc64-fp t-slibgcc-libgcc"
+	tmake_file="${tmake_file} t-stack rs6000/t-stack-rs6000"
 	case $ppc_fp_type in
 	64)
 		;;
diff -urpN gcc-stack-info2/libgcc/generic-morestack.c gcc-split-stack1/libgcc/generic-morestack.c
--- gcc-stack-info2/libgcc/generic-morestack.c	2015-05-15 14:15:38.193242938 +0930
+++ gcc-split-stack1/libgcc/generic-morestack.c	2015-05-15 01:57:37.429258346 +0930
@@ -23,6 +23,9 @@ a copy of the GCC Runtime Library Except
 see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
 <http://www.gnu.org/licenses/>.  */
 
+/* powerpc 32-bit not supported.  */
+#if !defined __powerpc__ || defined __powerpc64__
+
 #include "tconfig.h"
 #include "tsystem.h"
 #include "coretypes.h"
@@ -935,6 +938,7 @@ __splitstack_find (void *segment_arg, vo
       nsp -= 12 * sizeof (void *);
 #elif defined (__i386__)
       nsp -= 6 * sizeof (void *);
+#elif defined __powerpc64__
 #else
 #error "unrecognized target"
 #endif
@@ -1170,3 +1174,4 @@ __splitstack_find_context (void *context
 }
 
 #endif /* !defined (inhibit_libc) */
+#endif /* not powerpc 32-bit */

-- 
Alan Modra
Australia Development Lab, IBM

