This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [patch] Get rid of stack trampolines for nested functions


On 06/29/2016 04:08 PM, Eric Botcazou wrote:
Hi,

this patch implements generic support for the elimination of stack trampolines
and, consequently, of the need to make the stack executable when pointers to
nested functions are used.  That's done on a per-language and per-target basis
(i.e. there is 1 language hook and 1 target hook to parameterize it) and there
are no changes whatsoever in code generation if both are not turned on (and
the patch implements a -ftrampolines option to let the user override them).
I'm sure there's security folks that would want to see this across all targets for all languages. Sadly due to the ABI implications that's not likely to happen :(


The idea is based on the fact that, for targets using function descriptors as
per their ABI like IA-64, AIX or VMS platforms, stack trampolines "degenerate"
into descriptors built at run time on the stack and thus made up of data only,
which in turn means that the stack doesn't need to be made executable.
Right.


This descriptor-based scheme is implemented generically for nested functions,
i.e. the nested function lowering pass builds generic descriptors instead of
trampolines on the stack when encountering pointers to nested functions, which
means that there are 2 kinds of pointers to functions and therefore a run-time
identification mechanism is needed for indirect calls to distinguish them.

Because of that, enabling the support breaks binary compatibility (for code
manipulating pointers to nested functions).  That's OK for Ada and nested
functions are first-class citizens in the language anyway so we really need
this, but not for C so for example Ada doesn't use it at the interface with C
(when objects have "convention C" in Ada parlance).
So just to be clear, once installed, for Ada we'll start using the descriptor based scheme on the subset of targets below, which will be an ABI change, right?

If so and this goes in, we'll have to make sure to signal loud and wide about the ABI change at the project level and the various distributors will need to do the same for their consumers as well.



This was bootstrapped/regtested on x86_64-suse-linux but AdaCore has been
using it on native platforms (Linux, Windows, Solaris, etc) for years.

OK for the mainline?


2016-06-29  Eric Botcazou  <ebotcazou@adacore.com>

	PR ada/37139
	PR ada/67205
	* common.opt (-ftrampolines): New option.
	* doc/invoke.texi (Code Gen Options): Document it.
	* doc/tm.texi.in (Trampolines): Add TARGET_CUSTOM_FUNCTION_DESCRIPTORS
	* doc/tm.texi: Regenerate.
	* builtins.def: Add init_descriptor and adjust_descriptor.
	* builtins.c (expand_builtin_init_trampoline): Do not issue a warning
	on platforms with descriptors.
	(expand_builtin_init_descriptor): New function.
	(expand_builtin_adjust_descriptor): Likewise.
	(expand_builtin) <BUILT_IN_INIT_DESCRIPTOR>: New case.
	<BUILT_IN_ADJUST_DESCRIPTOR>: Likewise.
	* calls.c (prepare_call_address): Remove SIBCALLP parameter and add
	FLAGS parameter.  Deal with indirect calls by descriptor and adjust.
	Set STATIC_CHAIN_REG_P on the static chain register, if any.
	(call_expr_flags): Set ECF_BY_DESCRIPTOR for calls by descriptor.
	(expand_call): Likewise.  Move around call to prepare_call_address
	and pass all flags to it.
	* cfgexpand.c (expand_call_stmt): Reinstate CALL_EXPR_BY_DESCRIPTOR.
	* gimple.h (enum gf_mask): New GF_CALL_BY_DESCRIPTOR value.
	(gimple_call_set_by_descriptor): New setter.
	(gimple_call_by_descriptor_p): New getter.
	* gimple.c (gimple_build_call_from_tree): Set CALL_EXPR_BY_DESCRIPTOR.
	(gimple_call_flags): Deal with GF_CALL_BY_DESCRIPTOR.
	* langhooks.h (struct lang_hooks): Add custom_function_descriptors.
	* langhooks-def.h (LANG_HOOKS_CUSTOM_FUNCTION_DESCRIPTORS): Define.
	(LANG_HOOKS_INITIALIZER): Add LANG_HOOKS_CUSTOM_FUNCTION_DESCRIPTORS.
	* rtl.h (STATIC_CHAIN_REG_P): New macro.
	* rtlanal.c (find_first_parameter_load): Skip static chain registers.
	* target.def (custom_function_descriptors): New POD hook.
	* tree.h (FUNC_ADDR_BY_DESCRIPTOR): New flag on ADDR_EXPR.
	(CALL_EXPR_BY_DESCRIPTOR): New flag on CALL_EXPR.
	* tree-core.h (ECF_BY_DESCRIPTOR): New mask.
	Document FUNC_ADDR_BY_DESCRIPTOR and CALL_EXPR_BY_DESCRIPTOR.
	* tree.c (make_node_stat) <tcc_declaration>: Set function alignment to
	DEFAULT_FUNCTION_ALIGNMENT instead of FUNCTION_BOUNDARY.
	(build_common_builtin_nodes): Initialize init_descriptor and
	adjust_descriptor.
	* tree-nested.c: Include target.h.
	(struct nesting_info): Add 'any_descr_created' field.
	(get_descriptor_type): New function.
	(lookup_element_for_decl): New function extracted from...
	(create_field_for_decl): Likewise.
	(lookup_tramp_for_decl): ...here.  Adjust.
	(lookup_descr_for_decl): New function.
	(convert_tramp_reference_op): Deal with descriptors.
	(build_init_call_stmt): New function extracted from...
	(finalize_nesting_tree_1): ...here.  Adjust and deal with descriptors.
	* defaults.h (DEFAULT_FUNCTION_ALIGNMENT): Define.
	(TRAMPOLINE_ALIGNMENT): Set to above instead of FUNCTION_BOUNDARY.
	* config/aarch64/aarch64.h (TARGET_CUSTOM_FUNCTION_DESCRIPTORS):Define
	* config/alpha/alpha.h (TARGET_CUSTOM_FUNCTION_DESCRIPTORS): Likewise.
	* config/arm/arm.h (TARGET_CUSTOM_FUNCTION_DESCRIPTORS): Likewise.
	* config/arm/arm.c (arm_function_ok_for_sibcall): Return false for an
	indirect call by descriptor if all the argument registers are used.
	* config/i386/i386.h (TARGET_CUSTOM_FUNCTION_DESCRIPTORS): Define.
	* config/ia64/ia64.h (TARGET_CUSTOM_FUNCTION_DESCRIPTORS): Likewise.
	* config/mips/mips.h (TARGET_CUSTOM_FUNCTION_DESCRIPTORS): Likewise.
	* config/pa/pa.h (TARGET_CUSTOM_FUNCTION_DESCRIPTORS): Likewise.
	* config/rs6000/rs6000.h (TARGET_CUSTOM_FUNCTION_DESCRIPTORS):Likewise
	* config/sparc/sparc.h (TARGET_CUSTOM_FUNCTION_DESCRIPTORS): Likewise.
ada/
	* gcc-interface/misc.c (LANG_HOOKS_CUSTOM_FUNCTION_DESCRIPTORS):Define
	* gcc-interface/trans.c (Attribute_to_gnu) <Attr_Access>: Deal with
	a zero  TARGET_CUSTOM_FUNCTION_DESCRIPTORSspecially for 'Code_Address.
	Otherwise, if TARGET_CUSTOM_FUNCTION_DESCRIPTORS is positive, set
	FUNC_ADDR_BY_DESCRIPTOR for 'Access/'Unrestricted_Access of nested
	subprograms if the type can use an internal representation.
	(call_to_gnu): Likewise, but set CALL_EXPR_BY_DESCRIPTOR on indirect
	calls if the type can use an internal representation.


2016-06-29  Eric Botcazou  <ebotcazou@adacore.com>

	* gnat.dg/trampoline3.adb: New test.
	* gnat.dg/trampoline4.adb: Likewise.

-- Eric Botcazou


p.diff


Index: builtins.c
===================================================================
--- builtins.c	(revision 237789)
+++ builtins.c	(working copy)
@@ -4644,6 +4645,57 @@ expand_builtin_adjust_trampoline (tree e
   return tramp;
 }

+/* Expand a call to the builtin descriptor initialization routine.
+   A descriptor is made up of a couple of pointers to the static
+   chain and the code entry in this order.  */
+
+static rtx
+expand_builtin_init_descriptor (tree exp)
+{
+  tree t_descr, t_func, t_chain;
+  rtx m_descr, r_descr, r_func, r_chain;
+
+  if (!validate_arglist (exp, POINTER_TYPE, POINTER_TYPE, POINTER_TYPE,
+			 VOID_TYPE))
+    return NULL_RTX;
+
+  t_descr = CALL_EXPR_ARG (exp, 0);
+  t_func = CALL_EXPR_ARG (exp, 1);
+  t_chain = CALL_EXPR_ARG (exp, 2);
+
+  r_descr = expand_normal (t_descr);
+  m_descr = gen_rtx_MEM (BLKmode, r_descr);
+  MEM_NOTRAP_P (m_descr) = 1;
+
+  r_func = expand_normal (t_func);
+  r_chain = expand_normal (t_chain);
+
+  /* Generate insns to initialize the descriptor.  */
+  emit_move_insn (adjust_address_nv (m_descr, Pmode, 0), r_chain);
+  emit_move_insn (adjust_address_nv (m_descr, Pmode, UNITS_PER_WORD), r_func);
Should UNITS_PER_WORD here be POINTER_SIZE/BITS_PER_UNIT right?


Index: calls.c
===================================================================
--- calls.c	(revision 237789)
+++ calls.c	(working copy)
@@ -183,18 +183,73 @@ static void restore_fixed_argument_area
-  else if (! sibcallp)
+    {
+      /* If it's an indirect call by descriptor, generate code to perform
+	 runtime identification of the pointer and load the descriptor.  */
+      if ((flags & ECF_BY_DESCRIPTOR) && !flag_trampolines)
+	{
+	  const int bit_val = targetm.calls.custom_function_descriptors;
+	  rtx call_lab = gen_label_rtx ();
+
+	  gcc_assert (fndecl_or_type && TYPE_P (fndecl_or_type));
+	  fndecl_or_type
+	    = build_decl (UNKNOWN_LOCATION, FUNCTION_DECL, NULL_TREE,
+			  fndecl_or_type);
+	  DECL_STATIC_CHAIN (fndecl_or_type) = 1;
+	  rtx chain = targetm.calls.static_chain (fndecl_or_type, false);
+
+	  /* Avoid long live ranges around function calls.  */
+	  funexp = copy_to_mode_reg (Pmode, funexp);
+
+	  if (REG_P (chain))
+	    emit_insn (gen_rtx_CLOBBER (VOIDmode, chain));
+
+	  /* Emit the runtime identification pattern.  */
+	  rtx mask = gen_rtx_AND (Pmode, funexp, GEN_INT (bit_val));
+	  emit_cmp_and_jump_insns (mask, const0_rtx, EQ, NULL_RTX, Pmode, 1,
+				   call_lab);
+
+	  /* Statically predict the branch to very likely taken.  */
+	  rtx_insn *insn = get_last_insn ();
+	  if (JUMP_P (insn))
+	    predict_insn_def (insn, PRED_BUILTIN_EXPECT, TAKEN);
+
+	  /* Load the descriptor.  */
+	  rtx mem = gen_rtx_MEM (Pmode,
+				 plus_constant (Pmode, funexp, - bit_val));
+	  MEM_NOTRAP_P (mem) = 1;
+	  emit_move_insn (chain, mem);
+	  mem = gen_rtx_MEM (Pmode,
+			     plus_constant (Pmode, funexp,
+					    UNITS_PER_WORD - bit_val));
Another UNITS_PER_WORD that I think ought to be POINTER_SIZE/BITS_PER_UNIT. Probably worth a pass over the patch to look for this throughout.






Index: common.opt
===================================================================
--- common.opt	(revision 237789)
+++ common.opt	(working copy)
@@ -2303,6 +2303,10 @@ ftracer
 Common Report Var(flag_tracer) Optimization
 Perform superblock formation via tail duplication.

+ftrampolines
+Common Report Var(flag_trampolines) Init(0)
+Always generate trampolines for pointers to nested functions
Even for targets that use procedure descriptors natively? I'm not sure what's the best choice for them -- I could argue either way on this issue.


Index: config/i386/i386.h
I'm probably not well versed enough in this aspect of the various targets's ABIs to review the target stuff. We'll want/need the target maintainers to chime in. That might be easier if the target bits were split out into their own patches.


Index: config/rs6000/rs6000.h
===================================================================
--- config/rs6000/rs6000.h	(revision 237789)
+++ config/rs6000/rs6000.h	(working copy)
@@ -2894,3 +2894,6 @@ extern GTY(()) tree rs6000_builtin_types
 extern GTY(()) tree rs6000_builtin_decls[RS6000_BUILTIN_COUNT];

 #define TARGET_SUPPORTS_WIDE_INT 1
+
+/* Use custom descriptors instead of trampolines when possible if not AIX.  */
+#define TARGET_CUSTOM_FUNCTION_DESCRIPTORS (DEFAULT_ABI == ABI_AIX ? 0 : 1)
I'm not sure this is correcct for ppc64le, which my recollection doesn't use procedure descriptors.


Index: doc/tm.texi
===================================================================
--- doc/tm.texi	(revision 237789)
+++ doc/tm.texi	(working copy)
@@ -5181,6 +5181,25 @@ be returned; otherwise @var{addr} should
 If this hook is not defined, @var{addr} will be used for function calls.
 @end deftypefn

+@deftypevr {Target Hook} int TARGET_CUSTOM_FUNCTION_DESCRIPTORS
+This hook should be defined to a power of 2 if the target will benefit
+from the use of custom descriptors for nested functions instead of the
+standard trampolines.  Such descriptors are created at run time on the
+stack and made up of data only, but they are non-standard so the generated
+code must be prepared to deal with them.  This hook should be defined to 0
+if the target uses function descriptors for its standard calling sequence,
+like for example HP-PA or IA-64.  Using descriptors for nested functions
+eliminates the need for trampolines that reside on the stack and require
+it to be made executable.
+
+The value of the macro is used to parameterize the run-time identification
+scheme implemented to distinguish descriptors from function addresses: it
+gives the number of bytes by which their address is shifted in comparison
+with function addresses.  The value of 1 will generally work, unless it is
+already used by the target for a similar purpose, like for example on ARM
+where it is used to distinguish Thumb functions from ARM ones.
"number of bytes", don't you mean "number of bits" in the last paragraph?




Index: tree-nested.c
===================================================================
--- tree-nested.c	(revision 237789)
+++ tree-nested.c	(working copy)
@@ -21,6 +21,7 @@
 #include "system.h"
 #include "coretypes.h"
 #include "backend.h"
+#include "target.h"
Not real happy to see target stuff getting into tree-*. Is this really needed?

I think the biggest issues are the ABI concerns and getting the target maintainers to take a looksie. Everything else looks pretty good.

Jeff



Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]