[PATCH, MIPS] Frame header optimization for MIPS O32 ABI

Matthew Fortune Matthew.Fortune@imgtec.com
Fri Sep 4 09:16:00 GMT 2015


Steve Ellcey <Steve.Ellcey@imgtec.com> writes: 
> Here is an update of my MIPS frame header optimization patch.  This is
> actually only one part of the patch but I would like to get this approved
> and checked in before proceeding with the second half.
> 
> The O32 ABI on MIPS requires that calling functions allocate space on the
> stack for arguments that are passed in registers.  That way if the address
> of an argument passed in a register is needed, the called function can write
> it out to this space.  The new MIPS ABIs have the called functions allocate
> this space and they are unaffected by this patch.
> 
> This patch looks at what functions a function calls and if none of them
> use the allocated stack space to store arguments then the calling function
> does not allocate this stack space.  In general this optimization is not going
> to save any time because the calling function will still need to allocate
> stack space for the return address if nothing else, but it does save space.
> Using a caller's allocated space to save the return address if it is not
> needed for arguments will be the second part of this optimization.
> 
> There is one major restriction: this optimization is not done for PIC
> code.  There is something about accessing global variables
> in PIC code on MIPS, with its ghost instructions and global symbol accesses,
> that conflicts with this optimization, so I skip it for PIC code.  I think it
> only needs to be skipped in functions where the global pointer register is
> saved and restored but we don't know which those are until very late in the
> compilation (thus the ghost instructions) and that is after we need to
> determine whether or not we can do this optimization.

OK. If it ever becomes desirable, considering what to do with PIC code later
sounds fine. I suppose we could eventually have it use the caller's argument
space to store GP even if GP is needed, much like the next round of work will
store RA into that space.

> I did some testing with this optimization turned on by default at -O2 and
> did not have any regressions but this patch does not turn the option on
> by default at -O2 (or any other optimization level).  While that may be a
> reasonable thing to do, personally, I think I would like to get this checked
> in and have it available to more users before turning it on by default.

OK. I am keen to turn this on as soon as possible though. Perhaps stress it a
bit with a round of benchmarks etc. and then get it turned on. We have several
months before the next release, so this is a reasonable time to get this kind
of thing in and enabled.

A few comments below. I found some of the comments a bit hard to parse but have
not attempted any rewording. I'd like Catherine to comment too, as I have too
little experience at the gimple level to know whether this accounts for all the
necessary subtleties.

> diff --git a/gcc/config/mips/frame-header-opt.c b/gcc/config/mips/frame-header-opt.c
> index e69de29..5db5385 100644
> --- a/gcc/config/mips/frame-header-opt.c
> +++ b/gcc/config/mips/frame-header-opt.c
> +#include "config.h"
> +#include "system.h"
> +#include "coretypes.h"
> +#include "backend.h"
> +#include "cfghooks.h"
> +#include "tree.h"
> +#include "gimple.h"
> +#include "rtl.h"
> +#include "df.h"
> +#include "regs.h"
> +#include "insn-config.h"
> +#include "conditions.h"
> +#include "insn-attr.h"
> +#include "recog.h"
> +#include "output.h"
> +#include "alias.h"
> +#include "fold-const.h"
> +#include "varasm.h"
> +#include "stringpool.h"
> +#include "stor-layout.h"
> +#include "calls.h"
> +#include "flags.h"
> +#include "expmed.h"
> +#include "dojump.h"
> +#include "explow.h"
> +#include "emit-rtl.h"
> +#include "stmt.h"
> +#include "expr.h"
> +#include "insn-codes.h"
> +#include "optabs.h"
> +#include "libfuncs.h"
> +#include "reload.h"
> +#include "tm_p.h"
> +#include "gstab.h"
> +#include "debug.h"
> +#include "target.h"
> +#include "common/common-target.h"
> +#include "langhooks.h"
> +#include "cfgrtl.h"
> +#include "cfganal.h"
> +#include "lcm.h"
> +#include "cfgbuild.h"
> +#include "cfgcleanup.h"
> +#include "sched-int.h"
> +#include "internal-fn.h"
> +#include "gimple-fold.h"
> +#include "tree-eh.h"
> +#include "gimplify.h"
> +#include "diagnostic.h"
> +#include "target-globals.h"
> +#include "opts.h"
> +#include "tree-pass.h"
> +#include "context.h"
> +#include "cgraph.h"
> +#include "builtins.h"
> +#include "rtl-iter.h"
> +#include "ssa.h"
> +#include "gimple-iterator.h"
> +#include "gimple-walk.h"
> +#include "print-tree.h"
> +#include "function.h"

This might be worth reducing a bit. I'm not sure all of these look
necessary.

> +/* Return true if this function will use the stack space allocated by its
> +   caller or if we cannot determine for certain that it does not.  */
> +
> +static bool
> +needs_stack_space (function *fn)

This is about the frame header rather than stack space. I think it would
allow a leaf function to use the stack as long as it does not use the
argument space. How about needs_frame_header_p?

> +{
> +  tree t;
> +
> +  if (fn->decl == NULL)
> +    return true;
> +
> +  if (fn->stdarg || !is_leaf_function (fn))
> +    return true;

So non-leaf functions are assumed to need their argument space for now at
least.
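
To make the bail-out conditions concrete, here is a small sketch of the kinds
of function involved. The function names are hypothetical and not from the
patch, and whether a given parameter really fails use_register_for_decl
depends on what earlier passes have done, so treat the comments as my reading
of the checks rather than guaranteed compiler behaviour.

```c
#include <stdarg.h>

/* Leaf, not variadic, both parameters can stay in registers: the check
   above would be expected to report that the frame header is not needed.  */
int
add (int a, int b)
{
  return a + b;
}

/* Leaf, but the address of 'a' is taken and used through a volatile
   pointer, so 'a' is expected to need a home in memory (on o32, its slot
   in the caller's frame header).  */
int
bump (int a)
{
  volatile int *p = &a;
  *p += 1;
  return a;
}

/* Variadic: fn->stdarg is set, so the frame header is assumed needed.  */
int
sum (int n, ...)
{
  va_list ap;
  int total = 0;

  va_start (ap, n);
  for (int i = 0; i < n; i++)
    total += va_arg (ap, int);
  va_end (ap);
  return total;
}

/* Not a leaf (unless the call is inlined first): assumed to need the
   frame header for now.  */
int
add_one (int a)
{
  return add (a, 1);
}
```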

> +
> +  for (t = DECL_ARGUMENTS (fn->decl); t; t = TREE_CHAIN (t))
> +    {
> +      if (!use_register_for_decl (t))
> +	  return true;
> +    }
> +
> +  return false;
> +}
> +
> +/* Look at all the functions this function calls and return true if none of
> +   them need the argument stack space that this function would normally
> +   allocate.  Return false if one ore more functions does need this space

Typo: "ore" should be "or".

> +   or if we cannot determine that all called functions do not need the
> +   space.  */
> +
> +static bool
> +called_functions_use_stack(function *fn)

Space before the bracket. Perhaps rename to callees_use_frame_header_p?

> +{
> +  basic_block bb;
> +  gimple_stmt_iterator gsi;
> +  gimple stmt;
> +  tree called_fn_tree;
> +  function *called_fn;
> +
> +  if (fn->cfg == NULL)
> +    return true;
> +
> +  FOR_EACH_BB_FN (bb, fn)
> +    {
> +      for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))
> +	{
> +	  stmt = gsi_stmt (gsi);
> +	  if (is_gimple_call (stmt))
> +	    {
> +	      called_fn_tree = gimple_call_fndecl (stmt);
> +	      if (called_fn_tree != NULL)

I don't know what it means for a gimple_call to not have a fndecl. Do you know
if it is safe to assume the callee will not use the frame header space in this
case?

Calls to weakly defined functions should always be treated as using the frame
header, since the definition seen here may be replaced at link time.
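
For what it is worth, here is a sketch of the three kinds of call being
discussed. I am assuming that a gimple call with no fndecl is typically an
indirect call through a pointer; that is my understanding rather than
something I have verified for every case. The function names are made up for
illustration.

```c
/* Ordinary definition: a direct call to this has a known fndecl and a
   body the pass can inspect.  */
int
direct_callee (int x)
{
  return x + 1;
}

/* Weak definition: the fndecl and body are visible here, but another
   object file may provide a strong definition that replaces it, so
   nothing learned from this body can be relied on.  */
__attribute__ ((weak)) int
weak_callee (int x)
{
  return x + 2;
}

/* Direct call: the callee is known at compile time.  */
int
call_direct (int x)
{
  return direct_callee (x);
}

/* Indirect call: the callee is only known at run time, so there is no
   declaration to look up (assumption: this is the no-fndecl case).  */
int
call_indirect (int (*fp) (int), int x)
{
  return fp (x);
}

/* Call to a weak symbol: should be treated as using the frame header.  */
int
call_weak (int x)
{
  return weak_callee (x);
}
```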

> +	        {
> +	          called_fn = DECL_STRUCT_FUNCTION (called_fn_tree);
> +		  if (called_fn == NULL ||
> +		      !called_fn->machine->does_not_use_parm_stack_space)

parm_stack_space is another term for the frame header, and plain 'stack' is
used elsewhere. Could you use one term throughout to refer to this area?

> +		    return true;
> +	        }
> +            }
> +        }
> +    }
> +  return false;
> +}
> +
> +/* This optimization scans all the functions in the compilation unit to find
> +   out which ones do not need the stack space that their caller normally
> +   allocates.  Then it does a second scan of all the functions to determine
> +   which functions can skip the allocation because none of the functions it
> +   calls need the argument stack space.  */
> +
> +static unsigned int
> +frame_header_opt ()

Use (void).
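
For anyone following along, the two scans described in the comment can be
modelled outside GCC roughly as below. Everything here (struct fn_node, its
fields, frame_header_opt_model) is hypothetical and only mirrors the shape of
the patch. The model treats any unknown callee as using the frame header,
which is the conservative choice and, as I read it, not what the patch does
today for calls without a fndecl.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_CALLEES 4

/* Hypothetical stand-in for a call graph node; not a GCC type.  */
struct fn_node
{
  /* Inputs.  */
  bool is_leaf;
  bool is_stdarg;
  bool all_params_in_regs;
  bool has_unknown_callee;      /* Indirect, external or weak callee.  */
  size_t n_callees;
  size_t callees[MAX_CALLEES];  /* Indices of the known callees.  */

  /* Outputs.  */
  bool uses_frame_header;       /* Set by the first scan.  */
  bool can_skip_frame_header;   /* Set by the second scan.  */
};

static void
frame_header_opt_model (struct fn_node *nodes, size_t n)
{
  /* First scan: which functions may write into the frame header that
     their caller allocated?  */
  for (size_t i = 0; i < n; i++)
    nodes[i].uses_frame_header = (nodes[i].is_stdarg
                                  || !nodes[i].is_leaf
                                  || !nodes[i].all_params_in_regs);

  /* Second scan: a caller may skip the allocation only if every callee
     is known and none of them uses the frame header.  */
  for (size_t i = 0; i < n; i++)
    {
      bool skip = !nodes[i].has_unknown_callee;

      for (size_t j = 0; skip && j < nodes[i].n_callees; j++)
        skip = !nodes[nodes[i].callees[j]].uses_frame_header;
      nodes[i].can_skip_frame_header = skip;
    }
}
```

With a leaf callee whose parameters all stay in registers, the model marks its
caller as able to skip the allocation. A function calling that caller is not
marked, since the caller itself is not a leaf.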

> +{
> +  struct cgraph_node *node;
> +  function *fn;
> +
> +  FOR_EACH_DEFINED_FUNCTION (node)
> +    {
> +      fn = node->get_fun ();
> +      if (fn != NULL)
> +	fn->machine->does_not_use_parm_stack_space = !needs_stack_space (fn);
> +    }
> +
> +  FOR_EACH_DEFINED_FUNCTION (node)
> +    {
> +      fn = node->get_fun ();
> +      if (fn != NULL)
> +        fn->machine->optimize_call_stack = !called_functions_use_stack (fn);
> +    }
> +  return 0;
> +}
> diff --git a/gcc/config/mips/mips-protos.h b/gcc/config/mips/mips-protos.h
> index d9ad910..d506480 100644
> --- a/gcc/config/mips/mips-protos.h
> +++ b/gcc/config/mips/mips-protos.h
> @@ -368,4 +368,6 @@ typedef rtx (*mulsidi3_gen_fn) (rtx, rtx, rtx);
>  extern mulsidi3_gen_fn mips_mulsidi3_gen_fn (enum rtx_code);
>  #endif
> 
> +extern void mips_register_frame_header_opt ();

Use (void) here too.

> +
>  #endif /* ! GCC_MIPS_PROTOS_H */
> diff --git a/gcc/config/mips/mips.c b/gcc/config/mips/mips.c
> index 0b4a5fa..5a50883 100644
> --- a/gcc/config/mips/mips.c
> +++ b/gcc/config/mips/mips.c
> @@ -327,153 +327,6 @@ static struct {
>    bool fast_mult_zero_zero_p;
>  } mips_tuning_info;
> 
> -/* Information about a function's frame layout.  */
> -struct GTY(())  mips_frame_info {
> -  /* The size of the frame in bytes.  */
> -  HOST_WIDE_INT total_size;
> -
> -  /* The number of bytes allocated to variables.  */
> -  HOST_WIDE_INT var_size;
> -
> -  /* The number of bytes allocated to outgoing function arguments.  */
> -  HOST_WIDE_INT args_size;
> -
> -  /* The number of bytes allocated to the .cprestore slot, or 0 if there
> -     is no such slot.  */
> -  HOST_WIDE_INT cprestore_size;
> -
> -  /* Bit X is set if the function saves or restores GPR X.  */
> -  unsigned int mask;
> -
> -  /* Likewise FPR X.  */
> -  unsigned int fmask;
> -
> -  /* Likewise doubleword accumulator X ($acX).  */
> -  unsigned int acc_mask;
> -
> -  /* The number of GPRs, FPRs, doubleword accumulators and COP0
> -     registers saved.  */
> -  unsigned int num_gp;
> -  unsigned int num_fp;
> -  unsigned int num_acc;
> -  unsigned int num_cop0_regs;
> -
> -  /* The offset of the topmost GPR, FPR, accumulator and COP0-register
> -     save slots from the top of the frame, or zero if no such slots are
> -     needed.  */
> -  HOST_WIDE_INT gp_save_offset;
> -  HOST_WIDE_INT fp_save_offset;
> -  HOST_WIDE_INT acc_save_offset;
> -  HOST_WIDE_INT cop0_save_offset;
> -
> -  /* Likewise, but giving offsets from the bottom of the frame.  */
> -  HOST_WIDE_INT gp_sp_offset;
> -  HOST_WIDE_INT fp_sp_offset;
> -  HOST_WIDE_INT acc_sp_offset;
> -  HOST_WIDE_INT cop0_sp_offset;
> -
> -  /* Similar, but the value passed to _mcount.  */
> -  HOST_WIDE_INT ra_fp_offset;
> -
> -  /* The offset of arg_pointer_rtx from the bottom of the frame.  */
> -  HOST_WIDE_INT arg_pointer_offset;
> -
> -  /* The offset of hard_frame_pointer_rtx from the bottom of the frame.  */
> -  HOST_WIDE_INT hard_frame_pointer_offset;
> -};
> -
> -/* Enumeration for masked vectored (VI) and non-masked (EIC) interrupts.  */
> -enum mips_int_mask
> -{
> -  INT_MASK_EIC = -1,
> -  INT_MASK_SW0 = 0,
> -  INT_MASK_SW1 = 1,
> -  INT_MASK_HW0 = 2,
> -  INT_MASK_HW1 = 3,
> -  INT_MASK_HW2 = 4,
> -  INT_MASK_HW3 = 5,
> -  INT_MASK_HW4 = 6,
> -  INT_MASK_HW5 = 7
> -};
> -
> -/* Enumeration to mark the existence of the shadow register set.
> -   SHADOW_SET_INTSTACK indicates a shadow register set with a valid stack
> -   pointer.  */
> -enum mips_shadow_set
> -{
> -  SHADOW_SET_NO,
> -  SHADOW_SET_YES,
> -  SHADOW_SET_INTSTACK
> -};
> -
> -struct GTY(())  machine_function {
> -  /* The next floating-point condition-code register to allocate
> -     for ISA_HAS_8CC targets, relative to ST_REG_FIRST.  */
> -  unsigned int next_fcc;
> -
> -  /* The register returned by mips16_gp_pseudo_reg; see there for details.  */
> -  rtx mips16_gp_pseudo_rtx;
> -
> -  /* The number of extra stack bytes taken up by register varargs.
> -     This area is allocated by the callee at the very top of the frame.  */
> -  int varargs_size;
> -
> -  /* The current frame information, calculated by mips_compute_frame_info.  */
> -  struct mips_frame_info frame;
> -
> -  /* The register to use as the function's global pointer, or INVALID_REGNUM
> -     if the function doesn't need one.  */
> -  unsigned int global_pointer;
> -
> -  /* How many instructions it takes to load a label into $AT, or 0 if
> -     this property hasn't yet been calculated.  */
> -  unsigned int load_label_num_insns;
> -
> -  /* True if mips_adjust_insn_length should ignore an instruction's
> -     hazard attribute.  */
> -  bool ignore_hazard_length_p;
> -
> -  /* True if the whole function is suitable for .set noreorder and
> -     .set nomacro.  */
> -  bool all_noreorder_p;
> -
> -  /* True if the function has "inflexible" and "flexible" references
> -     to the global pointer.  See mips_cfun_has_inflexible_gp_ref_p
> -     and mips_cfun_has_flexible_gp_ref_p for details.  */
> -  bool has_inflexible_gp_insn_p;
> -  bool has_flexible_gp_insn_p;
> -
> -  /* True if the function's prologue must load the global pointer
> -     value into pic_offset_table_rtx and store the same value in
> -     the function's cprestore slot (if any).  Even if this value
> -     is currently false, we may decide to set it to true later;
> -     see mips_must_initialize_gp_p () for details.  */
> -  bool must_initialize_gp_p;
> -
> -  /* True if the current function must restore $gp after any potential
> -     clobber.  This value is only meaningful during the first post-epilogue
> -     split_insns pass; see mips_must_initialize_gp_p () for details.  */
> -  bool must_restore_gp_when_clobbered_p;
> -
> -  /* True if this is an interrupt handler.  */
> -  bool interrupt_handler_p;
> -
> -  /* Records the way in which interrupts should be masked.  Only used if
> -     interrupts are not kept masked.  */
> -  enum mips_int_mask int_mask;
> -
> -  /* Records if this is an interrupt handler that uses shadow registers.  */
> -  enum mips_shadow_set use_shadow_register_set;
> -
> -  /* True if this is an interrupt handler that should keep interrupts
> -     masked.  */
> -  bool keep_interrupts_masked_p;
> -
> -  /* True if this is an interrupt handler that should use DERET
> -     instead of ERET.  */
> -  bool use_debug_exception_return_p;
> -};
> -
>  /* Information about a single argument.  */
>  struct mips_arg_info {
>    /* True if the argument is passed in a floating-point register, or
> @@ -10462,10 +10315,15 @@ mips_compute_frame_info (void)
>    cfun->machine->global_pointer = mips_global_pointer ();
> 
>    /* The first two blocks contain the outgoing argument area and the $gp save
> -     slot.  This area isn't needed in leaf functions, but if the
> -     target-independent frame size is nonzero, we have already committed to
> -     allocating these in STARTING_FRAME_OFFSET for !FRAME_GROWS_DOWNWARD.  */
> -  if ((size == 0 || FRAME_GROWS_DOWNWARD) && crtl->is_leaf)
> +     slot.  This area isn't needed in leaf functions.  We can also skip it
> +     if we know that none of the called functions will use this space.
> +
> +     But if the target-independent frame size is nonzero, we have already
> +     committed to allocating these in STARTING_FRAME_OFFSET for
> +     !FRAME_GROWS_DOWNWARD.  */
> +
> +  if ((size == 0 || FRAME_GROWS_DOWNWARD)
> +      && (crtl->is_leaf || (cfun->machine->optimize_call_stack && !flag_pic)))
>      {
>        /* The MIPS 3.0 linker does not like functions that dynamically
>  	 allocate the stack and have 0 for STACK_DYNAMIC_OFFSET, since it
> @@ -17980,6 +17838,8 @@ mips_option_override (void)
> 
>    if (TARGET_HARD_FLOAT_ABI && TARGET_MIPS5900)
>      REAL_MODE_FORMAT (SFmode) = &spu_single_format;
> +
> +  mips_register_frame_header_opt ();
>  }
> 
>  /* Swap the register information for registers I and I + 1, which
> diff --git a/gcc/config/mips/mips.h b/gcc/config/mips/mips.h
> index 2d44735..72ee214 100644
> --- a/gcc/config/mips/mips.h
> +++ b/gcc/config/mips/mips.h
> @@ -3124,6 +3124,161 @@ extern const struct mips_cpu_info *mips_tune_info;
>  extern unsigned int mips_base_compression_flags;
>  extern GTY(()) struct target_globals *mips16_globals;
>  extern GTY(()) struct target_globals *micromips_globals;
> +
> +/* Information about a function's frame layout.  */
> +struct GTY(())  mips_frame_info {
> +  /* The size of the frame in bytes.  */
> +  HOST_WIDE_INT total_size;
> +
> +  /* The number of bytes allocated to variables.  */
> +  HOST_WIDE_INT var_size;
> +
> +  /* The number of bytes allocated to outgoing function arguments.  */
> +  HOST_WIDE_INT args_size;
> +
> +  /* The number of bytes allocated to the .cprestore slot, or 0 if there
> +     is no such slot.  */
> +  HOST_WIDE_INT cprestore_size;
> +
> +  /* Bit X is set if the function saves or restores GPR X.  */
> +  unsigned int mask;
> +
> +  /* Likewise FPR X.  */
> +  unsigned int fmask;
> +
> +  /* Likewise doubleword accumulator X ($acX).  */
> +  unsigned int acc_mask;
> +
> +  /* The number of GPRs, FPRs, doubleword accumulators and COP0
> +     registers saved.  */
> +  unsigned int num_gp;
> +  unsigned int num_fp;
> +  unsigned int num_acc;
> +  unsigned int num_cop0_regs;
> +
> +  /* The offset of the topmost GPR, FPR, accumulator and COP0-register
> +     save slots from the top of the frame, or zero if no such slots are
> +     needed.  */
> +  HOST_WIDE_INT gp_save_offset;
> +  HOST_WIDE_INT fp_save_offset;
> +  HOST_WIDE_INT acc_save_offset;
> +  HOST_WIDE_INT cop0_save_offset;
> +
> +  /* Likewise, but giving offsets from the bottom of the frame.  */
> +  HOST_WIDE_INT gp_sp_offset;
> +  HOST_WIDE_INT fp_sp_offset;
> +  HOST_WIDE_INT acc_sp_offset;
> +  HOST_WIDE_INT cop0_sp_offset;
> +
> +  /* Similar, but the value passed to _mcount.  */
> +  HOST_WIDE_INT ra_fp_offset;
> +
> +  /* The offset of arg_pointer_rtx from the bottom of the frame.  */
> +  HOST_WIDE_INT arg_pointer_offset;
> +
> +  /* The offset of hard_frame_pointer_rtx from the bottom of the frame.  */
> +  HOST_WIDE_INT hard_frame_pointer_offset;
> +};
> +
> +/* Enumeration for masked vectored (VI) and non-masked (EIC) interrupts.  */
> +enum mips_int_mask
> +{
> +  INT_MASK_EIC = -1,
> +  INT_MASK_SW0 = 0,
> +  INT_MASK_SW1 = 1,
> +  INT_MASK_HW0 = 2,
> +  INT_MASK_HW1 = 3,
> +  INT_MASK_HW2 = 4,
> +  INT_MASK_HW3 = 5,
> +  INT_MASK_HW4 = 6,
> +  INT_MASK_HW5 = 7
> +};
> +
> +/* Enumeration to mark the existence of the shadow register set.
> +   SHADOW_SET_INTSTACK indicates a shadow register set with a valid stack
> +   pointer.  */
> +enum mips_shadow_set
> +{
> +  SHADOW_SET_NO,
> +  SHADOW_SET_YES,
> +  SHADOW_SET_INTSTACK
> +};
> +
> +struct GTY(())  machine_function {
> +  /* The next floating-point condition-code register to allocate
> +     for ISA_HAS_8CC targets, relative to ST_REG_FIRST.  */
> +  unsigned int next_fcc;
> +
> +  /* The register returned by mips16_gp_pseudo_reg; see there for details.  */
> +  rtx mips16_gp_pseudo_rtx;
> +
> +  /* The number of extra stack bytes taken up by register varargs.
> +     This area is allocated by the callee at the very top of the frame.  */
> +  int varargs_size;
> +
> +  /* The current frame information, calculated by mips_compute_frame_info.  */
> +  struct mips_frame_info frame;
> +
> +  /* The register to use as the function's global pointer, or INVALID_REGNUM
> +     if the function doesn't need one.  */
> +  unsigned int global_pointer;
> +
> +  /* How many instructions it takes to load a label into $AT, or 0 if
> +     this property hasn't yet been calculated.  */
> +  unsigned int load_label_num_insns;
> +
> +  /* True if mips_adjust_insn_length should ignore an instruction's
> +     hazard attribute.  */
> +  bool ignore_hazard_length_p;
> +
> +  /* True if the whole function is suitable for .set noreorder and
> +     .set nomacro.  */
> +  bool all_noreorder_p;
> +
> +  /* True if the function has "inflexible" and "flexible" references
> +     to the global pointer.  See mips_cfun_has_inflexible_gp_ref_p
> +     and mips_cfun_has_flexible_gp_ref_p for details.  */
> +  bool has_inflexible_gp_insn_p;
> +  bool has_flexible_gp_insn_p;
> +
> +  /* True if the function's prologue must load the global pointer
> +     value into pic_offset_table_rtx and store the same value in
> +     the function's cprestore slot (if any).  Even if this value
> +     is currently false, we may decide to set it to true later;
> +     see mips_must_initialize_gp_p () for details.  */
> +  bool must_initialize_gp_p;
> +
> +  /* True if the current function must restore $gp after any potential
> +     clobber.  This value is only meaningful during the first post-epilogue
> +     split_insns pass; see mips_must_initialize_gp_p () for details.  */
> +  bool must_restore_gp_when_clobbered_p;
> +
> +  /* True if this is an interrupt handler.  */
> +  bool interrupt_handler_p;
> +
> +  /* Records the way in which interrupts should be masked.  Only used if
> +     interrupts are not kept masked.  */
> +  enum mips_int_mask int_mask;
> +
> +  /* Records if this is an interrupt handler that uses shadow registers.  */
> +  enum mips_shadow_set use_shadow_register_set;
> +
> +  /* True if this is an interrupt handler that should keep interrupts
> +     masked.  */
> +  bool keep_interrupts_masked_p;
> +
> +  /* True if this is an interrupt handler that should use DERET
> +     instead of ERET.  */
> +  bool use_debug_exception_return_p;
> +
> +  /* True if at least one of the formal parameters to a function must be
> +     written to the stack (probably so its address can be taken).  */
> +  bool does_not_use_parm_stack_space;
> +
> +  /* True if none of the functions that are called by this function need
> +     stack space allocated for their arguments.  */
> +  bool optimize_call_stack;
> +};
>  #endif
> 
>  /* Enable querying of DFA units.  */
> diff --git a/gcc/config/mips/mips.opt b/gcc/config/mips/mips.opt
> index 348c6e0..3e72936 100644
> --- a/gcc/config/mips/mips.opt
> +++ b/gcc/config/mips/mips.opt
> @@ -412,6 +412,10 @@ modd-spreg
>  Target Report Mask(ODD_SPREG)
>  Enable use of odd-numbered single-precision registers
> 
> +mframe-header-opt
> +Target Report Var(flag_frame_header_optimization) Optimization
> +Optimize frame header
> +
>  noasmopt
>  Driver
> 
> diff --git a/gcc/config/mips/t-mips b/gcc/config/mips/t-mips
> index 01df1ad..a893841 100644
> --- a/gcc/config/mips/t-mips
> +++ b/gcc/config/mips/t-mips
> @@ -20,3 +20,7 @@ $(srcdir)/config/mips/mips-tables.opt: $(srcdir)/config/mips/genopt.sh \
>    $(srcdir)/config/mips/mips-cpus.def
>  	$(SHELL) $(srcdir)/config/mips/genopt.sh $(srcdir)/config/mips > \
>  		$(srcdir)/config/mips/mips-tables.opt
> +
> +frame-header-opt.o: $(srcdir)/config/mips/frame-header-opt.c
> +	$(COMPILE) $<
> +	$(POSTCOMPILE)
> diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
> index c0ec0fd..1e42a43 100644
> --- a/gcc/doc/invoke.texi
> +++ b/gcc/doc/invoke.texi
> @@ -814,7 +814,8 @@ Objective-C and Objective-C++ Dialects}.
>  -mbranch-cost=@var{num}  -mbranch-likely  -mno-branch-likely @gol
>  -mfp-exceptions -mno-fp-exceptions @gol
>  -mvr4130-align -mno-vr4130-align -msynci -mno-synci @gol
> --mrelax-pic-calls -mno-relax-pic-calls -mmcount-ra-address}
> +-mrelax-pic-calls -mno-relax-pic-calls -mmcount-ra-address @gol
> +-mframe-header-opt -mno-frame-header-opt}
> 
>  @emph{MMIX Options}
>  @gccoptlist{-mlibfuncs  -mno-libfuncs  -mepsilon  -mno-epsilon  -mabi=gnu @gol
> @@ -17940,6 +17941,18 @@ if @var{ra-address} is nonnull.
> 
>  The default is @option{-mno-mcount-ra-address}.
> 
> +@item -mframe-header-opt
> +@itemx -mno-frame-header-opt
> +@opindex mframe-header-opt
> +Enable (disable) frame header optimization in the O32 ABI.  When using

ABI names in the documentation use a lower-case 'o', i.e. o32.

> +the O32 ABI, calling functions allocate 16 bytes on the stack in case
> +the called function needs to write out register arguments to memory so
> +that their address can be taken.  When enabled, this optimization will
> +cause the calling function to not allocate that space if it can determine
> +that none of its called functions use it.
> +
> +This optimization is off by default at all optimization levels.
> +
>  @end table
> 
>  @node MMIX Options
> 

Can you add a test to show the reduced stack allocation? Something like:

void __attribute__((noinline))
bar (int* a)
{
  *a = 1;
}

void
foo (int a)
{
  bar (&a);
}

That should guarantee that foo does not put RA into the frame header when you
do the next part of this optimisation, and the call to bar should not need a
frame header to be allocated.

The same test with frame header optimisation off should show the frame header
being allocated.

The same test with bar defined as weak should also show the frame header being
allocated.
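
For the weak case, the variant I have in mind is roughly the following. Only
the weak attribute is new relative to the test above. I have left out the dg-
directives and any scan-assembler pattern because the exact stack adjustment
to match is not something I have checked.

```c
/* Weak variant of the test: bar may be replaced at link time by a
   definition that does use the frame header, so foo is expected to keep
   allocating it even with the optimisation enabled.  */
void __attribute__ ((noinline, weak))
bar (int *a)
{
  *a = 1;
}

void
foo (int a)
{
  bar (&a);
}
```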

Thanks,
Matthew


