[PATCH] Improve tail call analysis and inliner EH clobber through variable life analysis (PR tree-optimization/89060, take 2)

Richard Biener rguenther@suse.de
Wed May 8 13:51:00 GMT 2019


On Wed, 8 May 2019, Jakub Jelinek wrote:

> On Mon, May 06, 2019 at 04:17:01PM +0200, Richard Biener wrote:
> > > > +struct compute_live_vars_data {
> > > > +  /* Vector of bitmaps for live vars at the end of basic blocks,
> > > > +     indexed by bb->index.  ACTIVE[ENTRY_BLOCK] must be empty bitmap,
> > > > +     ACTIVE[EXIT_BLOCK] is used for STOP_AFTER.  */
> > > > +  vec<bitmap_head> active;
> > > > +  /* Work bitmap of currently live variables.  */
> > > > +  bitmap work;
> > > > +  /* Bitmap of interesting variables.  Variables with uids not in this
> > > > +     bitmap are not tracked.  */
> > > > +  bitmap vars;
> > > > +};
> > 
> > How dense are the bitmaps?  Storage requirement will be quadratic
> > so eventually using a sparseset for 'vars' and using the sparseset
> > index for the bitmaps will improve things?  Or re-using live
> > code mapping?
> 
> Here is an updated patch that instead of using a bitmap of vars uses a
> hash_map from DECL_UID to some dense indexes used in the rest of the
> bitmaps.
> 
> > > > +	  tree lhs = gimple_assign_lhs (stmt);
> > > > +	  if (VAR_P (lhs) && bitmap_bit_p (data->vars, DECL_UID (lhs)))
> > > > +	    bitmap_clear_bit (data->work, DECL_UID (lhs));
> > 
> > I think this clearing causes PR90348.
> 
> This stuff keeps as is, can be changed depending on what we decide for
> PR90348.
> 
> > > > +  /* If any such variables are found, try harder using variable life
> > > > +     tracking to see if those variables aren't already out of scope.  */
> > > > +  if (local_decls)
> > > > +    {
> > > > +      vec<bitmap_head> live = compute_live_vars (cfun, local_decls, call);
> > > > +      bool any_live_p = !bitmap_empty_p (&live[EXIT_BLOCK]);
> > > > +      destroy_live_vars (live);
> > > > +      BITMAP_FREE (local_decls);
> > > > +      if (any_live_p)
> > > >  	return;
> > > >      }
> > 
> > I wonder if it is possible to split analysis and transform here
> > some more - the above is called for all preds of EXIT, if we
> > first analyze all of them and then compute live once if needed,
> > pruning invalid ones and then doing transform this would at least
> > make LIVE only computed once.
> > 
> > It should be also possible to restrict LIVE to the sub-CFG leading to
> > the EXIT preds with tail-call candidates as well, no?
> 
> And here it lazily computes the analysis the first time we need it in a
> function for all addressable vars in the function and then uses it for that
> and all following tail call locations, where it just walks the last bb until
> after the call stmt.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

I'm still worried about complexity - for the inliner case we compute
liveness for the whole destination function for each function with
an EH landing pad we expand inline.  It seems to me the liveness
should be computed on the callee function body, not the caller
after inlining?  Is it really worth optimizing the number of clobbers
here with this potentially expensive work?

For the tailcall case consider a function like

foo ()
{
  T a, b, ....;
  if (..) baz(&a);
  if (..) baz(&b);
...
  bar(&a,&b,&c...);
}

or similar, that is, a lot of decls we need to track, a lot of BBs
and a lot of tailcall opportunities with the case of all decls being
live throughout the whole function.

-foptimize-sibling-calls is only enabled at -O2 but the inlining
part at -O1 where we hit those huge auto-generated testcases.

Richard.

> 2019-05-08  Jakub Jelinek  <jakub@redhat.com>
> 
> 	PR tree-optimization/89060
> 	* tree-ssa-live.h (live_vars_map): New typedef.
> 	(compute_live_vars, live_vars_at_stmt, destroy_live_vars): Declare.
> 	* tree-ssa-live.c: Include gimple-walk.h and cfganal.h.
> 	(struct compute_live_vars_data): New type.
> 	(compute_live_vars_visit, compute_live_vars_1, compute_live_vars,
> 	live_vars_at_stmt, destroy_live_vars): New functions.
> 	* tree-tailcall.c: Include tree-ssa-live.h.
> 	(live_vars, live_vars_vec): New global variables.
> 	(find_tail_calls): Perform variable life analysis before punting.
> 	(tree_optimize_tail_calls_1): Clean up live_vars and live_vars_vec.
> 	* tree-inline.h (struct copy_body_data): Add eh_landing_pad_dest
> 	member.
> 	* tree-inline.c (add_clobbers_to_eh_landing_pad): Remove BB argument.
> 	Perform variable life analysis to select variables that really need
> 	clobbers added.
> 	(copy_edges_for_bb): Don't call add_clobbers_to_eh_landing_pad here,
> 	instead set id->eh_landing_pad_dest and assert it is the same.
> 	(copy_cfg_body): Call it here if id->eh_landing_pad_dest is non-NULL.
> 
> 	* gcc.dg/tree-ssa/pr89060.c: New test.
> 
> --- gcc/tree-ssa-live.h.jj	2019-02-08 08:10:16.954053427 +0100
> +++ gcc/tree-ssa-live.h	2019-05-07 17:43:00.369174630 +0200
> @@ -266,6 +266,11 @@ extern void debug (tree_live_info_d &ref
>  extern void debug (tree_live_info_d *ptr);
>  extern void dump_live_info (FILE *, tree_live_info_p, int);
>  
> +typedef hash_map<int_hash <unsigned int, -1U>, unsigned int> live_vars_map;
> +extern vec<bitmap_head> compute_live_vars (struct function *, live_vars_map *);
> +extern bitmap live_vars_at_stmt (vec<bitmap_head> &, live_vars_map *,
> +				 gimple *);
> +extern void destroy_live_vars (vec<bitmap_head> &);
>  
>  /*  Return TRUE if P is marked as a global in LIVE.  */
>  
> --- gcc/tree-ssa-live.c.jj	2019-02-08 08:10:16.902054295 +0100
> +++ gcc/tree-ssa-live.c	2019-05-07 17:45:25.775844167 +0200
> @@ -41,6 +41,8 @@ along with GCC; see the file COPYING3.
>  #include "stringpool.h"
>  #include "attribs.h"
>  #include "optinfo.h"
> +#include "gimple-walk.h"
> +#include "cfganal.h"
>  
>  static void verify_live_on_entry (tree_live_info_p);
>  
> @@ -1194,8 +1196,149 @@ calculate_live_ranges (var_map map, bool
>  
>    return live;
>  }
> +

> +/* Data structure for compute_live_vars* functions.  */
>  
> +struct compute_live_vars_data {
> +  /* Vector of bitmaps for live vars indices at the end of basic blocks,
> +     indexed by bb->index.  ACTIVE[ENTRY_BLOCK] must be empty bitmap,
> +     ACTIVE[EXIT_BLOCK] is used for STOP_AFTER.  */
> +  vec<bitmap_head> active;
> +  /* Work bitmap of currently live variables.  */
> +  bitmap work;
> +  /* Set of interesting variables.  Variables with uids not in this
> +     hash_map are not tracked.  */
> +  live_vars_map *vars;
> +};
> +
> +/* Callback for walk_stmt_load_store_addr_ops.  If OP is a VAR_DECL with
> +   uid set in DATA->vars, enter its corresponding index into bitmap
> +   DATA->work.  */
>  
> +static bool
> +compute_live_vars_visit (gimple *, tree op, tree, void *pdata)
> +{
> +  compute_live_vars_data *data = (compute_live_vars_data *) pdata;
> +  op = get_base_address (op);
> +  if (op && VAR_P (op))
> +    if (unsigned int *v = data->vars->get (DECL_UID (op)))
> +      bitmap_set_bit (data->work, *v);
> +  return false;
> +}
> +
> +/* Helper routine for compute_live_vars, calculating the sets of live
> +   variables at the end of BB, leaving the result in DATA->work.
> +   If STOP_AFTER is non-NULL, stop processing after that stmt.  */
> +
> +static void
> +compute_live_vars_1 (basic_block bb, compute_live_vars_data *data,
> +		     gimple *stop_after)
> +{
> +  edge e;
> +  edge_iterator ei;
> +  gimple_stmt_iterator gsi;
> +  walk_stmt_load_store_addr_fn visit = compute_live_vars_visit;
> +
> +  bitmap_clear (data->work);
> +  FOR_EACH_EDGE (e, ei, bb->preds)
> +    bitmap_ior_into (data->work, &data->active[e->src->index]);
> +
> +  for (gsi = gsi_start_phis (bb); !gsi_end_p (gsi); gsi_next (&gsi))
> +    walk_stmt_load_store_addr_ops (gsi_stmt (gsi), data, NULL, NULL, visit);
> +  for (gsi = gsi_after_labels (bb); !gsi_end_p (gsi); gsi_next (&gsi))
> +    {
> +      gimple *stmt = gsi_stmt (gsi);
> +
> +      if (gimple_clobber_p (stmt))
> +	{
> +	  tree lhs = gimple_assign_lhs (stmt);
> +	  if (VAR_P (lhs))
> +	    if (unsigned int *v = data->vars->get (DECL_UID (lhs)))
> +	      bitmap_clear_bit (data->work, *v);
> +	}
> +      else if (!is_gimple_debug (stmt))
> +	walk_stmt_load_store_addr_ops (stmt, data, visit, visit, visit);
> +      if (stmt == stop_after)
> +	break;
> +    }
> +}
> +
> +/* For function FN and live_vars_map (hash map from DECL_UIDs to a dense set of
> +   indexes of automatic variables VARS, compute which of those variables are
> +   (might be) live at the end of each basic block.  */
> +
> +vec<bitmap_head>
> +compute_live_vars (struct function *fn, live_vars_map *vars)
> +{
> +  vec<bitmap_head> active;
> +
> +  /* We approximate the live range of a stack variable by taking the first
> +     mention of its name as starting point(s), and by the end-of-scope
> +     death clobber added by gimplify as ending point(s) of the range.
> +     This overapproximates in the case we for instance moved an address-taken
> +     operation upward, without also moving a dereference to it upwards.
> +     But it's conservatively correct as a variable never can hold values
> +     before its name is mentioned at least once.
> +
> +     We then do a mostly classical bitmap liveness algorithm.  */
> +
> +  active.create (last_basic_block_for_fn (fn));
> +  active.quick_grow (last_basic_block_for_fn (fn));
> +  for (int i = 0; i < last_basic_block_for_fn (fn); i++)
> +    bitmap_initialize (&active[i], &bitmap_default_obstack);
> +
> +  bitmap work = BITMAP_ALLOC (NULL);
> +
> +  int *rpo = XNEWVEC (int, last_basic_block_for_fn (fn));
> +  int n_bbs = pre_and_rev_post_order_compute_fn (fn, NULL, rpo, false);
> +
> +  bool changed = true;
> +  compute_live_vars_data data = { active, work, vars };
> +  while (changed)
> +    {
> +      int i;
> +      changed = false;
> +      for (i = 0; i < n_bbs; i++)
> +	{
> +	  basic_block bb = BASIC_BLOCK_FOR_FN (fn, rpo[i]);
> +	  compute_live_vars_1 (bb, &data, NULL);
> +	  if (bitmap_ior_into (&active[bb->index], work))
> +	    changed = true;
> +	}
> +    }
> +
> +  free (rpo);
> +  BITMAP_FREE (work);
> +
> +  return active;
> +}
> +
> +/* For ACTIVE computed by compute_live_vars, compute a bitmap of variables
> +   live after the STOP_AFTER statement and return that bitmap.  */
> +
> +bitmap
> +live_vars_at_stmt (vec<bitmap_head> &active, live_vars_map *vars,
> +		   gimple *stop_after)
> +{
> +  bitmap work = BITMAP_ALLOC (NULL);
> +  compute_live_vars_data data = { active, work, vars };
> +  basic_block bb = gimple_bb (stop_after);
> +  compute_live_vars_1 (bb, &data, stop_after);
> +  return work;
> +}
> +
> +/* Destroy what compute_live_vars has returned when it is no longer needed.  */
> +
> +void
> +destroy_live_vars (vec<bitmap_head> &active)
> +{
> +  unsigned len = active.length ();
> +  for (unsigned i = 0; i < len; i++)
> +    bitmap_clear (&active[i]);
> +
> +  active.release ();
> +}
> +

>  /* Output partition map MAP to file F.  */
>  
>  void
> --- gcc/tree-tailcall.c.jj	2019-04-30 11:03:52.755208800 +0200
> +++ gcc/tree-tailcall.c	2019-05-07 17:43:25.257775736 +0200
> @@ -42,6 +42,7 @@ along with GCC; see the file COPYING3.
>  #include "cfgloop.h"
>  #include "common/common-target.h"
>  #include "ipa-utils.h"
> +#include "tree-ssa-live.h"
>  
>  /* The file implements the tail recursion elimination.  It is also used to
>     analyze the tail calls in general, passing the results to the rtl level
> @@ -392,6 +393,11 @@ propagate_through_phis (tree var, edge e
>    return var;
>  }
>  
> +/* Argument for compute_live_vars/live_vars_at_stmt and what compute_live_vars
> +   returns.  Computed lazily, but just once for the function.  */
> +static live_vars_map *live_vars;
> +static vec<bitmap_head> live_vars_vec;
> +
>  /* Finds tailcalls falling into basic block BB. The list of found tailcalls is
>     added to the start of RET.  */
>  
> @@ -519,6 +525,29 @@ find_tail_calls (basic_block bb, struct
>  	tail_recursion = true;
>      }
>  
> +  /* Compute live vars if not computed yet.  */
> +  if (live_vars == NULL)
> +    {
> +      unsigned int cnt = 0;
> +      FOR_EACH_LOCAL_DECL (cfun, idx, var)
> +	if (VAR_P (var)
> +	    && auto_var_in_fn_p (var, cfun->decl)
> +	    && may_be_aliased (var))
> +	  {
> +	    if (live_vars == NULL)
> +	      live_vars = new live_vars_map;
> +	    live_vars->put (DECL_UID (var), cnt++);
> +	  }
> +      if (live_vars)
> +	live_vars_vec = compute_live_vars (cfun, live_vars);
> +    }
> +
> +  /* Determine a bitmap of variables which are still in scope after the
> +     call.  */
> +  bitmap local_live_vars = NULL;
> +  if (live_vars)
> +    local_live_vars = live_vars_at_stmt (live_vars_vec, live_vars, call);
> +
>    /* Make sure the tail invocation of this function does not indirectly
>       refer to local variables.  (Passing variables directly by value
>       is OK.)  */
> @@ -529,9 +558,28 @@ find_tail_calls (basic_block bb, struct
>  	  && may_be_aliased (var)
>  	  && (ref_maybe_used_by_stmt_p (call, var)
>  	      || call_may_clobber_ref_p (call, var)))
> -	return;
> +	{
> +	  if (!VAR_P (var))
> +	    {
> +	      if (local_live_vars)
> +		BITMAP_FREE (local_live_vars);
> +	      return;
> +	    }
> +	  else
> +	    {
> +	      unsigned int *v = live_vars->get (DECL_UID (var));
> +	      if (bitmap_bit_p (local_live_vars, *v))
> +		{
> +		  BITMAP_FREE (local_live_vars);
> +		  return;
> +		}
> +	    }
> +	}
>      }
>  
> +  if (local_live_vars)
> +    BITMAP_FREE (local_live_vars);
> +
>    /* Now check the statements after the call.  None of them has virtual
>       operands, so they may only depend on the call through its return
>       value.  The return value should also be dependent on each of them,
> @@ -1032,6 +1080,13 @@ tree_optimize_tail_calls_1 (bool opt_tai
>  	find_tail_calls (e->src, &tailcalls);
>      }
>  
> +  if (live_vars)
> +    {
> +      destroy_live_vars (live_vars_vec);
> +      delete live_vars;
> +      live_vars = NULL;
> +    }
> +
>    /* Construct the phi nodes and accumulators if necessary.  */
>    a_acc = m_acc = NULL_TREE;
>    for (act = tailcalls; act; act = act->next)
> --- gcc/tree-inline.h.jj	2019-03-28 23:32:44.457399561 +0100
> +++ gcc/tree-inline.h	2019-05-07 16:43:55.371041120 +0200
> @@ -160,6 +160,10 @@ struct copy_body_data
>       when inlining a call within an OpenMP SIMD-on-SIMT loop.  */
>    vec<tree> *dst_simt_vars;
>  
> +  /* Basic block to which clobbers for local variables from the inline
> +     function that need to live in memory should be added.  */
> +  basic_block eh_landing_pad_dest;
> +
>    /* If clobbers for local variables from the inline function
>       that need to live in memory should be added to EH landing pads
>       outside of the inlined function, this should be the number
> --- gcc/tree-inline.c.jj	2019-03-28 23:32:44.630396776 +0100
> +++ gcc/tree-inline.c	2019-05-07 17:43:35.302614747 +0200
> @@ -61,6 +61,7 @@ along with GCC; see the file COPYING3.
>  #include "attribs.h"
>  #include "sreal.h"
>  #include "tree-cfgcleanup.h"
> +#include "tree-ssa-live.h"
>  
>  /* I'm not real happy about this, but we need to handle gimple and
>     non-gimple trees.  */
> @@ -2285,12 +2286,15 @@ update_ssa_across_abnormal_edges (basic_
>  }
>  
>  /* Insert clobbers for automatic variables of inlined ID->src_fn
> -   function at the start of basic block BB.  */
> +   function at the start of basic block ID->eh_landing_pad_dest.  */
>  
>  static void
> -add_clobbers_to_eh_landing_pad (basic_block bb, copy_body_data *id)
> +add_clobbers_to_eh_landing_pad (copy_body_data *id)
>  {
>    tree var;
> +  basic_block bb = id->eh_landing_pad_dest;
> +  live_vars_map *vars = NULL;
> +  unsigned int cnt = 0;
>    unsigned int i;
>    FOR_EACH_VEC_SAFE_ELT (id->src_cfun->local_decls, i, var)
>      if (VAR_P (var)
> @@ -2312,12 +2316,47 @@ add_clobbers_to_eh_landing_pad (basic_bl
>  	    && !is_gimple_reg (new_var)
>  	    && auto_var_in_fn_p (new_var, id->dst_fn))
>  	  {
> +	    if (vars == NULL)
> +	      vars = new live_vars_map;
> +            vars->put (DECL_UID (var), cnt++);
> +	  }
> +      }
> +  if (vars == NULL)
> +    return;
> +
> +  vec<bitmap_head> live = compute_live_vars (id->src_cfun, vars);
> +  FOR_EACH_VEC_SAFE_ELT (id->src_cfun->local_decls, i, var)
> +    if (VAR_P (var))
> +      {
> +	edge e;
> +	edge_iterator ei;
> +	bool needed = false;
> +	unsigned int *v = vars->get (DECL_UID (var));
> +	if (v == NULL)
> +	  continue;
> +	FOR_EACH_EDGE (e, ei, bb->preds)
> +	  if ((e->flags & EDGE_EH) != 0
> +	      && e->src->index >= id->add_clobbers_to_eh_landing_pads)
> +	    {
> +	      basic_block src_bb = (basic_block) e->src->aux;
> +
> +	      if (bitmap_bit_p (&live[src_bb->index], *v))
> +		{
> +		  needed = true;
> +		  break;
> +		}
> +	    }
> +	if (needed)
> +	  {
> +	    tree new_var = *id->decl_map->get (var);
>  	    gimple_stmt_iterator gsi = gsi_after_labels (bb);
>  	    tree clobber = build_clobber (TREE_TYPE (new_var));
>  	    gimple *clobber_stmt = gimple_build_assign (new_var, clobber);
>  	    gsi_insert_before (&gsi, clobber_stmt, GSI_NEW_STMT);
>  	  }
>        }
> +  destroy_live_vars (live);
> +  delete vars;
>  }
>  
>  /* Copy edges from BB into its copy constructed earlier, scale profile
> @@ -2452,8 +2491,10 @@ copy_edges_for_bb (basic_block bb, profi
>  		  e->probability = profile_probability::never ();
>  		if (e->dest->index < id->add_clobbers_to_eh_landing_pads)
>  		  {
> -		    add_clobbers_to_eh_landing_pad (e->dest, id);
> -		    id->add_clobbers_to_eh_landing_pads = 0;
> +		    if (id->eh_landing_pad_dest == NULL)
> +		      id->eh_landing_pad_dest = e->dest;
> +		    else
> +		      gcc_assert (id->eh_landing_pad_dest == e->dest);
>  		  }
>  	      }
>          }
> @@ -2893,6 +2934,12 @@ copy_cfg_body (copy_body_data * id,
>        need_debug_cleanup |= copy_edges_for_bb (bb, num, den, exit_block_map,
>  					       abnormal_goto_dest, id);
>  
> +  if (id->eh_landing_pad_dest)
> +    {
> +      add_clobbers_to_eh_landing_pad (id);
> +      id->eh_landing_pad_dest = NULL;
> +    }
> +
>    if (new_entry)
>      {
>        edge e = make_edge (entry_block_map, (basic_block)new_entry->aux,
> --- gcc/testsuite/gcc.dg/tree-ssa/pr89060.c.jj	2019-05-07 16:43:55.373041088 +0200
> +++ gcc/testsuite/gcc.dg/tree-ssa/pr89060.c	2019-05-07 17:53:47.085828039 +0200
> @@ -0,0 +1,53 @@
> +/* PR tree-optimization/89060 */
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fdump-tree-tailc" } */
> +/* { dg-final { scan-tree-dump-not "baz \\\(1\\\); \\\[tail call\\\]" "tailc" } } */
> +/* { dg-final { scan-tree-dump "baz \\\(2\\\); \\\[tail call\\\]" "tailc" } } */
> +/* { dg-final { scan-tree-dump "baz \\\(3\\\); \\\[tail call\\\]" "tailc" } } */
> +/* { dg-final { scan-tree-dump "baz \\\(4\\\); \\\[tail call\\\]" "tailc" } } */
> +/* { dg-final { scan-tree-dump-not "baz \\\(5\\\); \\\[tail call\\\]" "tailc" } } */
> +
> +void qux (char *, int n);
> +int baz (int);
> +
> +int
> +foo (int n)
> +{
> +  char buf[64];
> +  qux (buf, n);
> +  return baz (1);
> +}
> +
> +int
> +bar (int n)
> +{
> +  {
> +    char buf[64];
> +    qux (buf, n);
> +  }
> +  return baz (2);
> +}
> +
> +int
> +quux (int n)
> +{
> +  if (n < 10)
> +    {
> +      {
> +        char buf[64];
> +        qux (buf, n);
> +      }
> +      return baz (3);
> +    }
> +  if (n > 20)
> +    {
> +      {
> +        char buf2[64];
> +	qux (buf2, n + 1);
> +      }
> +      return baz (4);
> +    }
> +  char buf3[64];
> +  qux (buf3, n + 2);
> +  return baz (5);
> +}
> 
> 
> 	Jakub
> 

-- 
Richard Biener <rguenther@suse.de>
SUSE Linux GmbH, Maxfeldstrasse 5, 90409 Nuernberg, Germany;
GF: Felix Imendörffer, Mary Higgins, Sri Rasiah; HRB 21284 (AG NÌrnberg)


More information about the Gcc-patches mailing list