This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [Patch] Improving jump-thread pass for PR 54742


Hi Jeff,

I have adapted the code generation part from James' patch to current trunk, and
the resulting patch gets the 30% speedup on coremark and passes bootstrap of GCC.

Ok for trunk?

Thanks,
Sebastian

Sebastian Pop wrote:
> Sebastian Pop wrote:
> > Jeff Law wrote:
> > > On 08/21/14 04:30, Richard Biener wrote:
> > > >>It turns Jeff's jump-threading code in to a strange franken-pass of bits and
> > > >>pieces of detection and optimisation, and would need some substantial
> > > >>reworking to fit in with Jeff's changes last Autumn, but if it is more
> > > >>likely to be acceptable for trunk then perhaps we could look to revive it.
> > > >>It would be nice to reuse the path copy code Jeff added last year, but I
> > > >>don't have much intuition as to how feasible that is.
> > > >>
> > > >>Was this the sort of thing that you were imagining?
> > > >
> > > >Yeah, didn't look too closely though.
> > > It'd be pretty ugly I suspect.  But it's probably worth pondering
> > > since that approach would eliminate the concerns about the cost of
> > > detection (which is problematical for the jump threader) by using
> > > Steve's code for that.
> > > 
> > > On the update side, I suspect most, if not all of the framework is
> > > in place to handle this kind of update if the right threading paths
> > > were passed to the updater.  I can probably cobble together that
> > > by-hand and see what the tree-ssa-threadupdate does with it.  But
> > > it'll be a week or so before I could look at it.
> > 
> > I adapted the patch James has sent last year to use the new update paths
> 
> Attached an updated version of the patch.
> 
> > mechanism.  I verified that the attached patch does register all the paths that
> > need to be threaded.  Thread updater seems to have some problems handling the
> > attached testcase (a simplified version of the testcase attached to the bug.)
> > 
> > Jeff, could you please have a look at why the jump-thread updater is crashing?
> 
> I have tried to understand why the code generation part ICEs on coremark: the
> first problem that I have seen is that tree-ssa-threadupdate.c does not handle
> more than a joiner block per path to be threaded, so we would not be able to
> jump thread accross the joiners of the if condition and the joiner of the switch
> condition: i.e., these paths
> 
> patch:   Registering jump thread: (7, 10) incoming edge;  (10, 25) joiner;  (25, 26) joiner;  (26, 4) nocopy; (4, 37) nocopy; (37, 36) nocopy; (36, 14) nocopy;
> patch:   Registering jump thread: (28, 10) incoming edge;  (10, 25) joiner;  (25, 26) joiner;  (26, 4) nocopy; (4, 37) nocopy; (37, 36) nocopy; (36, 11) nocopy;
> patch:   Registering jump thread: (8, 10) incoming edge;  (10, 25) joiner;  (25, 26) joiner;  (26, 4) nocopy; (4, 37) nocopy; (37, 36) nocopy; (36, 17) nocopy;
> patch:   Registering jump thread: (9, 10) incoming edge;  (10, 25) joiner;  (25, 26) joiner;  (26, 4) nocopy; (4, 37) nocopy; (37, 36) nocopy; (36, 25) nocopy;
> 
> Another problem is that we attach the path to be threaded to the ->aux field of
> the first edge in the path, such that we would have to cancel some of the paths
> because we cannot keep track of all the paths to be threaded.
> 
> For coremark, we would discover some jump-thread paths from one of the switch
> cases over the loop exit condition, either to bb_27 outside the loop, or to bb_4
> staying inside the loop.  Then with the "patch:" we would discover jump threads
> that would thread switch cases to switch cases, and because these paths start
> with the same edges for which we have already assigned a path to e->aux, we
> would have to cancel the interesting threads added by the patch:
> 
>   Registering jump thread: (12, 25) incoming edge;  (25, 26) joiner;  (26, 4) nocopy;
>   Registering jump thread: (13, 25) incoming edge;  (25, 26) joiner;  (26, 27) nocopy;
>   Registering jump thread: (29, 25) incoming edge;  (25, 26) joiner;  (26, 4) nocopy;
>   Registering jump thread: (31, 25) incoming edge;  (25, 26) joiner;  (26, 27) nocopy;
>   Registering jump thread: (16, 25) incoming edge;  (25, 26) joiner;  (26, 4) nocopy;
>   Registering jump thread: (15, 25) incoming edge;  (25, 26) joiner;  (26, 4) nocopy;
>   Registering jump thread: (32, 25) incoming edge;  (25, 26) joiner;  (26, 27) nocopy;
>   Registering jump thread: (19, 25) incoming edge;  (25, 26) joiner;  (26, 4) nocopy;
>   Registering jump thread: (18, 25) incoming edge;  (25, 26) joiner;  (26, 4) nocopy;
>   Registering jump thread: (22, 25) incoming edge;  (25, 26) joiner;  (26, 27) nocopy;
>   Registering jump thread: (21, 25) incoming edge;  (25, 26) joiner;  (26, 4) nocopy;
>   Registering jump thread: (34, 25) incoming edge;  (25, 26) joiner;  (26, 27) nocopy;
>   Registering jump thread: (33, 25) incoming edge;  (25, 26) joiner;  (26, 4) nocopy;
>   Registering jump thread: (35, 25) incoming edge;  (25, 26) joiner;  (26, 27) nocopy;
>   Registering jump thread: (24, 25) incoming edge;  (25, 26) joiner;  (26, 4) nocopy;
> patch:   Registering jump thread: (12, 25) incoming edge;  (25, 26) joiner;  (26, 4) nocopy; (4, 37) nocopy; (37, 36) nocopy; (36, 17) nocopy;
> patch:   Registering jump thread: (16, 25) incoming edge;  (25, 26) joiner;  (26, 4) nocopy; (4, 37) nocopy; (37, 36) nocopy; (36, 14) nocopy;
> patch:   Registering jump thread: (19, 25) incoming edge;  (25, 26) joiner;  (26, 4) nocopy; (4, 37) nocopy; (37, 36) nocopy; (36, 17) nocopy;
> patch:   Registering jump thread: (22, 25) incoming edge;  (25, 26) joiner;  (26, 4) nocopy; (4, 37) nocopy; (37, 36) nocopy; (36, 25) nocopy;
> patch:   Registering jump thread: (34, 25) incoming edge;  (25, 26) joiner;  (26, 4) nocopy; (4, 37) nocopy; (37, 36) nocopy; (36, 25) nocopy;
> patch:   Registering jump thread: (35, 25) incoming edge;  (25, 26) joiner;  (26, 4) nocopy; (4, 37) nocopy; (37, 36) nocopy; (36, 25) nocopy;
> patch:   Registering jump thread: (29, 25) incoming edge;  (25, 26) joiner;  (26, 4) nocopy; (4, 37) nocopy; (37, 36) nocopy; (36, 14) nocopy;
> patch:   Registering jump thread: (13, 25) incoming edge;  (25, 26) joiner;  (26, 4) nocopy; (4, 37) nocopy; (37, 36) nocopy; (36, 25) nocopy;
> patch:   Registering jump thread: (15, 25) incoming edge;  (25, 26) joiner;  (26, 4) nocopy; (4, 37) nocopy; (37, 36) nocopy; (36, 17) nocopy;
> patch:   Registering jump thread: (31, 25) incoming edge;  (25, 26) joiner;  (26, 4) nocopy; (4, 37) nocopy; (37, 36) nocopy; (36, 25) nocopy;
> patch:   Registering jump thread: (18, 25) incoming edge;  (25, 26) joiner;  (26, 4) nocopy; (4, 37) nocopy; (37, 36) nocopy; (36, 20) nocopy;
> patch:   Registering jump thread: (32, 25) incoming edge;  (25, 26) joiner;  (26, 4) nocopy; (4, 37) nocopy; (37, 36) nocopy; (36, 25) nocopy;
> patch:   Registering jump thread: (21, 25) incoming edge;  (25, 26) joiner;  (26, 4) nocopy; (4, 37) nocopy; (37, 36) nocopy; (36, 23) nocopy;
> patch:   Registering jump thread: (33, 25) incoming edge;  (25, 26) joiner;  (26, 4) nocopy; (4, 37) nocopy; (37, 36) nocopy; (36, 24) nocopy;
> patch:   Registering jump thread: (24, 25) incoming edge;  (25, 26) joiner;  (26, 4) nocopy; (4, 37) nocopy; (37, 36) nocopy; (36, 24) nocopy;
>   Registering jump thread: (6, 36) incoming edge;  (36, 7) normal;
>   Cancelling jump thread: (12, 25) incoming edge;  (25, 26) joiner;  (26, 4) nocopy; (4, 37) nocopy; (37, 36) nocopy; (36, 17) nocopy;
>   Cancelling jump thread: (16, 25) incoming edge;  (25, 26) joiner;  (26, 4) nocopy; (4, 37) nocopy; (37, 36) nocopy; (36, 14) nocopy;
>   Cancelling jump thread: (19, 25) incoming edge;  (25, 26) joiner;  (26, 4) nocopy; (4, 37) nocopy; (37, 36) nocopy; (36, 17) nocopy;
>   Cancelling jump thread: (22, 25) incoming edge;  (25, 26) joiner;  (26, 4) nocopy; (4, 37) nocopy; (37, 36) nocopy; (36, 25) nocopy;
>   Cancelling jump thread: (34, 25) incoming edge;  (25, 26) joiner;  (26, 4) nocopy; (4, 37) nocopy; (37, 36) nocopy; (36, 25) nocopy;
>   Cancelling jump thread: (35, 25) incoming edge;  (25, 26) joiner;  (26, 4) nocopy; (4, 37) nocopy; (37, 36) nocopy; (36, 25) nocopy;
>   Cancelling jump thread: (29, 25) incoming edge;  (25, 26) joiner;  (26, 4) nocopy; (4, 37) nocopy; (37, 36) nocopy; (36, 14) nocopy;
>   Cancelling jump thread: (13, 25) incoming edge;  (25, 26) joiner;  (26, 4) nocopy; (4, 37) nocopy; (37, 36) nocopy; (36, 25) nocopy;
>   Cancelling jump thread: (15, 25) incoming edge;  (25, 26) joiner;  (26, 4) nocopy; (4, 37) nocopy; (37, 36) nocopy; (36, 17) nocopy;
>   Cancelling jump thread: (31, 25) incoming edge;  (25, 26) joiner;  (26, 4) nocopy; (4, 37) nocopy; (37, 36) nocopy; (36, 25) nocopy;
>   Cancelling jump thread: (18, 25) incoming edge;  (25, 26) joiner;  (26, 4) nocopy; (4, 37) nocopy; (37, 36) nocopy; (36, 20) nocopy;
>   Cancelling jump thread: (32, 25) incoming edge;  (25, 26) joiner;  (26, 4) nocopy; (4, 37) nocopy; (37, 36) nocopy; (36, 25) nocopy;
>   Cancelling jump thread: (21, 25) incoming edge;  (25, 26) joiner;  (26, 4) nocopy; (4, 37) nocopy; (37, 36) nocopy; (36, 23) nocopy;
>   Cancelling jump thread: (33, 25) incoming edge;  (25, 26) joiner;  (26, 4) nocopy; (4, 37) nocopy; (37, 36) nocopy; (36, 24) nocopy;
>   Cancelling jump thread: (24, 25) incoming edge;  (25, 26) joiner;  (26, 4) nocopy; (4, 37) nocopy; (37, 36) nocopy; (36, 24) nocopy;
> 
> Here is the structure of the CFG with the loops:
> 
> (gdb) p debug_loops (2)
> loop_0 (header = 0, latch = 1, niter = )
> {
>   bb_2 (preds = {bb_0 }, succs = {bb_3 bb_27 })
>   bb_3 (preds = {bb_2 }, succs = {bb_5 bb_6 })
>   bb_5 (preds = {bb_4 bb_3 }, succs = {bb_27 })
>   bb_6 (preds = {bb_3 }, succs = {bb_36 })
>   bb_27 (preds = {bb_5 bb_25 bb_26 bb_2 }, succs = {bb_1 })
>   loop_1 (header = 36, latch = 37, niter = )
>   {
>     bb_4 (preds = {bb_26 }, succs = {bb_5 bb_37 })
>     bb_37 (preds = {bb_4 }, succs = {bb_36 })
>     bb_36 (preds = {bb_6 bb_37 }, succs = {bb_25 bb_7 bb_11 bb_20 bb_14 bb_17 bb_23 bb_24 })
>     bb_7 (preds = {bb_36 }, succs = {bb_10 bb_28 })
>     bb_8 (preds = {bb_28 }, succs = {bb_10 bb_9 })
>     bb_9 (preds = {bb_8 }, succs = {bb_10 })
>     bb_10 (preds = {bb_7 bb_28 bb_8 bb_9 }, succs = {bb_25 })
>     bb_11 (preds = {bb_36 }, succs = {bb_29 bb_30 })
>     bb_12 (preds = {bb_30 }, succs = {bb_25 })
>     bb_13 (preds = {bb_30 }, succs = {bb_25 })
>     bb_14 (preds = {bb_36 }, succs = {bb_15 bb_16 })
>     bb_15 (preds = {bb_14 }, succs = {bb_25 })
>     bb_16 (preds = {bb_14 }, succs = {bb_25 bb_31 })
>     bb_17 (preds = {bb_36 }, succs = {bb_18 bb_19 })
>     bb_18 (preds = {bb_17 }, succs = {bb_25 })
>     bb_19 (preds = {bb_17 }, succs = {bb_25 bb_32 })
>     bb_20 (preds = {bb_36 }, succs = {bb_21 bb_22 })
>     bb_21 (preds = {bb_20 }, succs = {bb_25 })
>     bb_22 (preds = {bb_20 }, succs = {bb_25 })
>     bb_23 (preds = {bb_36 }, succs = {bb_33 bb_34 })
>     bb_24 (preds = {bb_36 }, succs = {bb_25 bb_35 })
>     bb_25 (preds = {bb_10 bb_12 bb_16 bb_19 bb_22 bb_34 bb_35 bb_36 bb_29 bb_13 bb_15 bb_31 bb_18 bb_32 bb_21 bb_33 bb_24 }, succs = {bb_26 bb_27 })
>     bb_26 (preds = {bb_25 }, succs = {bb_4 bb_27 })
>     bb_28 (preds = {bb_7 }, succs = {bb_10 bb_8 })
>     bb_29 (preds = {bb_11 }, succs = {bb_25 })
>     bb_30 (preds = {bb_11 }, succs = {bb_12 bb_13 })
>     bb_31 (preds = {bb_16 }, succs = {bb_25 })
>     bb_32 (preds = {bb_19 }, succs = {bb_25 })
>     bb_33 (preds = {bb_23 }, succs = {bb_25 })
>     bb_34 (preds = {bb_23 }, succs = {bb_25 })
>     bb_35 (preds = {bb_24 }, succs = {bb_25 })
>   }
> }
> 
> What about removing the use of e->aux in threadupdate.c, to be able to jump
> thread across all the recorded paths?
> 
> Thanks,
> Sebastian

> From bac0f2a390048652910f77503b21b3e208daeae1 Mon Sep 17 00:00:00 2001
> From: Sebastian Pop <s.pop@samsung.com>
> Date: Fri, 26 Sep 2014 14:54:20 -0500
> Subject: [PATCH] jump thread for PR 54742
> 
> Adapted from a patch from James Greenhalgh.
> 
> 	* tree-ssa-threadedge.c (simplify_control_stmt_condition): Restore the
> 	original value of cond when simplification fails.
> 	(find_thread_path): New.
> 	(find_control_statement_thread_paths): New.
> 	(thread_through_normal_block): Call find_control_statement_thread_paths.
> 
> 	* testsuite/gcc.dg/tree-ssa/ssa-dom-thread-6.c: New.
> ---
>  gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-6.c |  32 ++++
>  gcc/tree-ssa-threadedge.c                        | 180 ++++++++++++++++++++++-
>  gcc/tree-ssa-threadupdate.c                      |   4 +
>  3 files changed, 215 insertions(+), 1 deletion(-)
>  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-6.c
> 
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-6.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-6.c
> new file mode 100644
> index 0000000..f3ef725
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-6.c
> @@ -0,0 +1,32 @@
> +int sum0, sum1, sum2, sum3;
> +int foo(char * s, char** ret)
> +{
> +  int state=0;
> +  char c;
> +
> +  for (; *s && state != 4; s++)
> +    {
> +      c = *s;
> +      if (c == '*')
> +	{
> +	  s++;
> +	  break;
> +	}
> +      switch (state) {
> +	case 0:
> +	  if (c == '+') state = 1;
> +	  else if (c != '-') sum0+=c;
> +	  break;
> +	case 1:
> +	  if (c == '+') state = 2;
> +	  else if (c == '-') state = 0;
> +	  else sum1+=c;
> +	  break;
> +	default:
> +	  break;
> +      }
> +
> +    }
> +  *ret = s;
> +  return state;
> +}
> diff --git a/gcc/tree-ssa-threadedge.c b/gcc/tree-ssa-threadedge.c
> index 3dee5ba..7b9e5b6 100644
> --- a/gcc/tree-ssa-threadedge.c
> +++ b/gcc/tree-ssa-threadedge.c
> @@ -628,6 +628,7 @@ simplify_control_stmt_condition (edge e,
>       rather than use a relational operator.  These are simpler to handle.  */
>    if (TREE_CODE (cond) == SSA_NAME)
>      {
> +      tree original_lhs = cond;
>        cached_lhs = cond;
>  
>        /* Get the variable's current value from the equivalence chains.
> @@ -656,6 +657,12 @@ simplify_control_stmt_condition (edge e,
>  	 pass specific callback to try and simplify it further.  */
>        if (cached_lhs && ! is_gimple_min_invariant (cached_lhs))
>          cached_lhs = (*simplify) (stmt, stmt);
> +
> +      /* We couldn't find an invariant.  But, callers of this
> +	 function may be able to do something useful with the
> +	 unmodified destination.  */
> +      if (!cached_lhs)
> +	cached_lhs = original_lhs;
>      }
>    else
>      cached_lhs = NULL;
> @@ -915,6 +922,155 @@ thread_around_empty_blocks (edge taken_edge,
>    return false;
>  }
>  
> +/* Return true if there is at least one path from START_BB to END_BB.
> +   VISITED_BBS is used to make sure we don't fall into an infinite loop.  */
> +
> +static bool
> +find_thread_path (basic_block start_bb, basic_block end_bb,
> +		    vec<basic_block, va_gc> *&path,
> +		    hash_set<basic_block> *visited_bbs)
> +{
> +  if (start_bb == end_bb)
> +    {
> +      vec_safe_push (path, start_bb);
> +      return true;
> +    }
> +
> +  if (!visited_bbs->add(start_bb))
> +    {
> +      edge e;
> +      edge_iterator ei;
> +      FOR_EACH_EDGE (e, ei, start_bb->succs)
> +	if (find_thread_path (e->dest, end_bb, path, visited_bbs))
> +	  {
> +	    vec_safe_push (path, start_bb);
> +	    return true;
> +	  }
> +    }
> +
> +  return false;
> +}
> +
> +/* We trace the value of the variable EXPR back through any phi nodes looking
> +   for places where it gets a constant value and save the path.  */
> +
> +static void
> +find_control_statement_thread_paths (tree expr,
> +				     hash_set<gimple> *visited_phis,
> +				     vec<basic_block, va_gc> *&path)
> +{
> +  tree var = SSA_NAME_VAR (expr);
> +  gimple def_stmt = SSA_NAME_DEF_STMT (expr);
> +  basic_block var_bb = gimple_bb (def_stmt);
> +
> +  if (var == NULL || var_bb == NULL)
> +    return;
> +
> +  vec<basic_block, va_gc> *next_path;
> +  vec_alloc (next_path, n_basic_blocks_for_fn (cfun));
> +
> +  basic_block last_bb_in_path = path->last ();
> +
> +  /* Put the path from var_bb to last_bb_in_path into next_path.  */
> +  if (var_bb != last_bb_in_path)
> +    {
> +      edge e;
> +      int e_count = 0;
> +      edge_iterator ei;
> +
> +      FOR_EACH_EDGE (e, ei, last_bb_in_path->preds)
> +	{
> +	  hash_set<basic_block> *visited_bbs = new hash_set<basic_block>;
> +
> +	  if (find_thread_path (var_bb, e->src, next_path, visited_bbs))
> +	    e_count = e_count + 1;
> +
> +	  delete visited_bbs;
> +
> +	  /* If there is more than one path, stop.  */
> +	  if (e_count > 1)
> +	    {
> +	      vec_free (next_path);
> +	      return;
> +	    }
> +	}
> +    }
> +
> +  /* Visit PHI nodes once.  */
> +  if (gimple_code (def_stmt) != GIMPLE_PHI
> +      || visited_phis->add(def_stmt)) {
> +    vec_free (next_path);
> +    return;
> +  }
> +
> +  /* Append all the nodes from next_path to path.  */
> +  vec_safe_splice (path, next_path);
> +  gcc_assert (path->last () == var_bb);
> +
> +  /* Iterate over the arguments of PHI.  */
> +  unsigned int i;
> +  for (i = 0; i < gimple_phi_num_args (def_stmt); i++)
> +    {
> +      tree arg = gimple_phi_arg_def (def_stmt, i);
> +      basic_block bbi = gimple_phi_arg_edge (def_stmt, i)->src;
> +
> +      /* Skip edges pointing outside the current loop.  */
> +      if (!arg || var_bb->loop_father != bbi->loop_father)
> +	continue;
> +
> +      /* Add BBI to the path.  */
> +      vec_safe_push (path, bbi);
> +
> +      if (TREE_CODE (arg) == INTEGER_CST)
> +	{
> +	  int j, n = path->length ();
> +	  vec<jump_thread_edge *> *jump_thread_path
> +	    = new vec<jump_thread_edge *> ();
> +	  int joiners = 0;
> +
> +	  for (j = 0; j < n - 1; j++)
> +	    {
> +	      edge e = find_edge ((*path)[n - j - 1],
> +				  (*path)[n - j - 2]);
> +	      gcc_assert (e);
> +	      enum jump_thread_edge_type kind;
> +
> +	      if (j == 0)
> +		kind = EDGE_START_JUMP_THREAD;
> +	      else if (single_pred_p (e->src))
> +		kind = EDGE_NO_COPY_SRC_BLOCK;
> +	      else {
> +		kind = EDGE_COPY_SRC_JOINER_BLOCK;
> +		++joiners;
> +	      }
> +
> +	      jump_thread_edge *x = new jump_thread_edge (e, kind);
> +	      jump_thread_path->safe_push (x);
> +	    }
> +
> +	  /* Add the edge taken when the control variable has value ARG.  */
> +	  edge taken_edge = find_taken_edge ((*path)[0], arg);
> +	  jump_thread_edge *x
> +	    = new jump_thread_edge (taken_edge, EDGE_NO_COPY_SRC_BLOCK);
> +	  jump_thread_path->safe_push (x);
> +
> +	  /* Thread-update does not handle more than two joiners.  A path with
> +	     less than 3 nodes should not be jump-threaded.  */
> +	  if (joiners < 2 && n > 2)
> +	    register_jump_thread (jump_thread_path);
> +	}
> +      else if (TREE_CODE (arg) == SSA_NAME)
> +	find_control_statement_thread_paths (arg, visited_phis, path);
> +
> +      /* Remove BBI from the path.  */
> +      path->pop ();
> +    }
> +
> +  /* Remove all the nodes that we added from next_path.  */
> +  vec_safe_truncate (path, (path->length () - next_path->length ()));
> +  vec_free (next_path);
> +}
> +
>  /* We are exiting E->src, see if E->dest ends with a conditional
>     jump which has a known value when reached via E.
>  
> @@ -1000,7 +1156,10 @@ thread_through_normal_block (edge e,
>        cond = simplify_control_stmt_condition (e, stmt, dummy_cond, simplify,
>  					      handle_dominating_asserts);
>  
> -      if (cond && is_gimple_min_invariant (cond))
> +      if (!cond)
> +	return 0;
> +
> +      if (is_gimple_min_invariant (cond))
>  	{
>  	  edge taken_edge = find_taken_edge (e->dest, cond);
>  	  basic_block dest = (taken_edge ? taken_edge->dest : NULL);
> @@ -1046,7 +1205,25 @@ thread_through_normal_block (edge e,
>  				      backedge_seen_p);
>  	  return 1;
>  	}
> +
> +      if (TREE_CODE (cond) != SSA_NAME
> +	  || e->dest->loop_father != e->src->loop_father)
> +	return 0;
> +
> +      /* When COND cannot be simplified, try to find paths from a control
> +	 statement back through the PHI nodes which would affect that control
> +	 statement.  */
> +      vec<basic_block, va_gc> *bb_path;
> +      vec_alloc (bb_path, n_basic_blocks_for_fn (cfun));
> +      vec_safe_push (bb_path, e->dest);
> +      hash_set<gimple> *visited_phis = new hash_set<gimple>;
> +
> +      find_control_statement_thread_paths (cond, visited_phis, bb_path);
> +
> +      delete visited_phis;
> +      vec_free (bb_path);
> +
> +      return -1;
>      }
>    return 0;
>  }
> -- 
> 2.1.0.243.g30d45f7
> 

>From 23b1ac8fa92e9e4cd05edb2967aba564126f75a1 Mon Sep 17 00:00:00 2001
From: Sebastian Pop <s.pop@samsung.com>
Date: Fri, 26 Sep 2014 14:54:20 -0500
Subject: [PATCH] jump thread for PR 54742

Adapted from a patch from James Greenhalgh.

	* params.def (max-thread-path-insns, max-thread-length,
	max-thread-paths): New.

	* testsuite/gcc.dg/tree-ssa/ssa-dom-thread-6.c: New.

	* tree-cfg.c (gimple_duplicate_sese_region): Save and restore loop
	header and latch.

	* tree-ssa-threadedge.c (simplify_control_stmt_condition): Restore the
	original value of cond when simplification fails.
	(fsm_find_thread_path): New.
	(fsm_find_control_statement_thread_paths): New.
	(fsm_thread_through_normal_block): Call find_control_statement_thread_paths.

	* tree-ssa-threadupdate.c (dump_jump_thread_path): Pretty print
	EDGE_START_FSM_THREAD.
	(thread_through_all_blocks): Generate code for EDGE_START_FSM_THREAD edges
	calling gimple_duplicate_sese_region.

	* tree-ssa-threadupdate.h (jump_thread_edge_type): Add EDGE_START_FSM_THREAD.
---
 gcc/params.def                                   |  19 +++
 gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-6.c |  32 ++++
 gcc/tree-cfg.c                                   |  21 ++-
 gcc/tree-ssa-threadedge.c                        | 200 ++++++++++++++++++++++-
 gcc/tree-ssa-threadupdate.c                      |  52 +++++-
 gcc/tree-ssa-threadupdate.h                      |   1 +
 6 files changed, 320 insertions(+), 5 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-6.c

diff --git a/gcc/params.def b/gcc/params.def
index d2d2add..749f962 100644
--- a/gcc/params.def
+++ b/gcc/params.def
@@ -123,6 +123,25 @@ DEFPARAM (PARAM_PARTIAL_INLINING_ENTRY_PROBABILITY,
 	  "Maximum probability of the entry BB of split region (in percent relative to entry BB of the function) to make partial inlining happen",
 	  70, 0, 0)
 
+/* Maximum number of instructions to copy when duplicating blocks
+   on a jump thread path.  */
+DEFPARAM (PARAM_MAX_THREAD_PATH_INSNS,
+	  "max-thread-path-insns",
+	  "Maximum number of instructions to copy when duplicating blocks on a jump thread path",
+	  100, 1, 999999)
+
+/* Maximum length of a jump thread path.  */
+DEFPARAM (PARAM_MAX_THREAD_LENGTH,
+	  "max-thread-length",
+	  "Maximum number of basic blocks on a jump thread path",
+	  10, 1, 999999)
+
+/* Maximum number of jump thread paths to duplicate.  */
+DEFPARAM (PARAM_MAX_THREAD_PATHS,
+	  "max-thread-paths",
+	  "Maximum number of new jump thread paths to create",
+	  50, 1, 999999)
+
 /* Limit the number of expansions created by the variable expansion
    optimization to avoid register pressure.  */
 DEFPARAM (PARAM_MAX_VARIABLE_EXPANSIONS,
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-6.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-6.c
new file mode 100644
index 0000000..f3ef725
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-6.c
@@ -0,0 +1,32 @@
+int sum0, sum1, sum2, sum3;
+int foo(char * s, char** ret)
+{
+  int state=0;
+  char c;
+
+  for (; *s && state != 4; s++)
+    {
+      c = *s;
+      if (c == '*')
+	{
+	  s++;
+	  break;
+	}
+      switch (state) {
+	case 0:
+	  if (c == '+') state = 1;
+	  else if (c != '-') sum0+=c;
+	  break;
+	case 1:
+	  if (c == '+') state = 2;
+	  else if (c == '-') state = 0;
+	  else sum1+=c;
+	  break;
+	default:
+	  break;
+      }
+
+    }
+  *ret = s;
+  return state;
+}
diff --git a/gcc/tree-cfg.c b/gcc/tree-cfg.c
index ee10bc6..565cfe3 100644
--- a/gcc/tree-cfg.c
+++ b/gcc/tree-cfg.c
@@ -5949,10 +5949,12 @@ gimple_duplicate_sese_region (edge entry, edge exit,
 {
   unsigned i;
   bool free_region_copy = false, copying_header = false;
+  bool save_loop_details = false;
   struct loop *loop = entry->dest->loop_father;
   edge exit_copy;
   vec<basic_block> doms;
   edge redirected;
+  int memo_loop_header_no = 0, memo_loop_latch_no = 0;
   int total_freq = 0, entry_freq = 0;
   gcov_type total_count = 0, entry_count = 0;
 
@@ -5970,9 +5972,15 @@ gimple_duplicate_sese_region (edge entry, edge exit,
       if (region[i]->loop_father != loop)
 	return false;
 
-      if (region[i] != entry->dest
-	  && region[i] == loop->header)
-	return false;
+      /* If we are copying an abnormally shaped region, keep track of
+	 which block will become our loop header.  */
+      if ((region[i] != entry->dest && region[i] == loop->header)
+	  || (region[i] != entry->src && region[i] == loop->latch))
+	{
+	  save_loop_details = true;
+	  memo_loop_latch_no = i;
+	  memo_loop_header_no = i;
+	}
     }
 
   /* In case the function is used for loop header copying (which is the primary
@@ -6055,6 +6063,13 @@ gimple_duplicate_sese_region (edge entry, edge exit,
       loop->latch = exit->src;
     }
 
+  /* Restore loop details if we were asked to save them.  */
+  if (save_loop_details)
+    {
+      loop->header = region_copy[memo_loop_header_no];
+      loop->latch = region_copy[memo_loop_latch_no];
+    }
+
   /* Redirect the entry and add the phi node arguments.  */
   redirected = redirect_edge_and_branch (entry, get_bb_copy (entry->dest));
   gcc_assert (redirected != NULL);
diff --git a/gcc/tree-ssa-threadedge.c b/gcc/tree-ssa-threadedge.c
index d5b9696..de9b3fe 100644
--- a/gcc/tree-ssa-threadedge.c
+++ b/gcc/tree-ssa-threadedge.c
@@ -56,6 +56,8 @@ along with GCC; see the file COPYING3.  If not see
 #include "params.h"
 #include "tree-ssa-threadedge.h"
 #include "builtins.h"
+#include "cfg.h"
+#include "cfganal.h"
 
 /* To avoid code explosion due to jump threading, we limit the
    number of statements we are going to copy.  This variable
@@ -660,6 +662,7 @@ simplify_control_stmt_condition (edge e,
      rather than use a relational operator.  These are simpler to handle.  */
   if (TREE_CODE (cond) == SSA_NAME)
     {
+      tree original_lhs = cond;
       cached_lhs = cond;
 
       /* Get the variable's current value from the equivalence chains.
@@ -688,6 +691,12 @@ simplify_control_stmt_condition (edge e,
 	 pass specific callback to try and simplify it further.  */
       if (cached_lhs && ! is_gimple_min_invariant (cached_lhs))
         cached_lhs = (*simplify) (stmt, stmt);
+
+      /* We couldn't find an invariant.  But, callers of this
+	 function may be able to do something useful with the
+	 unmodified destination.  */
+      if (!cached_lhs)
+	cached_lhs = original_lhs;
     }
   else
     cached_lhs = NULL;
@@ -947,6 +956,172 @@ thread_around_empty_blocks (edge taken_edge,
   return false;
 }
 
+/* Return true if there is at least one path from START_BB to END_BB.
+   VISITED_BBS is used to make sure we don't fall into an infinite loop.  */
+
+static bool
+fsm_find_thread_path (basic_block start_bb, basic_block end_bb,
+		      vec<basic_block, va_gc> *&path,
+		      hash_set<basic_block> *visited_bbs, int n_insns)
+{
+  if (start_bb == end_bb)
+    {
+      vec_safe_push (path, start_bb);
+      return true;
+    }
+
+  if (!visited_bbs->add(start_bb))
+    {
+      edge e;
+      edge_iterator ei;
+      FOR_EACH_EDGE (e, ei, start_bb->succs)
+	if (fsm_find_thread_path (e->dest, end_bb, path, visited_bbs, n_insns))
+	  {
+	    vec_safe_push (path, start_bb);
+	    return true;
+	  }
+    }
+
+  return false;
+}
+
+static int max_threaded_paths;
+
+/* We trace the value of the variable EXPR back through any phi nodes looking
+   for places where it gets a constant value and save the path.  Stop after
+   having recorded MAX_PATHS jump threading paths.  */
+
+static void
+fsm_find_control_statement_thread_paths (tree expr,
+					 hash_set<gimple> *visited_phis,
+					 vec<basic_block, va_gc> *&path)
+{
+  tree var = SSA_NAME_VAR (expr);
+  gimple def_stmt = SSA_NAME_DEF_STMT (expr);
+  basic_block var_bb = gimple_bb (def_stmt);
+
+  if (var == NULL || var_bb == NULL)
+    return;
+
+  vec<basic_block, va_gc> *next_path;
+  vec_alloc (next_path, n_basic_blocks_for_fn (cfun));
+
+  basic_block last_bb_in_path = path->last ();
+
+  /* Put the path from var_bb to last_bb_in_path into next_path.  */
+  if (var_bb != last_bb_in_path)
+    {
+      edge e;
+      int e_count = 0;
+      edge_iterator ei;
+
+      FOR_EACH_EDGE (e, ei, last_bb_in_path->preds)
+	{
+	  hash_set<basic_block> *visited_bbs = new hash_set<basic_block>;
+
+	  if (fsm_find_thread_path (var_bb, e->src, next_path, visited_bbs, 0))
+	    ++e_count;
+
+	  delete visited_bbs;
+
+	  /* If there is more than one path, stop.  */
+	  if (e_count > 1)
+	    {
+	      vec_free (next_path);
+	      return;
+	    }
+	}
+    }
+
+  /* Visit PHI nodes once.  */
+  if (gimple_code (def_stmt) != GIMPLE_PHI
+      || visited_phis->add(def_stmt))
+    {
+      vec_free (next_path);
+      return;
+    }
+
+  /* Append all the nodes from next_path to path.  */
+  vec_safe_splice (path, next_path);
+  gcc_assert (path->last () == var_bb);
+
+  /* Iterate over the arguments of PHI.  */
+  unsigned int i;
+  for (i = 0; i < gimple_phi_num_args (def_stmt); i++)
+    {
+      tree arg = gimple_phi_arg_def (def_stmt, i);
+      basic_block bbi = gimple_phi_arg_edge (def_stmt, i)->src;
+
+      /* Skip edges pointing outside the current loop.  */
+      if (!arg || var_bb->loop_father != bbi->loop_father)
+	continue;
+
+      /* Add BBI to the path.  */
+      vec_safe_push (path, bbi);
+
+      if (TREE_CODE (arg) == INTEGER_CST)
+	{
+	  int j, n = path->length ();
+	  vec<jump_thread_edge *> *jump_thread_path
+	    = new vec<jump_thread_edge *> ();
+	  int joiners = 0;
+
+	  for (j = 0; j < n - 1; j++)
+	    {
+	      edge e = find_edge ((*path)[n - j - 1],
+				  (*path)[n - j - 2]);
+	      gcc_assert (e);
+	      enum jump_thread_edge_type kind;
+
+	      if (j == 0)
+		kind = EDGE_START_FSM_THREAD;
+	      else if (single_pred_p (e->src))
+		kind = EDGE_NO_COPY_SRC_BLOCK;
+	      else {
+		kind = EDGE_COPY_SRC_JOINER_BLOCK;
+		++joiners;
+	      }
+
+	      jump_thread_edge *x = new jump_thread_edge (e, kind);
+	      jump_thread_path->safe_push (x);
+	    }
+
+	  /* Add the edge taken when the control variable has value ARG.  */
+	  edge taken_edge = find_taken_edge ((*path)[0], arg);
+	  jump_thread_edge *x
+	    = new jump_thread_edge (taken_edge, EDGE_NO_COPY_SRC_BLOCK);
+	  jump_thread_path->safe_push (x);
+
+	  /* A path with less than 3 nodes should not be jump-threaded.  */
+	  if (n > 2 && n < PARAM_VALUE (PARAM_MAX_THREAD_LENGTH)
+	      && max_threaded_paths > 0)
+	    {
+	      int n_insns = 0;
+	      gimple_stmt_iterator gsi;
+
+	      for (j = 1; j < n - 1; j++)
+		for (gsi = gsi_start_bb ((*path)[j]); !gsi_end_p (gsi); gsi_next (&gsi))
+		  ++n_insns;
+
+	      if (n_insns < PARAM_VALUE (PARAM_MAX_THREAD_PATH_INSNS))
+		{
+		  register_jump_thread (jump_thread_path);
+		  --max_threaded_paths;
+		}
+	    }
+	}
+      else if (TREE_CODE (arg) == SSA_NAME)
+	fsm_find_control_statement_thread_paths (arg, visited_phis, path);
+
+      /* Remove BBI from the path.  */
+      path->pop ();
+    }
+
+  /* Remove all the nodes that we added from next_path.  */
+  vec_safe_truncate (path, (path->length () - next_path->length ()));
+  vec_free (next_path);
+}
+
 /* We are exiting E->src, see if E->dest ends with a conditional
    jump which has a known value when reached via E.
 
@@ -1032,7 +1207,10 @@ thread_through_normal_block (edge e,
       cond = simplify_control_stmt_condition (e, stmt, dummy_cond, simplify,
 					      handle_dominating_asserts);
 
-      if (cond && is_gimple_min_invariant (cond))
+      if (!cond)
+	return 0;
+
+      if (is_gimple_min_invariant (cond))
 	{
 	  edge taken_edge = find_taken_edge (e->dest, cond);
 	  basic_block dest = (taken_edge ? taken_edge->dest : NULL);
@@ -1078,6 +1256,26 @@ thread_through_normal_block (edge e,
 				      backedge_seen_p);
 	  return 1;
 	}
+
+      if (TREE_CODE (cond) != SSA_NAME
+	  || e->dest->loop_father != e->src->loop_father)
+	return 0;
+
+      /* When COND cannot be simplified, try to find paths from a control
+	 statement back through the PHI nodes which would affect that control
+	 statement.  */
+      vec<basic_block, va_gc> *bb_path;
+      vec_alloc (bb_path, n_basic_blocks_for_fn (cfun));
+      vec_safe_push (bb_path, e->dest);
+      hash_set<gimple> *visited_phis = new hash_set<gimple>;
+
+      max_threaded_paths = PARAM_VALUE (PARAM_MAX_THREAD_PATHS);
+      fsm_find_control_statement_thread_paths (cond, visited_phis, bb_path);
+
+      delete visited_phis;
+      vec_free (bb_path);
+
+      return -1;
     }
   return 0;
 }
diff --git a/gcc/tree-ssa-threadupdate.c b/gcc/tree-ssa-threadupdate.c
index 151ed83..5847078 100644
--- a/gcc/tree-ssa-threadupdate.c
+++ b/gcc/tree-ssa-threadupdate.c
@@ -167,8 +167,9 @@ dump_jump_thread_path (FILE *dump_file, vec<jump_thread_edge *> path,
 		       bool registering)
 {
   fprintf (dump_file,
-	   "  %s jump thread: (%d, %d) incoming edge; ",
+	   "  %s%s jump thread: (%d, %d) incoming edge; ",
 	   (registering ? "Registering" : "Cancelling"),
+	   (path[0]->type == EDGE_START_FSM_THREAD ? " FSM": ""),
 	   path[0]->e->src->index, path[0]->e->dest->index);
 
   for (unsigned int i = 1; i < path.length (); i++)
@@ -2343,6 +2344,55 @@ thread_through_all_blocks (bool may_peel_loop_headers)
   threaded_blocks = BITMAP_ALLOC (NULL);
   memset (&thread_stats, 0, sizeof (thread_stats));
 
+  for (i = 0; i < paths.length (); )
+    {
+      vec<jump_thread_edge *> *path = paths[i];
+      edge entry = (*path)[0]->e;
+
+      if ((*path)[0]->type != EDGE_START_FSM_THREAD
+	  /* Do not jump-thread twice from the same block.  */
+	  || bitmap_bit_p (threaded_blocks, entry->src->index)) {
+	i++;
+	continue;
+      }
+
+      unsigned len = path->length ();
+      edge exit = (*path)[len - 1]->e;
+      basic_block *region = XNEWVEC (basic_block, len - 1);
+
+      for (unsigned int j = 0; j < len - 1; j++)
+	region[j] = (*path)[j]->e->dest;
+
+      bool success = gimple_duplicate_sese_region (entry, exit, region,
+						   len - 1, NULL, 0);
+      delete_jump_thread_path (path);
+      paths.unordered_remove (i);
+
+      if (success) {
+	/* We do not update dominance info.  */
+	free_dominance_info (CDI_DOMINATORS);
+
+	bitmap_set_bit (threaded_blocks, entry->src->index);
+      }
+    }
+
+  for (i = 0; i < paths.length (); )
+    {
+      vec<jump_thread_edge *> *path = paths[i];
+      edge entry = (*path)[0]->e;
+
+      /* Do not jump-thread twice from the same block.  */
+      if (bitmap_bit_p (threaded_blocks, entry->src->index))
+	{
+	  delete_jump_thread_path (path);
+	  paths.unordered_remove (i);
+	}
+      else
+	i++;
+    }
+
+  bitmap_clear (threaded_blocks);
+
   mark_threaded_blocks (threaded_blocks);
 
   initialize_original_copy_tables ();
diff --git a/gcc/tree-ssa-threadupdate.h b/gcc/tree-ssa-threadupdate.h
index 426aca5..42c3a9e 100644
--- a/gcc/tree-ssa-threadupdate.h
+++ b/gcc/tree-ssa-threadupdate.h
@@ -26,6 +26,7 @@ extern bool thread_through_all_blocks (bool);
 enum jump_thread_edge_type
 {
   EDGE_START_JUMP_THREAD,
+  EDGE_START_FSM_THREAD,
   EDGE_COPY_SRC_BLOCK,
   EDGE_COPY_SRC_JOINER_BLOCK,
   EDGE_NO_COPY_SRC_BLOCK
-- 
2.1.0.243.g30d45f7


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]