[PR81165] discount killed stmts when sizing blocks for threading

Jeff Law law@redhat.com
Fri Dec 15 18:29:00 GMT 2017


On 12/11/2017 10:17 PM, Alexandre Oliva wrote:
> On Dec 11, 2017, Jeff Law <law@redhat.com> wrote:
> 
> I've updated it according to richi's and your feedbacks.  Regstrapped on
> {x86_64,i686}-linux-gnu.  Ok to install?
> 
> 
> We limit the amount of copying for jump threading based on counting
> stmts.  This counting is overly pessimistic, because we will very
> often delete stmts as a consequence of jump threading: when the final
> conditional jump of a block is removed, earlier SSA names computed
> exclusively for use in that conditional are killed.  Furthermore, PHI
> nodes in blocks with only two predecessors are trivially replaced with
> their now-single values after threading.
> 
> This patch scans blocks to be copied in the path constructed so far
> and estimates the number of stmts that will be removed in the copies,
> bumping up the stmt count limit.
> 
> for  gcc/ChangeLog
> 
> 	PR tree-optimization/81165
> 	* tree-ssa-threadedge.c (uses_in_bb): New.
> 	(estimate_threading_killed_stmts): New.
> 	(estimate_threading_killed_stmts): New overload.
> 	(record_temporary_equivalences_from_stmts_at_dest): Add path
> 	parameter; adjust caller.  Expand limit when it's hit.
> 
> for  gcc/testsuite/ChangeLog
> 
> 	PR tree-optimization/81165
> 	* gcc.dg/pr81165.c: New.


> diff --git a/gcc/tree-ssa-threadedge.c b/gcc/tree-ssa-threadedge.c
> index 91793bfa59d3..0f5b943aa9a0 100644
> --- a/gcc/tree-ssa-threadedge.c
> +++ b/gcc/tree-ssa-threadedge.c
> @@ -170,6 +170,160 @@ threadedge_valueize (tree t)
>    return t;
>  }
>  
> +/* Return how many uses of T there are within BB, as long as there
> +   aren't any uses outside BB.  If there are any uses outside BB,
> +   return -1 if there's at most one use within BB, or -2 if there is
> +   more than one use within BB.  */
> +
> +static int
> +uses_in_bb (tree t, basic_block bb)
> +{
> +  int uses = 0;
> +  bool outside_bb = false;
> +
> +  imm_use_iterator iter;
> +  use_operand_p use_p;
> +  FOR_EACH_IMM_USE_FAST (use_p, iter, t)
> +    {
> +      if (is_gimple_debug (USE_STMT (use_p)))
> +	continue;
> +
> +      if (gimple_bb (USE_STMT (use_p)) != bb)
> +	outside_bb = true;
> +      else
> +	uses++;
> +
> +      if (outside_bb && uses > 1)
> +	return -2;
> +    }
> +
> +  if (outside_bb)
> +    return -1;
> +
> +  return uses;
> +}
> +
> +/* Starting from the final control flow stmt in BB, assuming it will
> +   be removed, follow uses in to-be-removed stmts back to their defs
> +   and count how many defs are to become dead and be removed as
> +   well.  */
> +
> +static int
> +estimate_threading_killed_stmts (basic_block bb)
> +{
> +  int killed_stmts = 0;
> +  hash_map<tree, int> ssa_remaining_uses;
> +  auto_vec<gimple *, 4> dead_worklist;
> +
> +  /* If the block has only two predecessors, threading will turn phi
> +     dsts into either src, so count them as dead stmts.  */
> +  bool drop_all_phis = EDGE_COUNT (bb->preds) == 2;
> +
> +  if (drop_all_phis)
> +    for (gphi_iterator gsi = gsi_start_phis (bb);
> +	 !gsi_end_p (gsi); gsi_next (&gsi))
> +      {
> +	gphi *phi = gsi.phi ();
> +	tree dst = gimple_phi_result (phi);
> +
> +	/* We don't count virtual PHIs as stmts in
> +	   record_temporary_equivalences_from_phis.  */
> +	if (virtual_operand_p (dst))
> +	  continue;
> +
> +	killed_stmts++;
> +      }
> +
> +  if (gsi_end_p (gsi_last_bb (bb)))
> +    return killed_stmts;
> +
> +  gimple *stmt = gsi_stmt (gsi_last_bb (bb));
> +  if (gimple_code (stmt) != GIMPLE_COND
> +      && gimple_code (stmt) != GIMPLE_GOTO
> +      && gimple_code (stmt) != GIMPLE_SWITCH)
> +    return killed_stmts;
> +
> +  dead_worklist.quick_push (stmt);
> +  while (!dead_worklist.is_empty ())
> +    {
> +      stmt = dead_worklist.pop ();
> +
> +      ssa_op_iter iter;
> +      use_operand_p use_p;
> +      FOR_EACH_SSA_USE_OPERAND (use_p, stmt, iter, SSA_OP_USE)
> +	{
> +	  tree t = USE_FROM_PTR (use_p);
> +	  gimple *def = SSA_NAME_DEF_STMT (t);
> +
> +	  if (gimple_bb (def) == bb
> +	      && (gimple_code (def) != GIMPLE_PHI
> +		  || !drop_all_phis)
> +	      && !gimple_has_side_effects (def))
> +	    {
> +	      int *usesp = ssa_remaining_uses.get (t);
> +	      int uses;
> +
> +	      if (usesp)
> +		uses = *usesp;
> +	      else
> +		uses = uses_in_bb (t, bb);
> +
> +	      gcc_assert (uses);
> +
> +	      /* Don't bother recording the expected use count if we
> +		 won't find any further uses within BB.  */
> +	      if (!usesp && (uses < -1 || uses > 1))
> +		{
> +		  usesp = &ssa_remaining_uses.get_or_insert (t);
> +		  *usesp = uses;
> +		}
> +
> +	      if (uses < 0)
> +		continue;
> +
> +	      --uses;
> +	      if (usesp)
> +		*usesp = uses;
> +
> +	      if (!uses)
> +		{
> +		  killed_stmts++;
> +		  if (usesp)
> +		    ssa_remaining_uses.remove (t);
> +		  if (gimple_code (def) != GIMPLE_PHI)
> +		    dead_worklist.safe_push (def);
> +		}
> +	    }
> +	}
> +    }
> +
> +  if (dump_file)
> +    fprintf (dump_file, "threading bb %i kills %i stmts\n",
> +	     bb->index, killed_stmts);
> +
> +  return killed_stmts;
So it appears the loop counts a statement as killed when its uses are no
longer needed.  AFAICT we fail to count the conditional as killed.
That's trivially fixed.

I also think we want to use this elsewhere.  I think putting it in
tree-ssa-threadupdate.c and exporting from tree-ssa-threadupdate.h
should work.







> +}
> +
> +/* Estimate, over all src blocks to be copied in PATH and E's dest,
> +   the number of stmts that become dead as a consequence of removing
> +   their final control flow stmts.  If PATH is not empty, E must be
> +   its last entry.  */
> +
> +static int
> +estimate_threading_killed_stmts (edge e, vec<jump_thread_edge *> *path)
> +{
> +  gcc_assert (path->length () == 0 || path->last ()->e == e);
> +  int total = estimate_threading_killed_stmts (e->dest);
> +
> +  int i;
> +  jump_thread_edge *je;
> +  FOR_EACH_VEC_ELT_REVERSE (*path, i, je)
> +    if (je->type == EDGE_COPY_SRC_BLOCK)
> +      total += estimate_threading_killed_stmts (je->e->src);
> +
> +  return total;
> +}
I don't think we need this function at all.

At the point where you're calling it, we're at the one and only block
within the jump threading path where we're copying a block where
statements are going to be killed.

So I think you should just remove this function and call the other one
(that works on just a basic block).  THat also means we don't have to
pass the path around.

I've got those minor tweaks in a tree here along with the changes to use
it to fix 36550 as well.  Let me run them through some deeper testing.


jeff



More information about the Gcc-patches mailing list