[PATCH][v2] tree-optimization/104912 - ensure cost model is checked first

Wed Apr 20 08:52:15 GMT 2022

> The following makes sure that when we build the versioning condition
> for vectorization including the cost model check, we check for the
> cost model and branch over other versioning checks.  That is what
> the cost modeling assumes, since the cost model check is the only
> one accounted for in the scalar outside cost.  Currently we emit
> all checks as straight-line code combined with bitwise ops which
> can result in surprising ordering of checks in the final assembly.
> 
> Since loop_version accepts only a single versioning condition
> the splitting is done after the fact.
> 
> The result is a 1.5% speedup of 416.gamess on x86_64 when compiling
> with -Ofast and tuning for generic or skylake.  That's not enough
> to recover from the slowdown when vectorizing but it now cuts off
> the expensive alias versioning test.
> 
> This is an update to the previously posted patch splitting the
> probability between the two branches as outlined in
> https://gcc.gnu.org/pipermail/gcc-patches/2022-March/592597.html
> 
> I've re-bootstrapped and tested this on x86_64-unknown-linux-gnu.
> 
> Honza - is the approach to splitting the probabilities sensible?
> This fixes a piece of a P1 regression.
> 
> Thanks,
> Richard.
> 
> 2022-03-21  Richard Biener  <rguenther@suse.de>
> 
> 	PR tree-optimization/104912
> 	* tree-vect-loop-manip.cc (vect_loop_versioning): Split
> 	the cost model check to a separate BB to make sure it is
> 	checked first and not combined with other version checks.
> ---
>  gcc/tree-vect-loop-manip.cc | 60 +++++++++++++++++++++++++++++++++++--
>  1 file changed, 57 insertions(+), 3 deletions(-)
> 
> diff --git a/gcc/tree-vect-loop-manip.cc b/gcc/tree-vect-loop-manip.cc
> index 63fb6f669a0..e4381eb7079 100644
> --- a/gcc/tree-vect-loop-manip.cc
> +++ b/gcc/tree-vect-loop-manip.cc
> @@ -3445,13 +3445,34 @@ vect_loop_versioning (loop_vec_info loop_vinfo,
>  	cond_expr = expr;
>      }
>  
> +  tree cost_name = NULL_TREE;
> +  profile_probability prob2 = profile_probability::uninitialized ();
> +  if (cond_expr
> +      && !integer_truep (cond_expr)
> +      && (version_niter
> +	  || version_align
> +	  || version_alias
> +	  || version_simd_if_cond))

I assume that this condition...
>  
> +  /* Split the cost model check off to a separate BB.  Costing assumes
> +     this is the only thing we perform when we enter the scalar loop
> +     from a failed cost decision.  */
> +  if (cost_name && TREE_CODE (cost_name) == SSA_NAME)
is if and only if this condition
(otherwise prob2 would get uninitialized or lost)
> +    {
> +      gimple *def = SSA_NAME_DEF_STMT (cost_name);
> +      /* All uses of the cost check are 'true' after the check we
> +	 are going to insert.  */
> +      replace_uses_by (cost_name, boolean_true_node);
> +      /* And we're going to build the new single use of it.  */
> +      gcond *cond = gimple_build_cond (NE_EXPR, cost_name, boolean_false_node,
> +				       NULL_TREE, NULL_TREE);
> +      edge e = split_block (gimple_bb (def), def);
> +      gimple_stmt_iterator gsi = gsi_for_stmt (def);
> +      gsi_insert_after (&gsi, cond, GSI_NEW_STMT);
> +      edge true_e, false_e;
> +      extract_true_false_edges_from_block (e->dest, &true_e, &false_e);
> +      e->flags &= ~EDGE_FALLTHRU;
> +      e->flags |= EDGE_TRUE_VALUE;
> +      edge e2 = make_edge (e->src, false_e->dest, EDGE_FALSE_VALUE);
> +      e->probability = prob2;
> +      e2->probability = prob2.invert ();

So this looks fine to me.
Honza
> +      set_immediate_dominator (CDI_DOMINATORS, false_e->dest, e->src);
> +      auto_vec<basic_block, 3> adj;
> +      for (basic_block son = first_dom_son (CDI_DOMINATORS, e->dest);
> +	   son;
> +	   son = next_dom_son (CDI_DOMINATORS, son))
> +	if (EDGE_COUNT (son->preds) > 1)
> +	  adj.safe_push (son);
> +      for (auto son : adj)
> +	set_immediate_dominator (CDI_DOMINATORS, son, e->src);
> +    }
> +
>    if (version_niter)
>      {
>        /* The versioned loop could be infinite, we need to clear existing
> -- 
> 2.34.1