[PATCH] ipa-inline: Improve growth accumulation for recursive calls

Tamar Christina Tamar.Christina@arm.com
Fri Aug 21 09:17:11 GMT 2020


Hi Martin,

> Hi,
> 
> On Thu, Aug 20 2020, Richard Sandiford wrote:
> >>
> >>
> >> Really appreciate for your detailed explanation.  BTW, My previous
> >> patch for PGO build on exchange2 takes this similar method by setting
> >> each cloned node to 1/10th of the frequency several month agao :)
> >>
> >> https://gcc.gnu.org/pipermail/gcc-patches/2020-June/546926.html
> >
> > Does it seem likely that we'll reach a resolution on this soon?
> > I take the point that the patch that introduced the exchange
> > regression
> > [https://gcc.gnu.org/pipermail/gcc-patches/2020-August/551757.html]
> > was just uncovering a latent issue, but still, this is a large
> > regression in an important benchmark to be carrying around.  For those
> > of us doing regular benchmark CI, the longer the performance trough
> > gets, the harder it is to spot other unrelated regressions in the “properly
> optimised” code.
> >
> > So if we don't have a specific plan for fixing the regression soon, I
> > think we should consider reverting the patch until we have something
> > that avoids the exchange regression, even though the patch wasn't
> > technically wrong.
> 
> Honza's changes have been motivated to big extent as an enabler for IPA-CP
> heuristics changes to actually speed up 548.exchange2_r.
> 
> On my AMD Zen2 machine, the run-time of exchange2 was 358 seconds two
> weeks ago, this week it is 403, but with my WIP (and so far untested) patch
> below it is just 276 seconds - faster than one built with GCC 8 which needs
> 283 seconds.
> 
> I'll be interested in knowing if it also works this well on other architectures.
> 

Many thanks for working on this!

I tried this on an AArch64 Neoverse-N1 machine and didn't see any difference.
Do I need any flags for it to work? The patch was applied on top of 656218ab982cc22b826227045826c92743143af1

And I tried 3 runs
1) -mcpu=native -Ofast -fomit-frame-pointer -flto --param ipa-cp-eval-threshold=1 --param ipa-cp-unit-growth=80 -fno-inline-functions-called-once
2) -mcpu=native -Ofast -fomit-frame-pointer -flto -fno-inline-functions-called-once
3) -mcpu=native -Ofast -fomit-frame-pointer -flto

First one used to give us the best result, with this patch there's no difference between 1 and 2 (11% regression) and the 3rd one is about 15% on top of that.

If there's anything I can do to help just let me know.

Cheers,
Tamar

> The patch still needs a bit of a cleanup.  The change of the default value of
> ipa-cp-unit-growth needs to be done only for small compilation units (like
> inlining does).  I should experiment if the value of
> param_ipa_cp_loop_hint_bonus should be changed or not.  And last but not
> least, I also want to clean-up the interfaces between ipa-fnsummary.c and
> ipa-cp.c a bit.  I am working on all of this and hope to finish the patch set in a
> few (working) days.
> 
> The bottom line is that there is a plan to address this regression.
> 
> Martin
> 
> 
> 
> diff --git a/gcc/ipa-cp.c b/gcc/ipa-cp.c index e4910a04ffa..0d44310503a
> 100644
> --- a/gcc/ipa-cp.c
> +++ b/gcc/ipa-cp.c
> @@ -3190,11 +3190,23 @@ devirtualization_time_bonus (struct
> cgraph_node *node,
>  /* Return time bonus incurred because of HINTS.  */
> 
>  static int
> -hint_time_bonus (cgraph_node *node, ipa_hints hints)
> +hint_time_bonus (cgraph_node *node, ipa_hints hints, sreal
> known_iter_freq,
> +		 sreal known_strides_freq)
>  {
>    int result = 0;
> -  if (hints & (INLINE_HINT_loop_iterations | INLINE_HINT_loop_stride))
> -    result += opt_for_fn (node->decl, param_ipa_cp_loop_hint_bonus);
> +  sreal bonus_for_one = opt_for_fn (node->decl,
> + param_ipa_cp_loop_hint_bonus);
> +
> +  if (hints & INLINE_HINT_loop_iterations)
> +    {
> +      /* FIXME: This should probably be way more nuanced.  */
> +      result += (known_iter_freq * bonus_for_one).to_int ();
> +    }
> +  if (hints & INLINE_HINT_loop_stride)
> +    {
> +      /* FIXME: And this as well.  */
> +      result += (known_strides_freq * bonus_for_one).to_int ();
> +    }
> +
>    return result;
>  }
> 
> @@ -3395,12 +3407,13 @@ perform_estimation_of_a_value (cgraph_node
> *node, vec<tree> known_csts,
>  			       int est_move_cost, ipcp_value_base *val)  {
>    int size, time_benefit;
> -  sreal time, base_time;
> +  sreal time, base_time, known_iter_freq, known_strides_freq;
>    ipa_hints hints;
> 
>    estimate_ipcp_clone_size_and_time (node, known_csts, known_contexts,
>  				     known_aggs, &size, &time,
> -				     &base_time, &hints);
> +				     &base_time, &hints, &known_iter_freq,
> +				     &known_strides_freq);
>    base_time -= time;
>    if (base_time > 65535)
>      base_time = 65535;
> @@ -3414,7 +3427,7 @@ perform_estimation_of_a_value (cgraph_node
> *node, vec<tree> known_csts,
>      time_benefit = base_time.to_int ()
>        + devirtualization_time_bonus (node, known_csts, known_contexts,
>  				     known_aggs)
> -      + hint_time_bonus (node, hints)
> +      + hint_time_bonus (node, hints, known_iter_freq,
> + known_strides_freq)
>        + removable_params_cost + est_move_cost;
> 
>    gcc_checking_assert (size >=0);
> @@ -3476,7 +3489,7 @@ estimate_local_effects (struct cgraph_node *node)
>      {
>        struct caller_statistics stats;
>        ipa_hints hints;
> -      sreal time, base_time;
> +      sreal time, base_time, known_iter_freq, known_strides_freq;
>        int size;
> 
>        init_caller_stats (&stats);
> @@ -3484,9 +3497,11 @@ estimate_local_effects (struct cgraph_node
> *node)
>  					      false);
>        estimate_ipcp_clone_size_and_time (node, known_csts,
> known_contexts,
>  					 known_aggs, &size, &time,
> -					 &base_time, &hints);
> +					 &base_time, &hints,
> &known_iter_freq,
> +					 &known_strides_freq);
>        time -= devirt_bonus;
> -      time -= hint_time_bonus (node, hints);
> +      time -= hint_time_bonus (node, hints, known_iter_freq,
> +			       known_strides_freq);
>        time -= removable_params_cost;
>        size -= stats.n_calls * removable_params_cost;
> 
> diff --git a/gcc/ipa-fnsummary.c b/gcc/ipa-fnsummary.c index
> 2cfab40156e..29fac5db19f 100644
> --- a/gcc/ipa-fnsummary.c
> +++ b/gcc/ipa-fnsummary.c
> @@ -310,6 +310,35 @@ set_hint_predicate (predicate **p, predicate
> new_predicate)
>      }
>  }
> 
> +/* Find if NEW_PREDICATE is already in V and if so, increment its freq.
> +   Otherwise add a new item to the vector with this predicate and frerq
> equal
> +   to add_freq.  */
> +
> +static void
> +add_freqcounting_predicate (vec<ipa_freqcounting_predicate, va_gc> **v,
> +			    const predicate &new_predicate, sreal add_freq) {
> +  if (new_predicate == false || new_predicate == true)
> +    return;
> +  ipa_freqcounting_predicate *f;
> +  for (int i = 0; vec_safe_iterate (*v, i, &f); i++)
> +    if (new_predicate == f->predicate)
> +      {
> +	f->freq += add_freq;
> +	return;
> +      }
> +  /* FIXME: Make this a parameter.  */
> +  if (vec_safe_length (*v) >= 32)
> +    /* Too many different predicates to account for.  */
> +    return;
> +
> +  ipa_freqcounting_predicate fcp;
> +  fcp.predicate = NULL;
> +  set_hint_predicate (&fcp.predicate, new_predicate);
> +  fcp.freq = add_freq;
> +  vec_safe_push (*v, fcp);
> +  return;
> +}
> 
>  /* Compute what conditions may or may not hold given information about
>     parameters.  RET_CLAUSE returns truths that may hold in a specialized copy,
> @@ -722,10 +751,12 @@ ipa_call_summary::~ipa_call_summary ()
> 
>  ipa_fn_summary::~ipa_fn_summary ()
>  {
> -  if (loop_iterations)
> -    edge_predicate_pool.remove (loop_iterations);
> -  if (loop_stride)
> -    edge_predicate_pool.remove (loop_stride);
> +  unsigned len = vec_safe_length (loop_iterations);  for (unsigned i =
> + 0; i < len; i++)
> +    edge_predicate_pool.remove ((*loop_iterations)[i].predicate);
> +  len = vec_safe_length (loop_strides);  for (unsigned i = 0; i < len;
> + i++)
> +    edge_predicate_pool.remove ((*loop_strides)[i].predicate);
>    vec_free (conds);
>    vec_free (size_time_table);
>    vec_free (call_size_time_table);
> @@ -741,24 +772,33 @@ ipa_fn_summary_t::remove_callees (cgraph_node
> *node)
>      ipa_call_summaries->remove (e);
>  }
> 
> -/* Same as remap_predicate_after_duplication but handle hint predicate *P.
> -   Additionally care about allocating new memory slot for updated predicate
> -   and set it to NULL when it becomes true or false (and thus uninteresting).
> - */
> +/* Duplicate predicates in loop hint vector, allocating memory for them and
> +   remove and deallocate any uninteresting (true or false) ones.  Return the
> +   result.  */
> 
> -static void
> -remap_hint_predicate_after_duplication (predicate **p,
> -					clause_t possible_truths)
> +static vec<ipa_freqcounting_predicate, va_gc> *
> +remap_freqcounting_preds_after_dup (vec<ipa_freqcounting_predicate,
> va_gc> *v,
> +				    clause_t possible_truths)
>  {
> -  predicate new_predicate;
> +  if (vec_safe_length (v) == 0)
> +    return NULL;
> 
> -  if (!*p)
> -    return;
> +  vec<ipa_freqcounting_predicate, va_gc> *res = v->copy ();
> +  int len = res->length();
> +  for (int i = len - 1; i >= 0; i--)
> +    {
> +      predicate new_predicate
> +	= (*res)[i].predicate->remap_after_duplication (possible_truths);
> +      /* We do not want to free previous predicate; it is used by node
> +	 origin.  */
> +      (*res)[i].predicate = NULL;
> +      set_hint_predicate (&(*res)[i].predicate, new_predicate);
> +
> +      if (!(*res)[i].predicate)
> +	res->unordered_remove (i);
> +    }
> 
> -  new_predicate = (*p)->remap_after_duplication (possible_truths);
> -  /* We do not want to free previous predicate; it is used by node origin.  */
> -  *p = NULL;
> -  set_hint_predicate (p, new_predicate);
> +  return res;
>  }
> 
> 
> @@ -874,9 +914,11 @@ ipa_fn_summary_t::duplicate (cgraph_node *src,
>  	    optimized_out_size += es->call_stmt_size *
> ipa_fn_summary::size_scale;
>  	  edge_set_predicate (edge, &new_predicate);
>  	}
> -      remap_hint_predicate_after_duplication (&info->loop_iterations,
> +      info->loop_iterations
> +	= remap_freqcounting_preds_after_dup (info->loop_iterations,
>  					      possible_truths);
> -      remap_hint_predicate_after_duplication (&info->loop_stride,
> +      info->loop_strides
> +	= remap_freqcounting_preds_after_dup (info->loop_strides,
>  					      possible_truths);
> 
>        /* If inliner or someone after inliner will ever start producing @@ -888,17
> +930,21 @@ ipa_fn_summary_t::duplicate (cgraph_node *src,
>    else
>      {
>        info->size_time_table = vec_safe_copy (info->size_time_table);
> -      if (info->loop_iterations)
> +      info->loop_iterations = vec_safe_copy (info->loop_iterations);
> +      info->loop_strides = vec_safe_copy (info->loop_strides);
> +
> +      ipa_freqcounting_predicate *f;
> +      for (int i = 0; vec_safe_iterate (info->loop_iterations, i, &f);
> + i++)
>  	{
> -	  predicate p = *info->loop_iterations;
> -	  info->loop_iterations = NULL;
> -	  set_hint_predicate (&info->loop_iterations, p);
> +	  predicate p = *f->predicate;
> +	  f->predicate = NULL;
> +	  set_hint_predicate (&f->predicate, p);
>  	}
> -      if (info->loop_stride)
> +      for (int i = 0; vec_safe_iterate (info->loop_strides, i, &f);
> + i++)
>  	{
> -	  predicate p = *info->loop_stride;
> -	  info->loop_stride = NULL;
> -	  set_hint_predicate (&info->loop_stride, p);
> +	  predicate p = *f->predicate;
> +	  f->predicate = NULL;
> +	  set_hint_predicate (&f->predicate, p);
>  	}
>      }
>    if (!dst->inlined_to)
> @@ -1057,15 +1103,28 @@ ipa_dump_fn_summary (FILE *f, struct
> cgraph_node *node)
>  		}
>  	      fprintf (f, "\n");
>  	    }
> -	  if (s->loop_iterations)
> +	  ipa_freqcounting_predicate *fcp;
> +	  bool first_fcp = true;
> +	  for (int i = 0; vec_safe_iterate (s->loop_iterations, i, &fcp); i++)
>  	    {
> -	      fprintf (f, "  loop iterations:");
> -	      s->loop_iterations->dump (f, s->conds);
> +	      if (first_fcp)
> +		{
> +		  fprintf (f, "  loop iterations:");
> +		  first_fcp = false;
> +		}
> +	      fprintf (f, "  %3.2f for ", fcp->freq.to_double ());
> +	      fcp->predicate->dump (f, s->conds);
>  	    }
> -	  if (s->loop_stride)
> +	  first_fcp = true;
> +	  for (int i = 0; vec_safe_iterate (s->loop_strides, i, &fcp); i++)
>  	    {
> -	      fprintf (f, "  loop stride:");
> -	      s->loop_stride->dump (f, s->conds);
> +	      if (first_fcp)
> +		{
> +		  fprintf (f, "  loop strides:");
> +		  first_fcp = false;
> +		}
> +	      fprintf (f, "  %3.2f for :", fcp->freq.to_double ());
> +	      fcp->predicate->dump (f, s->conds);
>  	    }
>  	  fprintf (f, "  calls:\n");
>  	  dump_ipa_call_summary (f, 4, node, s); @@ -2514,12 +2573,13 @@
> analyze_function_body (struct cgraph_node *node, bool early)
> 
>    if (fbi.info)
>      compute_bb_predicates (&fbi, node, info, params_summary);
> +  const profile_count entry_count = ENTRY_BLOCK_PTR_FOR_FN
> + (cfun)->count;
>    order = XNEWVEC (int, n_basic_blocks_for_fn (cfun));
>    nblocks = pre_and_rev_post_order_compute (NULL, order, false);
>    for (n = 0; n < nblocks; n++)
>      {
>        bb = BASIC_BLOCK_FOR_FN (cfun, order[n]);
> -      freq = bb->count.to_sreal_scale (ENTRY_BLOCK_PTR_FOR_FN (cfun)-
> >count);
> +      freq = bb->count.to_sreal_scale (entry_count);
>        if (clobber_only_eh_bb_p (bb))
>  	{
>  	  if (dump_file && (dump_flags & TDF_DETAILS)) @@ -2758,23
> +2818,27 @@ analyze_function_body (struct cgraph_node *node, bool early)
> 
>    if (nonconstant_names.exists () && !early)
>      {
> +      ipa_fn_summary *s = ipa_fn_summaries->get (node);
>        class loop *loop;
> -      predicate loop_iterations = true;
> -      predicate loop_stride = true;
> 
>        if (dump_file && (dump_flags & TDF_DETAILS))
>  	flow_loops_dump (dump_file, NULL, 0);
>        scev_initialize ();
>        FOR_EACH_LOOP (loop, 0)
>  	{
> +	  predicate loop_iterations = true;
> +	  sreal header_freq;
>  	  vec<edge> exits;
>  	  edge ex;
>  	  unsigned int j;
>  	  class tree_niter_desc niter_desc;
>  	  if (loop->header->aux)
> -	    bb_predicate = *(predicate *) loop->header->aux;
> +	    {
> +	      bb_predicate = *(predicate *) loop->header->aux;
> +	      header_freq = loop->header->count.to_sreal_scale
> (entry_count);
> +	    }
>  	  else
> -	    bb_predicate = false;
> +	    continue;
> 
>  	  exits = get_loop_exit_edges (loop);
>  	  FOR_EACH_VEC_ELT (exits, j, ex)
> @@ -2790,10 +2854,10 @@ analyze_function_body (struct cgraph_node
> *node, bool early)
>  		will_be_nonconstant = bb_predicate & will_be_nonconstant;
>  	      if (will_be_nonconstant != true
>  		  && will_be_nonconstant != false)
> -		/* This is slightly inprecise.  We may want to represent each
> -		   loop with independent predicate.  */
>  		loop_iterations &= will_be_nonconstant;
>  	    }
> +	  add_freqcounting_predicate (&s->loop_iterations, loop_iterations,
> +				      header_freq);
>  	  exits.release ();
>  	}
> 
> @@ -2803,14 +2867,20 @@ analyze_function_body (struct cgraph_node
> *node, bool early)
>        for (loop = loops_for_fn (cfun)->tree_root->inner;
>  	   loop != NULL; loop = loop->next)
>  	{
> +	  predicate loop_stride = true;
> +	  sreal bb_freq;
>  	  basic_block *body = get_loop_body (loop);
>  	  for (unsigned i = 0; i < loop->num_nodes; i++)
>  	    {
>  	      gimple_stmt_iterator gsi;
>  	      if (body[i]->aux)
> -		bb_predicate = *(predicate *) body[i]->aux;
> +		{
> +		  bb_predicate = *(predicate *) body[i]->aux;
> +		  bb_freq = body[i]->count.to_sreal_scale (entry_count);
> +		}
>  	      else
> -		bb_predicate = false;
> +		continue;
> +
>  	      for (gsi = gsi_start_bb (body[i]); !gsi_end_p (gsi);
>  		   gsi_next (&gsi))
>  		{
> @@ -2839,16 +2909,13 @@ analyze_function_body (struct cgraph_node
> *node, bool early)
>  		    will_be_nonconstant = bb_predicate &
> will_be_nonconstant;
>  		  if (will_be_nonconstant != true
>  		      && will_be_nonconstant != false)
> -		    /* This is slightly inprecise.  We may want to represent
> -		       each loop with independent predicate.  */
>  		    loop_stride = loop_stride & will_be_nonconstant;
>  		}
>  	    }
> +	  add_freqcounting_predicate (&s->loop_strides, loop_stride,
> +				      bb_freq);
>  	  free (body);
>  	}
> -      ipa_fn_summary *s = ipa_fn_summaries->get (node);
> -      set_hint_predicate (&s->loop_iterations, loop_iterations);
> -      set_hint_predicate (&s->loop_stride, loop_stride);
>        scev_finalize ();
>      }
>    FOR_ALL_BB_FN (bb, my_function)
> @@ -3640,14 +3707,32 @@ ipa_call_context::estimate_size_and_time (int
> *ret_size,
>    if (time > nonspecialized_time)
>      time = nonspecialized_time;
> 
> +  m_loops_with_known_iterations = 0;
> +  ipa_freqcounting_predicate *fcp;
> +  for (i = 0; vec_safe_iterate (info->loop_iterations, i, &fcp); i++)
> +    {
> +      gcc_assert (fcp->predicate);
> +      if (!fcp->predicate->evaluate (m_possible_truths))
> +	{
> +	  if (ret_hints)
> +	    hints |= INLINE_HINT_loop_iterations;
> +	  m_loops_with_known_iterations += fcp->freq;
> +	}
> +    }
> +  m_loops_with_known_strides = 0;
> +  for (i = 0; vec_safe_iterate (info->loop_strides, i, &fcp); i++)
> +    {
> +      gcc_assert (fcp->predicate);
> +      if (!fcp->predicate->evaluate (m_possible_truths))
> +	{
> +	  if (ret_hints)
> +	    hints |= INLINE_HINT_loop_stride;
> +	  m_loops_with_known_strides += fcp->freq;
> +	}
> +    }
> +
>    if (ret_hints)
>      {
> -      if (info->loop_iterations
> -	  && !info->loop_iterations->evaluate (m_possible_truths))
> -	hints |= INLINE_HINT_loop_iterations;
> -      if (info->loop_stride
> -	  && !info->loop_stride->evaluate (m_possible_truths))
> -	hints |= INLINE_HINT_loop_stride;
>        if (info->scc_no)
>  	hints |= INLINE_HINT_in_scc;
>        if (DECL_DECLARED_INLINE_P (m_node->decl)) @@ -3687,7 +3772,9 @@
> estimate_ipcp_clone_size_and_time (struct cgraph_node *node,
>  				   vec<ipa_agg_value_set> known_aggs,
>  				   int *ret_size, sreal *ret_time,
>  				   sreal *ret_nonspec_time,
> -				   ipa_hints *hints)
> +				   ipa_hints *hints,
> +				   sreal *loops_with_known_iterations,
> +				   sreal *loops_with_known_strides)
>  {
>    clause_t clause, nonspec_clause;
> 
> @@ -3699,6 +3786,8 @@ estimate_ipcp_clone_size_and_time (struct
> cgraph_node *node,
>  		        known_aggs, vNULL);
>    ctx.estimate_size_and_time (ret_size, NULL, ret_time,
>  			      ret_nonspec_time, hints);
> +  *loops_with_known_iterations = ctx.m_loops_with_known_iterations;
> +  *loops_with_known_strides = ctx.m_loops_with_known_strides;
>  }
> 
>  /* Return stack frame offset where frame of NODE is supposed to start
> inside @@ -3857,32 +3946,31 @@ remap_edge_summaries (struct
> cgraph_edge *inlined_edge,
>      }
>  }
> 
> -/* Same as remap_predicate, but set result into hint *HINT.  */
> +/* Run remap_after_inlining on each predicate in V.  */
> 
>  static void
> -remap_hint_predicate (class ipa_fn_summary *info,
> -		      class ipa_node_params *params_summary,
> -		      class ipa_fn_summary *callee_info,
> -		      predicate **hint,
> -		      vec<int> operand_map,
> -		      vec<int> offset_map,
> -		      clause_t possible_truths,
> -		      predicate *toplev_predicate)
> -{
> -  predicate p;
> +remap_freqcounting_predicate (class ipa_fn_summary *info,
> +			      class ipa_node_params *params_summary,
> +			      class ipa_fn_summary *callee_info,
> +			      vec<ipa_freqcounting_predicate, va_gc> *v,
> +			      vec<int> operand_map,
> +			      vec<int> offset_map,
> +			      clause_t possible_truths,
> +			      predicate *toplev_predicate)
> 
> -  if (!*hint)
> -    return;
> -  p = (*hint)->remap_after_inlining
> -			 (info, params_summary, callee_info,
> -			  operand_map, offset_map,
> -			  possible_truths, *toplev_predicate);
> -  if (p != false && p != true)
> +{
> +  ipa_freqcounting_predicate *fcp;
> +  for (int i = 0; vec_safe_iterate (v, i, &fcp); i++)
>      {
> -      if (!*hint)
> -	set_hint_predicate (hint, p);
> -      else
> -	**hint &= p;
> +      predicate p
> +	= fcp->predicate->remap_after_inlining (info, params_summary,
> +						callee_info, operand_map,
> +						offset_map, possible_truths,
> +						*toplev_predicate);
> +      if (p != false && p != true)
> +	/* FIXME: Is this really supposed to be &= and not a plain
> +	   assignment?  */
> +	*fcp->predicate &= p;
>      }
>  }
> 
> @@ -3992,12 +4080,12 @@ ipa_merge_fn_summary_after_inlining (struct
> cgraph_edge *edge)
>    remap_edge_summaries (edge, edge->callee, info, params_summary,
>  		 	callee_info, operand_map,
>  			offset_map, clause, &toplev_predicate);
> -  remap_hint_predicate (info, params_summary, callee_info,
> -			&callee_info->loop_iterations,
> -			operand_map, offset_map, clause,
> &toplev_predicate);
> -  remap_hint_predicate (info, params_summary, callee_info,
> -			&callee_info->loop_stride,
> -			operand_map, offset_map, clause,
> &toplev_predicate);
> +  remap_freqcounting_predicate (info, params_summary, callee_info,
> +				info->loop_iterations, operand_map,
> +				offset_map, clause, &toplev_predicate);
> +  remap_freqcounting_predicate (info, params_summary, callee_info,
> +				info->loop_strides, operand_map,
> +				offset_map, clause, &toplev_predicate);
> 
>    HOST_WIDE_INT stack_frame_offset = ipa_get_stack_frame_offset (edge-
> >callee);
>    HOST_WIDE_INT peak = stack_frame_offset + callee_info-
> >estimated_stack_size;
> @@ -4322,12 +4410,34 @@ inline_read_section (struct lto_file_decl_data
> *file_data, const char *data,
>  	    info->size_time_table->quick_push (e);
>  	}
> 
> -      p.stream_in (&ib);
> -      if (info)
> -        set_hint_predicate (&info->loop_iterations, p);
> -      p.stream_in (&ib);
> -      if (info)
> -        set_hint_predicate (&info->loop_stride, p);
> +      count2 = streamer_read_uhwi (&ib);
> +      for (j = 0; j < count2; j++)
> +	{
> +	  p.stream_in (&ib);
> +	  sreal fcp_freq = sreal::stream_in (&ib);
> +	  if (info)
> +	    {
> +	      ipa_freqcounting_predicate fcp;
> +	      fcp.predicate = NULL;
> +	      set_hint_predicate (&fcp.predicate, p);
> +	      fcp.freq = fcp_freq;
> +	      vec_safe_push (info->loop_iterations, fcp);
> +	    }
> +	}
> +      count2 = streamer_read_uhwi (&ib);
> +      for (j = 0; j < count2; j++)
> +	{
> +	  p.stream_in (&ib);
> +	  sreal fcp_freq = sreal::stream_in (&ib);
> +	  if (info)
> +	    {
> +	      ipa_freqcounting_predicate fcp;
> +	      fcp.predicate = NULL;
> +	      set_hint_predicate (&fcp.predicate, p);
> +	      fcp.freq = fcp_freq;
> +	      vec_safe_push (info->loop_strides, fcp);
> +	    }
> +	}
>        for (e = node->callees; e; e = e->next_callee)
>  	read_ipa_call_summary (&ib, e, info != NULL);
>        for (e = node->indirect_calls; e; e = e->next_callee) @@ -4487,14
> +4597,19 @@ ipa_fn_summary_write (void)
>  	      e->exec_predicate.stream_out (ob);
>  	      e->nonconst_predicate.stream_out (ob);
>  	    }
> -	  if (info->loop_iterations)
> -	    info->loop_iterations->stream_out (ob);
> - 	  else
> -	    streamer_write_uhwi (ob, 0);
> -	  if (info->loop_stride)
> -	    info->loop_stride->stream_out (ob);
> - 	  else
> -	    streamer_write_uhwi (ob, 0);
> +	  ipa_freqcounting_predicate *fcp;
> +	  streamer_write_uhwi (ob, vec_safe_length (info->loop_iterations));
> +	  for (i = 0; vec_safe_iterate (info->loop_iterations, i, &fcp); i++)
> +	    {
> +	      fcp->predicate->stream_out (ob);
> +	      fcp->freq.stream_out (ob);
> +	    }
> +	  streamer_write_uhwi (ob, vec_safe_length (info->loop_strides));
> +	  for (i = 0; vec_safe_iterate (info->loop_strides, i, &fcp); i++)
> +	    {
> +	      fcp->predicate->stream_out (ob);
> +	      fcp->freq.stream_out (ob);
> +	    }
>  	  for (edge = cnode->callees; edge; edge = edge->next_callee)
>  	    write_ipa_call_summary (ob, edge);
>  	  for (edge = cnode->indirect_calls; edge; edge = edge->next_callee)
> diff --git a/gcc/ipa-fnsummary.h b/gcc/ipa-fnsummary.h index
> c6ddc9f3199..d8429afdbef 100644
> --- a/gcc/ipa-fnsummary.h
> +++ b/gcc/ipa-fnsummary.h
> @@ -101,6 +101,19 @@ public:
>    }
>  };
> 
> +/* Structure to capture how frequently some interesting events occur given
> a
> +   particular predicate.  The structure is used to estimate how often we
> +   encounter loops with known iteration count or stride in various
> +   contexts.  */
> +
> +struct GTY(()) ipa_freqcounting_predicate {
> +  /* The described event happens with this frequency... */
> +  sreal freq;
> +  /* ...when this predicate evaluates to false. */
> +  class predicate * GTY((skip)) predicate; };
> +
>  /* Function inlining information.  */
>  class GTY(()) ipa_fn_summary
>  {
> @@ -112,8 +125,9 @@ public:
>        inlinable (false), single_caller (false),
>        fp_expressions (false), estimated_stack_size (false),
>        time (0), conds (NULL),
> -      size_time_table (NULL), call_size_time_table (NULL), loop_iterations
> (NULL),
> -      loop_stride (NULL), growth (0), scc_no (0)
> +      size_time_table (NULL), call_size_time_table (NULL),
> +      loop_iterations (NULL), loop_strides (NULL),
> +      growth (0), scc_no (0)
>    {
>    }
> 
> @@ -125,13 +139,12 @@ public:
>      estimated_stack_size (s.estimated_stack_size),
>      time (s.time), conds (s.conds), size_time_table (s.size_time_table),
>      call_size_time_table (NULL),
> -    loop_iterations (s.loop_iterations), loop_stride (s.loop_stride),
> +    loop_iterations (s.loop_iterations), loop_strides (s.loop_strides),
>      growth (s.growth), scc_no (s.scc_no)
>    {}
> 
>    /* Default constructor.  */
>    ~ipa_fn_summary ();
> -
>    /* Information about the function body itself.  */
> 
>    /* Minimal size increase after inlining.  */ @@ -164,12 +177,10 @@ public:
>    vec<size_time_entry, va_gc> *size_time_table;
>    vec<size_time_entry, va_gc> *call_size_time_table;
> 
> -  /* Predicate on when some loop in the function becomes to have known
> -     bounds.   */
> -  predicate * GTY((skip)) loop_iterations;
> -  /* Predicate on when some loop in the function becomes to have known
> -     stride.   */
> -  predicate * GTY((skip)) loop_stride;
> +  /* Predicates on when some loops in the function can have known
> + bounds.  */  vec<ipa_freqcounting_predicate, va_gc> *loop_iterations;
> +  /* Predicates on when some loops in the function can have known
> + strides.  */  vec<ipa_freqcounting_predicate, va_gc> *loop_strides;
>    /* Estimated growth for inlining all copies of the function before start
>       of small functions inlining.
>       This value will get out of date as the callers are duplicated, but @@ -316,6
> +327,15 @@ public:
>    {
>      return m_node != NULL;
>    }
> +
> +
> +  /* How often loops will have known iterations.  Calculated in
> +     estimate_size_and_time.  */
> +  sreal m_loops_with_known_iterations;
> +  /* How often loops will have known strides.  Calculated in
> +     estimate_size_and_time. */
> +  sreal m_loops_with_known_strides;
> +
>  private:
>    /* Called function.  */
>    cgraph_node *m_node;
> @@ -353,7 +373,7 @@ void estimate_ipcp_clone_size_and_time (struct
> cgraph_node *,
>  					vec<ipa_polymorphic_call_context>,
>  					vec<ipa_agg_value_set>,
>  					int *, sreal *, sreal *,
> -				        ipa_hints *);
> +				        ipa_hints *, sreal *, sreal *);
>  void ipa_merge_fn_summary_after_inlining (struct cgraph_edge *edge);
> void ipa_update_overall_fn_summary (struct cgraph_node *node, bool
> reset = true);  void compute_fn_summary (struct cgraph_node *, bool); diff -
> -git a/gcc/params.opt b/gcc/params.opt index f39e5d1a012..2a5f3d61727
> 100644
> --- a/gcc/params.opt
> +++ b/gcc/params.opt
> @@ -211,7 +211,7 @@ Common Joined UInteger
> Var(param_ipa_cp_single_call_penalty) Init(15) IntegerRan  Percentage
> penalty functions containing a single call to another function will receive
> when they are evaluated for cloning.
> 
>  -param=ipa-cp-unit-growth=
> -Common Joined UInteger Var(param_ipa_cp_unit_growth) Init(10) Param
> Optimization
> +Common Joined UInteger Var(param_ipa_cp_unit_growth) Init(80) Param
> +Optimization
>  How much can given compilation unit grow because of the interprocedural
> constant propagation (in percent).
> 
>  -param=ipa-cp-value-list-size=


More information about the Gcc-patches mailing list