[PATCH] Fix PR91975, tame PRE some more

Richard Biener rguenther@suse.de
Wed Oct 16 09:42:00 GMT 2019


On Mon, 7 Oct 2019, Richard Biener wrote:

> 
> The following tries to address the issue that PRE is quite happy
> to introduce new IVs in loops just because it can compute some
> constant value on the loop entry edge.  In principle there's
> already code that should guard against this, but it simply searches
> for an optimize_edge_for_speed_p () edge.  That still considers
> the loop entry edge to be worth optimizing because it ends
> up as maybe_hot_edge_p (e) for -O2, which compares the edge count
> against the entry block count.  For PRE we want something more
> local (comparing against the destination block count).
> 
> Now for the simple testcases this shouldn't make a difference,
> but the hot/cold decision uses PARAM_VALUE (HOT_BB_FREQUENCY_FRACTION),
> which isn't the same as profile_probability::likely () or
> ::very_likely ()...
> 
> Still, one key point of the patch is that we compare the sum of the
> edge counts on which the value is available (and thus where the
> redundancy elimination happens) against the count at which we would
> insert, rather than looking for a single optimize_edge_for_speed_p
> edge.
> 
> For that I've used
> 
>     if (avail_count < block->count.apply_probability
>                                     (profile_probability::unlikely ()))
> 
> so we block insertion if the redundancies would overall be "unlikely".
> 
> I'm also not sure why maybe_hot_count_p uses HOT_BB_FREQUENCY_FRACTION
> when HOT_BB_COUNT_FRACTION (with a ten-fold larger default value)
> exists and seems a better match for scaling a profile_count?
> 
> Honza?

Honza - does this sound sensible?  Can you comment on the
maybe_hot_count_p param use?

I wanted to avoid special-casing insertion on backedges and instead
improve the optimize_edge_for_speed_p heuristic (a standalone model of
the proposed check follows right below).
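
Concretely, the following sketch models the check the patch adds.  It
is my own illustration, not GCC code: plain integers stand in for
profile_count, and UNLIKELY_PERMILLE is an assumed stand-in for
whatever profile_probability::unlikely () amounts to.

  #include <stdbool.h>
  #include <stdio.h>

  /* Assumed small fraction standing in for
     profile_probability::unlikely (); the exact value is not GCC's.  */
  #define UNLIKELY_PERMILLE 20

  /* Model of the new check: insert only if the edges on which the value
     is already available (where the redundancy gets removed) cover more
     than an "unlikely" fraction of the block's executions.  */
  static bool
  insertion_profitable_p (long long avail_count, long long block_count)
  {
    return avail_count >= block_count * UNLIKELY_PERMILLE / 1000;
  }

  int
  main (void)
  {
    /* Loop header executed 1000 times, value available only on the
       once-executed entry edge: insertion is rejected.  */
    printf ("%d\n", insertion_profitable_p (1, 1000));    /* 0 */
    /* Value available on edges covering half of the executions.  */
    printf ("%d\n", insertion_profitable_p (500, 1000));  /* 1 */
    return 0;
  }

With the code the patch removes, a single optimize_edge_for_speed_p ()
predecessor was enough to enable insertion, so the once-per-call loop
entry edge already qualified.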

> Bootstrap & regtest running on x86-64-unknown-linux-gnu.
> 
> Does the above predicate look sane, or am I on the wrong track
> using the destination block count here?  (I realize even the
> "locally cold" entries into the block might be quite hot globally.)
> 
> For a 1:1 translation of the existing code to something using the
> original predicate but summing over edges I could use
> !maybe_hot_count_p (cfun, avail_count).  But then we're back to
> PRE doing the unwanted insertions.  Changing maybe_hot_count_p
> to use HOT_BB_COUNT_FRACTION doesn't make any difference there
> (obviously).
> 
> Thanks,
> Richard.
> 
> 2019-10-06  Richard Biener  <rguenther@suse.de>
> 
> 	PR tree-optimization/91975
> 	* tree-ssa-pre.c (do_pre_regular_insertion): Adjust
> 	profitability check to use the sum of all edge counts the
> 	value is available on and check against unlikely execution
> 	of the block.
> 
> 	* gcc.dg/tree-ssa/ldist-39.c: New testcase.
> 
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ldist-39.c b/gcc/testsuite/gcc.dg/tree-ssa/ldist-39.c
> new file mode 100644
> index 00000000000..a63548979ea
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/ldist-39.c
> @@ -0,0 +1,43 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fdump-tree-ldist-details" } */
> +
> +#define T int
> +
> +const T a[] = { 0, 1, 2, 3, 4, 5, 6, 7 };
> +T b[sizeof a / sizeof *a];
> +
> +void f0 (void)
> +{
> +  const T *s = a;
> +  T *d = b;
> +  for (unsigned i = 0; i != sizeof a / sizeof *a; ++i)
> +    d[i] = s[i];
> +}
> +
> +void g0 (void)
> +{
> +  const T *s = a;
> +  T *d = b;
> +  for (unsigned i = 0; i != sizeof a / sizeof *a; ++i)
> +    *d++ = *s++;
> +}
> +
> +extern const T c[sizeof a / sizeof *a];
> +
> +void f1 (void)
> +{
> +  const T *s = c;
> +  T *d = b;
> +  for (unsigned i = 0; i != sizeof a / sizeof *a; ++i)
> +    d[i] = s[i];
> +}
> +
> +void g1 (void)
> +{
> +  const T *s = c;
> +  T *d = b;
> +  for (unsigned i = 0; i != sizeof a / sizeof *a; ++i)
> +    *d++ = *s++;
> +}
> +
> +/* { dg-final { scan-tree-dump-times "generated memcpy" 4 "ldist" } } */
> diff --git a/gcc/tree-ssa-pre.c b/gcc/tree-ssa-pre.c
> index c618601a184..af49ba388c1 100644
> --- a/gcc/tree-ssa-pre.c
> +++ b/gcc/tree-ssa-pre.c
> @@ -3195,7 +3195,7 @@ do_pre_regular_insertion (basic_block block, basic_block dom)
>  	  pre_expr eprime = NULL;
>  	  edge_iterator ei;
>  	  pre_expr edoubleprime = NULL;
> -	  bool do_insertion = false;
> +	  profile_count avail_count = profile_count::zero ();
>  
>  	  val = get_expr_value_id (expr);
>  	  if (bitmap_set_contains_value (PHI_GEN (block), val))
> @@ -3250,10 +3250,7 @@ do_pre_regular_insertion (basic_block block, basic_block dom)
>  		{
>  		  avail[pred->dest_idx] = edoubleprime;
>  		  by_some = true;
> -		  /* We want to perform insertions to remove a redundancy on
> -		     a path in the CFG we want to optimize for speed.  */
> -		  if (optimize_edge_for_speed_p (pred))
> -		    do_insertion = true;
> +		  avail_count += pred->count ();
>  		  if (first_s == NULL)
>  		    first_s = edoubleprime;
>  		  else if (!pre_expr_d::equal (first_s, edoubleprime))
> @@ -3266,7 +3263,11 @@ do_pre_regular_insertion (basic_block block, basic_block dom)
>  	     partially redundant.  */
>  	  if (!cant_insert && !all_same && by_some)
>  	    {
> -	      if (!do_insertion)
> +	      /* We want to perform insertions to remove a redundancy on
> +		 a path in the CFG that is somewhat likely.  Avoid inserting
> +		 when we'd only remove a redundancy on unlikely paths.  */
> +	      if (avail_count < block->count.apply_probability
> +					    (profile_probability::unlikely ()))
>  		{
>  		  if (dump_file && (dump_flags & TDF_DETAILS))
>  		    {
> 
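
For reference, this is roughly what the unwanted PRE transform does to
f0 from the testcase above, written as source-level C.  It is a
hand-written sketch, not a GCC dump, and pretmp and f0_after_pre are
my own names: the load s[i] at the loop header is available on the
entry edge (as the constant a[0]) but not on the backedge, so PRE
inserts the load on the backedge and carries the value in a PHI; that
loop-carried value is the new IV which later keeps loop distribution
from recognizing the memcpy.

  #define T int

  const T a[] = { 0, 1, 2, 3, 4, 5, 6, 7 };
  T b[sizeof a / sizeof *a];

  /* f0 after the unwanted PRE insertion, expressed back in C.  */
  void
  f0_after_pre (void)
  {
    T pretmp = a[0];  /* inserted on the loop entry edge */
    for (unsigned i = 0; i != sizeof a / sizeof *a;)
      {
        b[i] = pretmp;  /* the in-loop load became fully redundant ...  */
        ++i;
        if (i != sizeof a / sizeof *a)
          pretmp = a[i];  /* ... but is re-inserted on the backedge,
                             creating a new loop-carried value */
      }
  }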

-- 
Richard Biener <rguenther@suse.de>
SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)

