IVOPT improvement patch

Tue May 25 10:46:00 GMT 2010

Hi,

> patch-1:
> 
> This patch improves the algorithm assigning iv candidates to uses: in
> the initial solution computation, do not compare cost savings
> 'locally' for one use-iv_cand pair, but comparing overall (all uses )
> cost of replacing with the new candidates (if possible) with the
> current best assignment cost. This will guarantee the initial solution
> to start from the minimal set of ivs.

so, this seems to be the important part of patch-1 (where min_ncand is
true only during the initial solution computation):

>  {
>    unsigned i;
>    comp_cost cost;
> @@ -4914,8 +4957,8 @@ iv_ca_extend (struct ivopts_data *data, 
>        if (!iv_ca_has_deps (ivs, new_cp))
>  	continue;
>  
> -      if (!cheaper_cost_pair (new_cp, old_cp))
> -	continue;
> +      if (!min_ncand && !cheaper_cost_pair (new_cp, old_cp))
> +        continue;
>  
>        *delta = iv_ca_delta_add (use, old_cp, new_cp, *delta);
>      }

This is a rather confusing way to get the effect of starting with a small set
of ivs.  It would probably be better to get rid of try_add_cand_for and rewrite
get_initial_solution with this goal in mind.

> This patch also added fixes to consider profile data in cost
> computation, better dumps, and some other minor bug fixes.

it would be better not to include similar unrelated changes, as they make
reviewing the patch rather difficult.

> +/* Returns the expected number of loop iterations for LOOP.
> +   The average trip count is computed from profile data if it
> +   exists. */
> +
> +static inline unsigned
> +avg_loop_niter (struct loop *loop)
> +{
> +  unsigned tc;
> +  if (loop->header->count || loop->latch->count)
> +    tc = expected_loop_iterations (loop);
> +  else
> +    tc = AVG_LOOP_NITER (loop);
> +  if (tc == 0)
> +    tc++;
> +  return tc;
> +}

Using estimated_loop_iterations_int (loop, false) instead of adding another
similar function would be better.

> @@ -3811,6 +3841,9 @@ get_computation_cost_at (struct ivopts_d
>  					 &offset, depends_on));
>      }
>  
> +  /* Loop invariant computation.  */
> +  cost.cost /= avg_loop_niter (data->current_loop);
> +

This is wrong, at least some parts of the computation here are not loop invariant.

> @@ -4056,20 +4090,16 @@ may_eliminate_iv (struct ivopts_data *da
>    /* If not, and if this is the only possible exit of the loop, see whether
>       we can get a conservative estimate on the number of iterations of the
>       entire loop and compare against that instead.  */
> -  else if (loop_only_exit_p (loop, exit))
> +  else

This change is wrong, the test is necessary.  See
http://gcc.gnu.org/ml/gcc-patches/2008-07/msg00146.html
and the following discussion.

>      {
>        double_int period_value, max_niter;
>        if (!estimated_loop_iterations (loop, true, &max_niter))
>  	return false;
>        period_value = tree_to_double_int (period);
> -      if (double_int_ucmp (max_niter, period_value) >= 0)
> +      if (double_int_ucmp (max_niter, period_value) > 0)
>  	return false;
>      }

This also seems wrong (or at least inconsistent with that is done for
the constant number of iterations).

>  /* Try changing candidate in IVS to CAND for each use.  Return cost of the
> @@ -4890,7 +4933,7 @@ iv_ca_dump (struct ivopts_data *data, FI
>  static comp_cost
>  iv_ca_extend (struct ivopts_data *data, struct iv_ca *ivs,
>  	      struct iv_cand *cand, struct iv_ca_delta **delta,
> -	      unsigned *n_ivs)
> +	      unsigned *n_ivs, bool min_ncand)

Document min_ncand argument in the function comment.

> Index: gcc/testsuite/gcc.dg/tree-ssa/ivopt_1.c
> ===================================================================
> --- gcc/testsuite/gcc.dg/tree-ssa/ivopt_1.c	(revision 0)
> +++ gcc/testsuite/gcc.dg/tree-ssa/ivopt_1.c	(revision 0)
> @@ -0,0 +1,16 @@
> +/* { dg-do compile { target {{ i?86-*-* x86_64-*-* } && lp64 } } } */
> +/* { dg-options "-O2 -m64 -fdump-tree-ivopts" } */
> +#define TYPE char*
> +
> +void foo (int i_width, TYPE dst, TYPE src1, TYPE src2)
> +{
> +      int x;
> +       for( x = 0; x < i_width; x++ )
> +       {
> +           dst[x] = ( src1[x] + src2[x] + 1 ) >> 1;
> +       }
> +}
> +
> +
> +/* { dg-final { scan-tree-dump-times "PHI <ivtmp" 1 "ivopts"} } */
> +/* { dg-final { cleanup-tree-dump "ivopts" } } */
> Index: gcc/testsuite/gcc.dg/tree-ssa/ivopt_2.c
> ===================================================================
> --- gcc/testsuite/gcc.dg/tree-ssa/ivopt_2.c	(revision 0)
> +++ gcc/testsuite/gcc.dg/tree-ssa/ivopt_2.c	(revision 0)
> @@ -0,0 +1,16 @@
> +/* { dg-do compile { target {{ i?86-*-* x86_64-*-* } && lp64 } } } */
> +/* { dg-options "-O2 -m64 -fdump-tree-ivopts" } */
> +
> +#define TYPE char*
> +
> +void foo (int i_width, TYPE dst, TYPE src1, TYPE src2)
> +{
> +      int x;
> +       for( x = 0; x < i_width; x++ )
> +       {
> +           *dst++ = ( *src1++ + *src2++ + 1 ) >> 1;
> +       }
> +}
> +
> +/* { dg-final { scan-tree-dump-times "PHI <ivtmp" 1 "ivopts"} } */
> +/* { dg-final { cleanup-tree-dump "ivopts" } } */
> Index: gcc/testsuite/gcc.dg/tree-ssa/ivopt_3.c
> ===================================================================
> --- gcc/testsuite/gcc.dg/tree-ssa/ivopt_3.c	(revision 0)
> +++ gcc/testsuite/gcc.dg/tree-ssa/ivopt_3.c	(revision 0)
> @@ -0,0 +1,19 @@
> +/* { dg-do compile { target {{ i?86-*-* x86_64-*-* } && lp64 } } } */
> +/* { dg-options "-O2 -m64 -fdump-tree-ivopts" } */
> +
> +#define TYPE char*
> +
> +void foo (int i_width, char* dst, char* src1, char* src2)
> +{
> +      int x;
> +       for( x = 0; x < i_width; x++ )
> +       {
> +           *((TYPE)dst) = ( *((TYPE)src1) + *((TYPE)src2) + 1 ) >> 1;
> +	   dst+=sizeof(TYPE);
> +	   src1+=sizeof(TYPE);
> +	   src2+=sizeof(TYPE);
> +       }
> +} 
> +
> +/* { dg-final { scan-tree-dump-times "PHI <ivtmp" 1 "ivopts"} } */
> +/* { dg-final { cleanup-tree-dump "ivopts" } } */
> Index: gcc/testsuite/gcc.dg/tree-ssa/ivopt_4.c
> ===================================================================
> --- gcc/testsuite/gcc.dg/tree-ssa/ivopt_4.c	(revision 0)
> +++ gcc/testsuite/gcc.dg/tree-ssa/ivopt_4.c	(revision 0)
> @@ -0,0 +1,18 @@
> +/* { dg-do compile { target {{ i?86-*-* x86_64-*-* } && lp64 } } } */
> +/* { dg-options "-O2 -m64 -fdump-tree-ivopts" } */
> +
> +#ifndef TYPE
> +#define TYPE char*
> +#endif
> +
> +void foo (int i_width, TYPE dst, TYPE src1, TYPE src2)
> +{
> +      TYPE dstn= dst + i_width;
> +       for( ; dst < dstn; )
> +       {
> +           *dst++ = ( *src1++ + *src2++ + 1 ) >> 1;
> +       }
> +}
> +
> +/* { dg-final { scan-tree-dump-times "PHI <ivtmp" 1 "ivopts"} } */
> +/* { dg-final { cleanup-tree-dump "ivopts" } } */

ivopt_{n}.c -> ivopts-{n+4}.c
Please add an explanation of what is the expected outcome (and why) to these
testcases.

Zdenek