[PATCH PR77536] Generate correct profiling information for vectorized loop

Jan Hubicka hubicka@ucw.cz
Tue Feb 21 15:52:00 GMT 2017


> 2017-02-21  Bin Cheng  <bin.cheng@arm.com>
> 
> PR tree-optimization/77536
> * tree-ssa-loop-manip.c (niter_for_unrolled_loop): New function.
> (tree_transform_and_unroll_loop): Use above function to compute the
> estimated niter of unrolled loop and use it when scaling profile.
> * tree-ssa-loop-manip.h niter_for_unrolled_loop(): New declaration.
> * tree-vect-loop.c (scale_profile_for_vect_loop): New function.
> (vect_transform_loop): Call above function.
> 
> gcc/testsuite/ChangeLog
> 2017-02-21  Bin Cheng  <bin.cheng@arm.com>
> 
> PR tree-optimization/77536
> * gcc.dg/vect/pr79347.c: Revise testing string.
> @@ -1329,7 +1339,12 @@ tree_transform_and_unroll_loop (struct loop *loop, unsigned factor,
>    freq_h = loop->header->frequency;
>    freq_e = EDGE_FREQUENCY (loop_preheader_edge (loop));
>    if (freq_h != 0)
> -    scale_loop_frequencies (loop, freq_e * (new_est_niter + 1), freq_h);
> +    {
> +      gcov_type scale;
> +      /* This should not overflow.  */
> +      scale = GCOV_COMPUTE_SCALE (freq_e * (new_est_niter + 1), freq_h);
> +      scale_loop_frequencies (loop, scale, REG_BR_PROB_BASE);

You need to use counts when new_est_niter is derived from profile feedback.
This is because frequencies are capped at 10000, so if the loop iterates very many
times, new_est_niter will be large, freq_h will be 10000 and freq_e will be 0.

Also watch for the case where both freq_e and loop_preheader_edge (loop)->count
are 0 and freq_h is non-zero.  Just use MAX (freq_e, 1) so that the loop body
profile is not dropped to 0.
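
To illustrate the two points above, here is a small standalone sketch, not GCC
code.  PROB_BASE and FREQ_MAX model REG_BR_PROB_BASE and the 10000 frequency
cap; compute_scale, count_to_freq, body_scale and body_scale_guarded are
hypothetical helpers made up for this example, and the way counts are mapped
to frequencies is a simplifying assumption.

```c
#include <assert.h>
#include <stdint.h>

#define PROB_BASE 10000 /* models REG_BR_PROB_BASE */
#define FREQ_MAX 10000  /* models the cap on frequencies */
#define MAX(a, b) ((a) > (b) ? (a) : (b))

/* Hypothetical helper: NUM / DEN rounded to nearest, in units of
   1/PROB_BASE.  */
static int64_t
compute_scale (int64_t num, int64_t den)
{
  assert (den > 0);
  return (num * PROB_BASE + den / 2) / den;
}

/* Simplified model (an assumption): derive an integer frequency from an
   execution count, relative to the hottest count MAX_COUNT.  */
static int64_t
count_to_freq (int64_t count, int64_t max_count)
{
  assert (max_count > 0);
  return count * FREQ_MAX / max_count;
}

/* Scale for the loop body: entry weight times (niter + 1), divided by
   the header weight.  */
static int64_t
body_scale (int64_t entry, int64_t header, int64_t new_est_niter)
{
  return compute_scale (entry * (new_est_niter + 1), header);
}

/* Same as body_scale, but never lets the entry weight be 0.  */
static int64_t
body_scale_guarded (int64_t entry, int64_t header, int64_t new_est_niter)
{
  return body_scale (MAX (entry, 1), header, new_est_niter);
}
```

With a preheader count of 100 and a header count of 100000000 (a loop
iterating about a million times), the modelled freq_e is 0, so the
frequency-based scale collapses to 0, while the count-based scale for a
four-times-shorter loop comes out at a quarter of PROB_BASE.  The MAX guard
only keeps the result non-zero; it does not make the frequency-based value
accurate.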

> +/* Scale profiling counters by estimation for LOOP which is vectorized
> +   by factor VF.  */
> +
> +static void
> +scale_profile_for_vect_loop (struct loop *loop, unsigned vf)
> +{
> +  edge preheader = loop_preheader_edge (loop);
> +  unsigned freq_h = loop->header->frequency;
> +  unsigned freq_e = EDGE_FREQUENCY (preheader);
> +  /* Reduce loop iterations by the vectorization factor.  */
> +  gcov_type new_est_niter = niter_for_unrolled_loop (loop, vf);
> +
> +  /* Use profiling count information if frequencies are zero.  */
> +  if (freq_h == 0 || freq_e == 0)
> +    {
> +      freq_e = preheader->count;
> +      freq_h = loop->header->count;
> +    }
> +
> +  if (freq_h != 0)
> +    {
> +      gcov_type scale;
> +      /* This should not overflow.  */
> +      scale = GCOV_COMPUTE_SCALE (freq_e * (new_est_niter + 1), freq_h);
> +      scale_loop_frequencies (loop, scale, REG_BR_PROB_BASE);
> +    }

Similarly here.  Use counts when they are non-zero and use MAX (freq_e, 1).
Both freq_e and freq_h need to be gcov_type in that case.
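
One possible reading of this suggestion, as a standalone sketch rather than
the actual patch: wide_t stands in for gcov_type, PROB_BASE for
REG_BR_PROB_BASE, and compute_scale and vect_body_scale are hypothetical names
used only here.  Choosing counts whenever the header count is non-zero is an
assumption about what "when they are non-zero" means.

```c
#include <assert.h>
#include <stdint.h>

#define PROB_BASE 10000 /* models REG_BR_PROB_BASE */
#define MAX(a, b) ((a) > (b) ? (a) : (b))

typedef int64_t wide_t; /* stand-in for gcov_type */

/* Hypothetical helper: NUM / DEN rounded to nearest, in units of
   1/PROB_BASE; identity scale when DEN is 0.  */
static wide_t
compute_scale (wide_t num, wide_t den)
{
  return den ? (num * PROB_BASE + den / 2) / den : PROB_BASE;
}

/* Prefer counts when the header count is non-zero (assumed reading),
   otherwise fall back to frequencies.  All arithmetic is done in the
   wide type so large counts do not overflow a 32-bit unsigned.  */
static wide_t
vect_body_scale (wide_t count_e, wide_t count_h, int freq_e, int freq_h,
                 wide_t new_est_niter)
{
  assert (new_est_niter >= 0);
  wide_t e = count_h != 0 ? count_e : (wide_t) freq_e;
  wide_t h = count_h != 0 ? count_h : (wide_t) freq_h;

  /* No profile at all: return the identity scale, i.e. change nothing.  */
  if (h == 0)
    return PROB_BASE;

  /* Guard against a zero entry weight wiping out the loop body.  */
  e = MAX (e, 1);
  return compute_scale (e * (new_est_niter + 1), h);
}
```

For example, vect_body_scale (100, 100000000, 0, 10000, 250000) uses the
counts and gives a quarter of PROB_BASE, whereas the frequencies alone
(freq_e == 0) would have given 0 without the MAX guard.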

Patch is OK with these changes.  Thanks a lot!
Honza
> +
> +  basic_block exit_bb = single_pred (loop->latch);
> +  edge exit_e = single_exit (loop);
> +  exit_e->count = loop_preheader_edge (loop)->count;
> +  exit_e->probability = REG_BR_PROB_BASE / (new_est_niter + 1);
> +
> +  edge exit_l = single_pred_edge (loop->latch);
> +  int prob = exit_l->probability;
> +  exit_l->probability = REG_BR_PROB_BASE - exit_e->probability;
> +  exit_l->count = exit_bb->count - exit_e->count;
> +  if (exit_l->count < 0)
> +    exit_l->count = 0;
> +  if (prob > 0)
> +    scale_bbs_frequencies_int (&loop->latch, 1, exit_l->probability, prob);
> +}
> +
>  /* Function vect_transform_loop.
>  
>     The analysis phase has determined that the loop is vectorizable.
> @@ -6743,16 +6785,10 @@ vect_transform_loop (loop_vec_info loop_vinfo)
>    bool transform_pattern_stmt = false;
>    bool check_profitability = false;
>    int th;
> -  /* Record number of iterations before we started tampering with the profile. */
> -  gcov_type expected_iterations = expected_loop_iterations_unbounded (loop);
>  
>    if (dump_enabled_p ())
>      dump_printf_loc (MSG_NOTE, vect_location, "=== vec_transform_loop ===\n");
>  
> -  /* If profile is inprecise, we have chance to fix it up.  */
> -  if (LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo))
> -    expected_iterations = LOOP_VINFO_INT_NITERS (loop_vinfo);
> -
>    /* Use the more conservative vectorization threshold.  If the number
>       of iterations is constant assume the cost check has been performed
>       by our caller.  If the threshold makes all loops profitable that
> @@ -7068,9 +7104,8 @@ vect_transform_loop (loop_vec_info loop_vinfo)
>  
>    slpeel_make_loop_iterate_ntimes (loop, niters_vector);
>  
> -  /* Reduce loop iterations by the vectorization factor.  */
> -  scale_loop_profile (loop, GCOV_COMPUTE_SCALE (1, vf),
> -		      expected_iterations / vf);
> +  scale_profile_for_vect_loop (loop, vf);
> +
>    /* The minimum number of iterations performed by the epilogue.  This
>       is 1 when peeling for gaps because we always need a final scalar
>       iteration.  */


