[PATCH PR77536] Generate correct profiling information for vectorized loop
Jan Hubicka
hubicka@ucw.cz
Tue Feb 21 15:52:00 GMT 2017
> 2017-02-21 Bin Cheng <bin.cheng@arm.com>
>
> PR tree-optimization/77536
> * tree-ssa-loop-manip.c (niter_for_unrolled_loop): New function.
> (tree_transform_and_unroll_loop): Use above function to compute the
> estimated niter of unrolled loop and use it when scaling profile.
> * tree-ssa-loop-manip.h niter_for_unrolled_loop(): New declaration.
> * tree-vect-loop.c (scale_profile_for_vect_loop): New function.
> (vect_transform_loop): Call above function.
>
> gcc/testsuite/ChangeLog
> 2017-02-21 Bin Cheng <bin.cheng@arm.com>
>
> PR tree-optimization/77536
> * gcc.dg/vect/pr79347.c: Revise testing string.
> @@ -1329,7 +1339,12 @@ tree_transform_and_unroll_loop (struct loop *loop, unsigned factor,
> freq_h = loop->header->frequency;
> freq_e = EDGE_FREQUENCY (loop_preheader_edge (loop));
> if (freq_h != 0)
> - scale_loop_frequencies (loop, freq_e * (new_est_niter + 1), freq_h);
> + {
> + gcov_type scale;
> + /* This should not overflow. */
> + scale = GCOV_COMPUTE_SCALE (freq_e * (new_est_niter + 1), freq_h);
> + scale_loop_frequencies (loop, scale, REG_BR_PROB_BASE);
You need to use counts when new_est_niter is derived from profile feedback.
This is because frequencies are capped to 10000, so if the loop iterates very many
times, new_est_niter will be large, freq_h will be 10000 and freq_e will be 0.
Also watch the case when freq_e == loop_preheader_edge (loop)->count == 0 and freq_h
is non-zero. Just do MAX (freq_e, 1), so that the loop body profile is not dropped to 0.
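A minimal standalone sketch of the failure mode described above, not GCC code. It assumes a 64-bit gcov_type and REG_BR_PROB_BASE of 10000; compute_scale is a local stand-in modeled on GCOV_COMPUTE_SCALE, and scale_from_freqs and scale_preferring_counts are hypothetical helpers. Picking counts whenever the header count is non-zero is one reading of the advice, not necessarily what the final patch does.

```c
#include <stdint.h>

typedef int64_t gcov_type;          /* assumption: stand-in for GCC's gcov_type */
#define REG_BR_PROB_BASE 10000
#define MAX(a, b) ((a) > (b) ? (a) : (b))

/* Local stand-in modeled on GCOV_COMPUTE_SCALE: NUM / DEN rounded to
   nearest, expressed in units of 1 / REG_BR_PROB_BASE.  */
static gcov_type
compute_scale (gcov_type num, gcov_type den)
{
  if (den == 0)
    return REG_BR_PROB_BASE;
  return (num * REG_BR_PROB_BASE + den / 2) / den;
}

/* Hypothetical: scale computed from frequencies only.  */
static gcov_type
scale_from_freqs (int freq_e, int freq_h, gcov_type new_est_niter)
{
  return compute_scale ((gcov_type) freq_e * (new_est_niter + 1), freq_h);
}

/* Hypothetical: use counts when the header count is non-zero, otherwise
   fall back to frequencies; clamp the entry value to at least 1 so the
   loop body profile is never scaled down to 0.  */
static gcov_type
scale_preferring_counts (gcov_type count_e, gcov_type count_h,
                         int freq_e, int freq_h, gcov_type new_est_niter)
{
  gcov_type e = count_h != 0 ? count_e : (gcov_type) freq_e;
  gcov_type h = count_h != 0 ? count_h : (gcov_type) freq_h;
  e = MAX (e, 1);
  return compute_scale (e * (new_est_niter + 1), h);
}
```

For a loop entered once and iterating 1000000 times, freq_e rounds to 0 and freq_h is capped at 10000. With a vectorization factor of 4 (new_est_niter of 249999) scale_from_freqs returns 0, which wipes out the body profile, while scale_preferring_counts returns 2500, i.e. one quarter of REG_BR_PROB_BASE.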
> +/* Scale profiling counters by estimation for LOOP which is vectorized
> + by factor VF. */
> +
> +static void
> +scale_profile_for_vect_loop (struct loop *loop, unsigned vf)
> +{
> + edge preheader = loop_preheader_edge (loop);
> + unsigned freq_h = loop->header->frequency;
> + unsigned freq_e = EDGE_FREQUENCY (preheader);
> + /* Reduce loop iterations by the vectorization factor. */
> + gcov_type new_est_niter = niter_for_unrolled_loop (loop, vf);
> +
> + /* Use profiling count information if frequencies are zero. */
> + if (freq_h == 0 || freq_e == 0)
> + {
> + freq_e = preheader->count;
> + freq_h = loop->header->count;
> + }
> +
> + if (freq_h != 0)
> + {
> + gcov_type scale;
> + /* This should not overflow. */
> + scale = GCOV_COMPUTE_SCALE (freq_e * (new_est_niter + 1), freq_h);
> + scale_loop_frequencies (loop, scale, REG_BR_PROB_BASE);
> + }
Similarly here. Use counts when they are non-zero and use MAX (freq_e, 1).
freq_e and freq_h need to be gcov_type in that case.
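A minimal standalone sketch of why the type matters, not GCC code. It assumes a 64-bit gcov_type and uses uint32_t to model a 32-bit unsigned; scale_wide and scale_narrow are hypothetical helpers that differ only in the type holding the counts.

```c
#include <stdint.h>

typedef int64_t gcov_type;          /* assumption: stand-in for GCC's gcov_type */
#define REG_BR_PROB_BASE 10000

/* Models storing a profile count into a 32-bit unsigned variable:
   the value is reduced modulo 2^32.  */
static uint32_t
narrow_count (gcov_type count)
{
  return (uint32_t) count;
}

/* Hypothetical: counts kept in gcov_type, entry count clamped to 1.  */
static gcov_type
scale_wide (gcov_type count_e, gcov_type count_h, gcov_type new_est_niter)
{
  gcov_type e = count_e > 0 ? count_e : 1;
  if (count_h == 0)
    return REG_BR_PROB_BASE;
  return (e * (new_est_niter + 1) * REG_BR_PROB_BASE + count_h / 2) / count_h;
}

/* Hypothetical: the same computation after the counts have been
   squeezed through 32-bit unsigned variables.  */
static gcov_type
scale_narrow (gcov_type count_e, gcov_type count_h, gcov_type new_est_niter)
{
  return scale_wide (narrow_count (count_e), narrow_count (count_h),
                     new_est_niter);
}
```

With an entry count of 1000000 and a header count of 5000000000 (5000 iterations per entry), the header count truncates to 705032704 in 32 bits. For new_est_niter of 1249, scale_wide returns 2500, while scale_narrow returns a value well above REG_BR_PROB_BASE.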
Patch is OK with these changes. Thanks a lot!
Honza
> +
> + basic_block exit_bb = single_pred (loop->latch);
> + edge exit_e = single_exit (loop);
> + exit_e->count = loop_preheader_edge (loop)->count;
> + exit_e->probability = REG_BR_PROB_BASE / (new_est_niter + 1);
> +
> + edge exit_l = single_pred_edge (loop->latch);
> + int prob = exit_l->probability;
> + exit_l->probability = REG_BR_PROB_BASE - exit_e->probability;
> + exit_l->count = exit_bb->count - exit_e->count;
> + if (exit_l->count < 0)
> + exit_l->count = 0;
> + if (prob > 0)
> + scale_bbs_frequencies_int (&loop->latch, 1, exit_l->probability, prob);
> +}
> +
> /* Function vect_transform_loop.
>
> The analysis phase has determined that the loop is vectorizable.
> @@ -6743,16 +6785,10 @@ vect_transform_loop (loop_vec_info loop_vinfo)
> bool transform_pattern_stmt = false;
> bool check_profitability = false;
> int th;
> - /* Record number of iterations before we started tampering with the profile. */
> - gcov_type expected_iterations = expected_loop_iterations_unbounded (loop);
>
> if (dump_enabled_p ())
> dump_printf_loc (MSG_NOTE, vect_location, "=== vec_transform_loop ===\n");
>
> - /* If profile is inprecise, we have chance to fix it up. */
> - if (LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo))
> - expected_iterations = LOOP_VINFO_INT_NITERS (loop_vinfo);
> -
> /* Use the more conservative vectorization threshold. If the number
> of iterations is constant assume the cost check has been performed
> by our caller. If the threshold makes all loops profitable that
> @@ -7068,9 +7104,8 @@ vect_transform_loop (loop_vec_info loop_vinfo)
>
> slpeel_make_loop_iterate_ntimes (loop, niters_vector);
>
> - /* Reduce loop iterations by the vectorization factor. */
> - scale_loop_profile (loop, GCOV_COMPUTE_SCALE (1, vf),
> - expected_iterations / vf);
> + scale_profile_for_vect_loop (loop, vf);
> +
> /* The minimum number of iterations performed by the epilogue. This
> is 1 when peeling for gaps because we always need a final scalar
> iteration. */