This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [PATCH PR77536]Generate correct profiling information for vectorized loop


On Mon, Feb 20, 2017 at 2:02 PM, Jan Hubicka <hubicka@ucw.cz> wrote:
>> > 2017-02-16  Bin Cheng  <bin.cheng@arm.com>
>> >
>> >         PR tree-optimization/77536
>> >         * tree-ssa-loop-manip.c (niter_for_unrolled_loop): New function.
>> >         (tree_transform_and_unroll_loop): Use above function to compute the
>> >         estimated niter of unrolled loop.
>> >         * tree-ssa-loop-manip.h niter_for_unrolled_loop(): New declaration.
>> >         * tree-vect-loop.c (scale_profile_for_vect_loop): New function.
>> >         (vect_transform_loop): Call above function.
Thanks very much for your suggestions.  I don't know profiling logic
very well and have some questions embedded below before I start
revising patch.
>
> +/* Return estimated niter for LOOP after unrolling by FACTOR times.  */
> +
> +unsigned
> +niter_for_unrolled_loop (struct loop *loop, unsigned factor)
> +{
> +  unsigned est_niter = expected_loop_iterations (loop);
>
> What happens when you have profile and loop iterates very many times?
> Perhaps we want to do all calculation in gcov_type and use
> expected_loop_iterations_unbounded>?
>
> expected_loop_iterations is capping by 10000 that is easy to overflow.
>
> +  gcc_assert (factor != 0);
> +  unsigned new_est_niter = est_niter / factor;
> +
> +  /* Without profile feedback, loops for which we do not know a better estimate
> +     are assumed to roll 10 times.  When we unroll such loop, it appears to
> +     roll too little, and it may even seem to be cold.  To avoid this, we
> +     ensure that the created loop appears to roll at least 5 times (but at
> +     most as many times as before unrolling).  */
> +  if (new_est_niter < 5)
> +    {
> +      if (est_niter < 5)
> +       new_est_niter = est_niter;
> +      else
> +       new_est_niter = 5;
> +    }
> +
> +  return new_est_niter;
> +}
>
> I see this code is pre-existing, but please extend it to test if
> loop->header->count is non-zero.  Even if we do not have idea about loop
> iteration count estimate we may end up predicting more than 10 iterations when
> predictors combine that way.
If I use expected_loop_iterations_unbounded, then do I need to handle
loop->header->count explicitly here?  I suppose not because it has
below code already:

  /* If we have no profile at all, use AVG_LOOP_NITER.  */
  if (profile_status_for_fn (cfun) == PROFILE_ABSENT)
    expected = PARAM_VALUE (PARAM_AVG_LOOP_NITER);
  else if (loop->latch && (loop->latch->count || loop->header->count))
    {
      gcov_type count_in, count_latch;
      //...

The second question is: looks like it only takes latch->count into
consideration when PROFILE_ABSENT.  But according to your comments, we
could have nonzero count sometime?

>
> Perhaps testing estimated-loop_iterations would also make sense, but that
> could be dealt with incrementally.
>
> +static void
> +scale_profile_for_vect_loop (struct loop *loop, unsigned vf)
> +{
> +  unsigned freq_h = loop->header->frequency;
> +  unsigned freq_e = EDGE_FREQUENCY (loop_preheader_edge (loop));
> +  /* Reduce loop iterations by the vectorization factor.  */
> +  unsigned new_est_niter = niter_for_unrolled_loop (loop, vf);
> +
> +  if (freq_h != 0)
> +    scale_loop_frequencies (loop, freq_e * (new_est_niter + 1), freq_h);
> +
> I am always trying to avoid propagating small mistakes (i.e. frong freq_h or
> freq_h) into bigger mistakes (i.e. wrong profile of the whole loop) to avoid
> spreading mistakes across cfg.
>
> But I guess here it is sort of safe because vectorized loops are simple.
> You can't just scale down the existing counts/frequencies by vf, because the
> entry edge frequency was adjusted.
I am not 100% follow here, it looks the code avoids changing frequency
counter for preheader/exit edge, otherwise we would need to change all
counters dominated by them?
>
> Also niter_for_unrolled_loop depends on sanity of the profile, so perhaps you
> need to compute it before you start chanigng the CFG by peeling proplogue?
Peeling for prologue doesn't change profiling information of
vect_loop, it is the skip edge from before loop to preferred epilogue
loop that will change profile counters.  I guess here exists a dilemma
that niter_for_unrolled_loop is for loop after peeling for prologue?

Thanks,
bin
>
> Finally if freq_e is 0, all frequencies and counts will be probably dropped to
> 0.  What about determining fraction by counts if they are available?
>
> Otherwise the patch looks good and thanks a lot for working on this!
>
> Honza
>
>> >
>> > gcc/testsuite/ChangeLog
>> > 2017-02-16  Bin Cheng  <bin.cheng@arm.com>
>> >
>> >         PR tree-optimization/77536
>> >         * gcc.dg/vect/pr79347.c: Revise testing string.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]