# Maintain loop iteration count estimates

Richard Biener rguenther@suse.de
Fri Jun 10 10:15:00 GMT 2016

```On Thu, 9 Jun 2016, Jan Hubicka wrote:

> > On Thu, 9 Jun 2016, Jan Hubicka wrote:
> >
> > > Hi,
> > > after we read the profile, we know expected number of iterations.
> >
> > We know the average ;)  It may make sense to add some histogram
> > value profiling for niter now that we should easily able to do so.
>
> I always interpreted the estimated number of iterations to be the same as
> expected number of iterations and to be same as average.  So it seems to be
> sane to feed the info from profile.
>
> I am thinking to add the histograms, yes.  It is midly anoying to do so becuase
> one needs to intrstrument all exit edges out of loop. I guess we don't care
> much if histogram will get lost on abnormals.

Can't we just instrument the loop header (and thus get "averages" for all
exits)?

> One option I am thinking about is to introduce counter that will take two
> parameters A,B.  It will record linear histogram for values in range 0...A and
> logarithmic histogram for values greater than A capping by 2^B (so we don't
> need 64 counters for every loop).  I think we are not really that interested in
> precise histograms for loops that iterate more than, say, 2^10 times. We only
> need to know that it loops a lot.  We however care about low iteration counts
> to make good decision for peeling.  This is still 26 counters per loop that is
> quite a lot.

I think what we are interested in is "clusters" of niters.  So yes,
having a histogram for low values (can we use HIST_TYPE_POW2 here?)
makes sense.  Or simply sum all n <= 16 and all n > 16 values separately?
We seem to lack a histogram type where we can specify N differently-sized
bins.

> A lot cheaper alternative may be to simply measure loop peeling limit by
> having counter that counts how often loop exits in first
> PARAM_MAX_PEEL_TIMES iterations and second counter that measure what is
> the maximal number of iterations in this case (we really want likely
> maximal number of iterations but that seems harder to get). This will
> determine peeling limit which we can then store into loop structure.

Sure.

In my original loop value-profiling work the most interesting (for
performance) case was value-profiling variable strides and then
versioning for stride == 1.  That triggered a lot with fortran
array descriptors though nowadays IPA-CP of aggregates plus LTO
might have solved this.

> Vectorizer/unroller/prefetcher and most of other classical loop opts
> care about the average being large enough so the current expected value
> seems to do the trick.

Yes, though knowing that a significant fraction of times the iteration
count is one would also be useful.

> This does not let you to update profile very precisely after peeling (and also peeling
> done by vectorizer), but it needs only 2 counters per loop.
>
> What other passes, beside peeling, would immediately benefit from a
> historgram? I wonder if you can think of better scheme?

Epilogue peeling in the vectorizer (for the remainder of iterations,
thus profiling N % vectorization-factor which means a value-profile
of niter & <low-bits>).

Peeling for alignment profitability (we do too much of that IMHO),
and <write1-addr> | <write2-addr> | ... (not sure if that would
be fine-grained enough - we want to know if peeling for alignment
will likely make many accesses aligned, not just the one we peel for).

> >
> > > Currently we use profile each time estimate_numbers_of_iterations_loop
> > > is called to recompute this value.  This is not very safe because the
> > > profile may be misupdated.  It seems safer to compute it at once and
> > > maintain thorough the compilation.
> > >
> > > Notice that I removed:
> > > -  /* Force estimate compuation but leave any existing upper bound in place.  */
> > > -  loop->any_estimate = false;
> > > From beggining of estimate_numbers_of_iterations_loop.  I can not make sense of
> > > this. Even w/o profile if we have estimate, we are better to maintain it because
> > > later we may not be able to derive it again.
> > > There seems to be no code that is forced by setting loop->any_estimate = true.
> > > Only code that cares seems to be record_niter_bound that only decreases existing
> > > estimates. THis seems sane procedure - we don't roll loops.
> > >
> > > Bootstrapped/regtested x86_64-linux, OK?
> >
> > Ok.  Did you check what this does to SPEC with FDO?
>
> I didn't do full SPEC run with FDO, only tested tramp3d and xalancbmk I have
> readily available.  Any regressions would however point to loop info updating
> bugs (or wrong use of the profile) so I will look into them if they appear.
> I am trying to benchmark firefox now.

Ok.

Thanks,
Richard.

```