This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.

Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]
Other format:	[Raw text]

Re: Maintain loop iteration count estimates

From: Jan Hubicka <hubicka at ucw dot cz>
To: Richard Biener <rguenther at suse dot de>
Cc: Jan Hubicka <hubicka at ucw dot cz>, gcc-patches at gcc dot gnu dot org
Date: Fri, 10 Jun 2016 14:45:12 +0200
Subject: Re: Maintain loop iteration count estimates
Authentication-results: sourceware.org; auth=none
References: <20160609095842 dot GA98613 at kam dot mff dot cuni dot cz> <alpine dot LSU dot 2 dot 11 dot 1606091317210 dot 1493 at t29 dot fhfr dot qr> <20160609123459 dot GA89386 at kam dot mff dot cuni dot cz> <alpine dot LSU dot 2 dot 11 dot 1606101206020 dot 13824 at t29 dot fhfr dot qr>

> > I always interpreted the estimated number of iterations to be the same as
> > expected number of iterations and to be same as average.  So it seems to be
> > sane to feed the info from profile.
> > 
> > I am thinking to add the histograms, yes.  It is midly anoying to do so becuase
> > one needs to intrstrument all exit edges out of loop. I guess we don't care
> > much if histogram will get lost on abnormals.
> 
> Can't we just instrument the loop header (and thus get "averages" for all
> exits)?

You need to know number of iterations, so either you save it into a global and
flush into profile next time edge header->preheader is executed or you instrument
exists. At least when one wants something more precise than average iteration count
I think.
> 
> > One option I am thinking about is to introduce counter that will take two
> > parameters A,B.  It will record linear histogram for values in range 0...A and
> > logarithmic histogram for values greater than A capping by 2^B (so we don't
> > need 64 counters for every loop).  I think we are not really that interested in
> > precise histograms for loops that iterate more than, say, 2^10 times. We only
> > need to know that it loops a lot.  We however care about low iteration counts
> > to make good decision for peeling.  This is still 26 counters per loop that is
> > quite a lot.  
> 
> I think what we are interested in is "clusters" of niters.  So yes,
> having a histogram for low values (can we use HIST_TYPE_POW2 here?)
> makes sense.  Or simply sum all n <= 16 and all n > 16 values separately?
> We seem to lack a histogram type where we can specify N differently-sized
> bins.

HIST_TYPE_POW2 measure how often value is exactly power of two and we don't
have histogram counter at all, so we need to introduce one ;)

We have HIST_TYPE_INTERVAL that is real linear histogram for given interval of values.
> 
> > A lot cheaper alternative may be to simply measure loop peeling limit by 
> > having counter that counts how often loop exits in first 
> > PARAM_MAX_PEEL_TIMES iterations and second counter that measure what is 
> > the maximal number of iterations in this case (we really want likely 
> > maximal number of iterations but that seems harder to get). This will 
> > determine peeling limit which we can then store into loop structure.
> 
> Sure.
> 
> In my original loop value-profiling work the most interesting (for
> performance) case was value-profiling variable strides and then
> versioning for stride == 1.  That triggered a lot with fortran
> array descriptors though nowadays IPA-CP of aggregates plus LTO
> might have solved this.

ipa-cp is not very powerful on propagating those. Measuring strides is
interesting idea. I forgot about your work on the loop histograms. Do you
still have patches around?
> 
> > Vectorizer/unroller/prefetcher and most of other classical loop opts 
> > care about the average being large enough so the current expected value 
> > seems to do the trick.
> 
> Yes, though knowing that a significant fraction of times the iteration
> count is one would also be useful.

Yes, if we interval profile the low iteration counts, we will have exact
info on 0,1...n intrations.
> 
> > This does not let you to update profile very precisely after peeling (and also peeling
> > done by vectorizer), but it needs only 2 counters per loop.
> > 
> > What other passes, beside peeling, would immediately benefit from a 
> > historgram? I wonder if you can think of better scheme?
> 
> Epilogue peeling in the vectorizer (for the remainder of iterations,
> thus profiling N % vectorization-factor which means a value-profile
> of niter & <low-bits>).

Hmm, to predict that we will need to know the exact value modulo vectorization
factor. This seems bit hard to measure an I am not sure how improtant it is.
> 
> Peeling for alignment profitability (we do too much of that IMHO),
> thus value-profiling of <read1-addr> | <read2-addr> | ...
> and <write1-addr> | <write2-addr> | ... (not sure if that would
> be fine-grained enough - we want to know if peeling for alignment
> will likely make many accesses aligned, not just the one we peel for).

Peeling for alingmnet can probably be predicted by the power of 2 predictor - that
is already used by string routines. For that we need to identify which alignments
we want to test at the vrp time and probably we can avoid alignmnet peeling in favour
of versioning for loops that work on aligned data in practice.

Honza
> 
> > > 
> > > > Currently we use profile each time estimate_numbers_of_iterations_loop
> > > > is called to recompute this value.  This is not very safe because the
> > > > profile may be misupdated.  It seems safer to compute it at once and
> > > > maintain thorough the compilation.
> > > > 
> > > > Notice that I removed:
> > > > -  /* Force estimate compuation but leave any existing upper bound in place.  */
> > > > -  loop->any_estimate = false;
> > > > From beggining of estimate_numbers_of_iterations_loop.  I can not make sense of
> > > > this. Even w/o profile if we have estimate, we are better to maintain it because
> > > > later we may not be able to derive it again.
> > > > There seems to be no code that is forced by setting loop->any_estimate = true.
> > > > Only code that cares seems to be record_niter_bound that only decreases existing
> > > > estimates. THis seems sane procedure - we don't roll loops.
> > > > 
> > > > Bootstrapped/regtested x86_64-linux, OK?
> > > 
> > > Ok.  Did you check what this does to SPEC with FDO?
> > 
> > I didn't do full SPEC run with FDO, only tested tramp3d and xalancbmk I have
> > readily available.  Any regressions would however point to loop info updating
> > bugs (or wrong use of the profile) so I will look into them if they appear.
> > I am trying to benchmark firefox now.
> 
> Ok.
> 
> Thanks,
> Richard.

Follow-Ups:
- Re: Maintain loop iteration count estimates
  - From: Richard Biener

References:
- Maintain loop iteration count estimates
  - From: Jan Hubicka
- Re: Maintain loop iteration count estimates
  - From: Richard Biener
- Re: Maintain loop iteration count estimates
  - From: Jan Hubicka
- Re: Maintain loop iteration count estimates
  - From: Richard Biener

Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]