This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.

Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]
Other format:	[Raw text]

Re: Maintain loop iteration count estimates

From: Richard Biener <rguenther at suse dot de>
To: Jan Hubicka <hubicka at ucw dot cz>
Cc: gcc-patches at gcc dot gnu dot org
Date: Fri, 10 Jun 2016 14:50:56 +0200 (CEST)
Subject: Re: Maintain loop iteration count estimates
Authentication-results: sourceware.org; auth=none
References: <20160609095842 dot GA98613 at kam dot mff dot cuni dot cz> <alpine dot LSU dot 2 dot 11 dot 1606091317210 dot 1493 at t29 dot fhfr dot qr> <20160609123459 dot GA89386 at kam dot mff dot cuni dot cz> <alpine dot LSU dot 2 dot 11 dot 1606101206020 dot 13824 at t29 dot fhfr dot qr> <20160610124512 dot GA78024 at kam dot mff dot cuni dot cz>

On Fri, 10 Jun 2016, Jan Hubicka wrote:

> > > I always interpreted the estimated number of iterations to be the same as
> > > expected number of iterations and to be same as average.  So it seems to be
> > > sane to feed the info from profile.
> > > 
> > > I am thinking to add the histograms, yes.  It is midly anoying to do so becuase
> > > one needs to intrstrument all exit edges out of loop. I guess we don't care
> > > much if histogram will get lost on abnormals.
> > 
> > Can't we just instrument the loop header (and thus get "averages" for all
> > exits)?
> 
> You need to know number of iterations, so either you save it into a global and
> flush into profile next time edge header->preheader is executed or you instrument
> exists. At least when one wants something more precise than average iteration count
> I think.
> > 
> > > One option I am thinking about is to introduce counter that will take two
> > > parameters A,B.  It will record linear histogram for values in range 0...A and
> > > logarithmic histogram for values greater than A capping by 2^B (so we don't
> > > need 64 counters for every loop).  I think we are not really that interested in
> > > precise histograms for loops that iterate more than, say, 2^10 times. We only
> > > need to know that it loops a lot.  We however care about low iteration counts
> > > to make good decision for peeling.  This is still 26 counters per loop that is
> > > quite a lot.  
> > 
> > I think what we are interested in is "clusters" of niters.  So yes,
> > having a histogram for low values (can we use HIST_TYPE_POW2 here?)
> > makes sense.  Or simply sum all n <= 16 and all n > 16 values separately?
> > We seem to lack a histogram type where we can specify N differently-sized
> > bins.
> 
> HIST_TYPE_POW2 measure how often value is exactly power of two and we don't
> have histogram counter at all, so we need to introduce one ;)
> 
> We have HIST_TYPE_INTERVAL that is real linear histogram for given interval of values.
> > 
> > > A lot cheaper alternative may be to simply measure loop peeling limit by 
> > > having counter that counts how often loop exits in first 
> > > PARAM_MAX_PEEL_TIMES iterations and second counter that measure what is 
> > > the maximal number of iterations in this case (we really want likely 
> > > maximal number of iterations but that seems harder to get). This will 
> > > determine peeling limit which we can then store into loop structure.
> > 
> > Sure.
> > 
> > In my original loop value-profiling work the most interesting (for
> > performance) case was value-profiling variable strides and then
> > versioning for stride == 1.  That triggered a lot with fortran
> > array descriptors though nowadays IPA-CP of aggregates plus LTO
> > might have solved this.
> 
> ipa-cp is not very powerful on propagating those. Measuring strides is
> interesting idea. I forgot about your work on the loop histograms. Do you
> still have patches around?

No, but those weren't complete anyway as we didn't preserve loops
back in time.  I think I only implemented versioning assuming the
stride would be one unconditionally to show the possible benefit
(it was for the 2006 Summit).

Richard.

> > 
> > > Vectorizer/unroller/prefetcher and most of other classical loop opts 
> > > care about the average being large enough so the current expected value 
> > > seems to do the trick.
> > 
> > Yes, though knowing that a significant fraction of times the iteration
> > count is one would also be useful.
> 
> Yes, if we interval profile the low iteration counts, we will have exact
> info on 0,1...n intrations.
> > 
> > > This does not let you to update profile very precisely after peeling (and also peeling
> > > done by vectorizer), but it needs only 2 counters per loop.
> > > 
> > > What other passes, beside peeling, would immediately benefit from a 
> > > historgram? I wonder if you can think of better scheme?
> > 
> > Epilogue peeling in the vectorizer (for the remainder of iterations,
> > thus profiling N % vectorization-factor which means a value-profile
> > of niter & <low-bits>).
> 
> Hmm, to predict that we will need to know the exact value modulo vectorization
> factor. This seems bit hard to measure an I am not sure how improtant it is.
> > 
> > Peeling for alignment profitability (we do too much of that IMHO),
> > thus value-profiling of <read1-addr> | <read2-addr> | ...
> > and <write1-addr> | <write2-addr> | ... (not sure if that would
> > be fine-grained enough - we want to know if peeling for alignment
> > will likely make many accesses aligned, not just the one we peel for).
> 
> Peeling for alingmnet can probably be predicted by the power of 2 predictor - that
> is already used by string routines. For that we need to identify which alignments
> we want to test at the vrp time and probably we can avoid alignmnet peeling in favour
> of versioning for loops that work on aligned data in practice.
> 
> Honza
> > 
> > > > 
> > > > > Currently we use profile each time estimate_numbers_of_iterations_loop
> > > > > is called to recompute this value.  This is not very safe because the
> > > > > profile may be misupdated.  It seems safer to compute it at once and
> > > > > maintain thorough the compilation.
> > > > > 
> > > > > Notice that I removed:
> > > > > -  /* Force estimate compuation but leave any existing upper bound in place.  */
> > > > > -  loop->any_estimate = false;
> > > > > From beggining of estimate_numbers_of_iterations_loop.  I can not make sense of
> > > > > this. Even w/o profile if we have estimate, we are better to maintain it because
> > > > > later we may not be able to derive it again.
> > > > > There seems to be no code that is forced by setting loop->any_estimate = true.
> > > > > Only code that cares seems to be record_niter_bound that only decreases existing
> > > > > estimates. THis seems sane procedure - we don't roll loops.
> > > > > 
> > > > > Bootstrapped/regtested x86_64-linux, OK?
> > > > 
> > > > Ok.  Did you check what this does to SPEC with FDO?
> > > 
> > > I didn't do full SPEC run with FDO, only tested tramp3d and xalancbmk I have
> > > readily available.  Any regressions would however point to loop info updating
> > > bugs (or wrong use of the profile) so I will look into them if they appear.
> > > I am trying to benchmark firefox now.
> > 
> > Ok.
> > 
> > Thanks,
> > Richard.
> 
> 

-- 
Richard Biener <rguenther@suse.de>
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg)

Follow-Ups:
- Re: Maintain loop iteration count estimates
  - From: Jan Hubicka

References:
- Maintain loop iteration count estimates
  - From: Jan Hubicka
- Re: Maintain loop iteration count estimates
  - From: Richard Biener
- Re: Maintain loop iteration count estimates
  - From: Jan Hubicka
- Re: Maintain loop iteration count estimates
  - From: Richard Biener
- Re: Maintain loop iteration count estimates
  - From: Jan Hubicka

Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]