This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
Re: [patch] Improve prefetch heuristics
- From: Zdenek Dvorak <rakdver at kam dot mff dot cuni dot cz>
- To: "Fang, Changpeng" <Changpeng dot Fang at amd dot com>
- Cc: "gcc-patches at gcc dot gnu dot org" <gcc-patches at gcc dot gnu dot org>, "rguenther at suse dot de" <rguenther at suse dot de>, "sebpop at gmail dot com" <sebpop at gmail dot com>
- Date: Tue, 4 May 2010 07:23:26 +0200
- Subject: Re: [patch] Improve prefetch heuristics
- References: <20100430010543.GA30055@kam.mff.cuni.cz> <1C13CD442679CE45A2E80AE9251D7EF921803A2E@SAUSEXMBP01.amd.com>
Hi,
> > >> You are right. It looks like prune_ref_by_self_reuse has already adjusted the prefetch distance
> > >> through prefetch_mod. The reason I observed the short prefetch distance may be due to the induction variable
> > >> "ap" in the issue_prefetch_ref logic:
> > >>
> > >> for (ap = 0; ap < n_prefetches; ap++) /* <---------------------------- */
> > >> {
> > >> /* Determine the address to prefetch. */
> > >> delta = (ahead + ap * ref->prefetch_mod) * ref->group->step;
> > >>
> > >> When ap equals 0, the prune_self_reuse adjustment is essentially ignored. Applying the following patch
> > >> can resolve the short prefetch distance problem:
> > >>
> > >> - for (ap = 0; ap < n_prefetches; ap++)
> > >> + for (ap = 1; ap <= n_prefetches; ap++)
> > >> {
> > >> /* Determine the address to prefetch. */
> > >> delta = (ahead + ap * ref->prefetch_mod) * ref->group->step;
> >
> > >I think there is some misunderstanding. The statement "When ap equals 0, the prune_self_reuse adjustment
> > >is essentially ignored." does not make sense to me. Also, the change you propose only increases the prefetch
> > >distance by a constant offset.
> >>
> >> In my experiments, n_prefetches always equals 1; as a result, ap * ref->prefetch_mod == 0. This is what I meant
> >> by prefetch_mod having no effect, and why I observed such a short prefetch distance (delta < L1_CACHE_LINE_SIZE).
>
> >then the problem is with determining `ahead',
>
>
> I use the following example to defend my original patch, 0001-Do-not-insert-prefetches-if-they-would-hit-the-same-.patch.
> I think the current prefetch pass will generate prefetches that fall on the same cache line as an existing memory reference.
> The case is from 416.gamess:
>
> ahead 1, prefetch_mod 16, step 4 , unroll factor 1, trip count 1001, insn count 1817, mem ref count 11, prefetch count 7
>
> The case shows a loop with a big body (1817 instructions). For loop prefetching, the prefetch ahead for such a big loop can
> only be 1 (any larger value would make the prefetched data arrive in the cache too early and may evict useful data out of
> the cache).
>
> The loop is not going to be unrolled, and the step size is 4.
> delta = (ahead + ap * ref->prefetch_mod) * ref->group->step
> = 4 when ap=0
>
> I cannot see how we can generate effective prefetches for loops with a big body and a small step size.
yes, without unrolling, the prefetches will necessarily overlap in such a case.
It is not quite clear that this is bad (it may well be that having 16 prefetch
instructions is cheaper than having a cache miss every 16 iterations), but it
is certainly plausible that it would be better to avoid this.
Still, your patch does not really check for this situation. A better way to handle it
would be to have a check of the form
if (prefetch_mod / unroll_factor > MAGIC_CONSTANT)
continue;
in schedule_prefetches.
Zdenek