
Re: [patch] Improve prefetch heuristics


Hi,

> > >> You are right.  It looks like prune_ref_by_self_reuse has already adjusted the prefetch distance
> > >> through prefetch_mod.  The reason I observed the short prefetch distance may be due to the induction
> > >> variable "ap" in the issue_prefetch_ref logic:
> > >>
> > >>    for (ap = 0; ap < n_prefetches; ap++)    /* <---------------------------- */
> > >>      {
> > >>        /* Determine the address to prefetch.  */
> > >>        delta = (ahead + ap * ref->prefetch_mod) * ref->group->step;
> > >>
> > >> When ap equals 0, the prune_self_reuse adjustment is essentially ignored.  Applying the following patch
> > >> can resolve the short prefetch distance problem:
> > >>
> > >>  -  for (ap = 0; ap < n_prefetches; ap++)
> > >>  +  for (ap = 1; ap <= n_prefetches; ap++)
> > >>    {
> > >>        /* Determine the address to prefetch.  */
> > >>        delta = (ahead + ap * ref->prefetch_mod) * ref->group->step;
> >
> > >I think there is some misunderstanding.  The statement "When ap equals 0, the prune_self_reuse adjustment
> > >is essentially ignored." does not make sense to me.  Also, the change you propose only increases the prefetch
> > >distance by a constant offset.
> >>
> >> In my experiments, n_prefetches always equals 1; as a result, ap * ref->prefetch_mod == 0.  This is what I meant
> >> by prefetch_mod having no effect, and why I observed such a short prefetch distance (delta < L1_CACHE_LINE_SIZE).
> 
> >then the problem is with determining `ahead',
> 
> 
> I use the following example to defend my original patch --0001-Do-not-insert-prefetches-if-they-would-hit-the-same-.patch.
> I think the current prefetch pass will generate prefetches that fall on the same cache line as an existing memory reference.
> The case is from 416.gamess:
> 
> ahead 1, prefetch_mod 16, step 4, unroll factor 1, trip count 1001, insn count 1817, mem ref count 11, prefetch count 7
> 
> The case shows a loop with a large body (1817 instructions).  For loop prefetching, the prefetch-ahead distance for
> such a big loop can only be 1 (a larger value would cause the prefetched data to arrive in the cache too early and
> possibly evict useful data from the cache).
> 
> The loop is not going to be unrolled, and the step size is 4.
>  delta = (ahead + ap * ref->prefetch_mod) * ref->group->step 
>           = 4 when ap=0
> 
> I cannot see how we can generate effective prefetches for loops with a large body and a small step size.

yes, without unrolling, the prefetches will necessarily overlap in such a case.
It is not quite clear that this is bad (it may well be that having 16 prefetch
instructions is cheaper than having a cache miss every 16 iterations), but it
is certainly plausible that it would be better to avoid this.
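
To make the numbers concrete, here is a small self-contained model of the overlap
(not GCC code; the 64-byte line size and the standalone program are just
assumptions for illustration).  It plugs the 416.gamess parameters quoted above
(ahead 1, prefetch_mod 16, step 4, n_prefetches 1) into the delta formula from
issue_prefetch_ref and prints which cache line each reference and its prefetch hit:

  #include <stdio.h>

  #define L1_CACHE_LINE_SIZE 64   /* assumed line size for this example */

  int
  main (void)
  {
    /* Parameters from the 416.gamess case quoted above.  */
    unsigned ahead = 1, prefetch_mod = 16, step = 4, n_prefetches = 1;
    unsigned long base = 0x1000;  /* arbitrary base address of the reference */

    for (unsigned iter = 0; iter < 16; iter++)
      for (unsigned ap = 0; ap < n_prefetches; ap++)
        {
          /* delta as computed in issue_prefetch_ref.  */
          unsigned delta = (ahead + ap * prefetch_mod) * step;
          unsigned long addr = base + iter * step;  /* address touched now */
          unsigned long pref = addr + delta;        /* address prefetched  */

          printf ("iter %2u: ref line %lu, prefetch line %lu\n",
                  iter, addr / L1_CACHE_LINE_SIZE, pref / L1_CACHE_LINE_SIZE);
        }
    return 0;
  }

With delta = 4, each prefetch lands only one element ahead of the load, so roughly
16 consecutive prefetch instructions target the same 64-byte line.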

Still, your patch does not really check for this situation.  A better way to handle this
would be to have a check of the form

if (prefetch_mod / unroll_factor > MAGIC_CONSTANT)
  continue;

in schedule_prefetches.
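
To make the intent concrete, here is a minimal standalone sketch of that check
(the struct, the helper name and the PREFETCH_MOD_TO_UNROLL_RATIO value are
illustrative stand-ins for the pass's mem_ref and the MAGIC_CONSTANT above, not
actual tree-ssa-loop-prefetch.c code):

  #include <stdbool.h>
  #include <stdio.h>

  /* Stand-in for the tuning parameter (MAGIC_CONSTANT above).  */
  #define PREFETCH_MOD_TO_UNROLL_RATIO 4

  /* Simplified stand-in for the pass's per-reference record.  */
  struct mem_ref
  {
    unsigned prefetch_mod;  /* prefetch needed only every prefetch_mod iters */
    const char *name;
  };

  /* Skip references whose single emitted prefetch would be executed far more
     often than it is useful, i.e. would mostly hit already-fetched lines.  */
  static bool
  worth_prefetching_p (struct mem_ref *ref, unsigned unroll_factor)
  {
    return ref->prefetch_mod / unroll_factor <= PREFETCH_MOD_TO_UNROLL_RATIO;
  }

  int
  main (void)
  {
    struct mem_ref refs[] = { { 16, "416.gamess ref (prefetch_mod 16)" },
                              { 1, "dense ref (prefetch_mod 1)" } };
    unsigned unroll_factor = 1;  /* the loop is not unrolled */

    for (unsigned i = 0; i < 2; i++)
      printf ("%s: %s\n", refs[i].name,
              worth_prefetching_p (&refs[i], unroll_factor)
              ? "schedule prefetch" : "skip (prefetches would overlap)");
    return 0;
  }

With the quoted parameters (prefetch_mod 16, unroll factor 1) the ratio is 16,
so that reference would be skipped, while a reference that needs a prefetch on
every iteration would still get one.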

Zdenek

