This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.



RE: [patch] Improve prefetch heuristics


Hi, Zdenek:

> >then the problem is with determining `ahead',
>
>
> I use the following example to defend my original patch --0001-Do-not-insert-prefetches-if-they-would-hit-the-same-.patch.
> I think the current prefetch pass will generate prefetches that fall on the same cache line as an existing memory reference.
> The case is from 416.gamess:
>
> ahead 1, prefetch_mod 16, step 4, unroll factor 1, trip count 1001, insn count 1817, mem ref count 11, prefetch count 7
>
> The case shows a loop with a big body (1817 instructions). For loop prefetching, the prefetch ahead for such a big loop can
> only be 1 (any larger value would make the prefetched data arrive in the cache too early and possibly evict useful data out of
> the cache).
>
> The loop is not going to be unrolled, and the step size is 4.
>  delta = (ahead + ap * ref->prefetch_mod) * ref->group->step
>           = 4 when ap=0
>
> I do not see how we can generate effective prefetches for loops with a big body and a small step size.
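
Just to make the overlap concrete, here is a quick standalone sketch of that arithmetic (the 64-byte cache line size is only an assumption for illustration, it is not taken from the pass):

#include <stdio.h>

int
main (void)
{
  const int ahead = 1, prefetch_mod = 16, step = 4;  /* the 416.gamess case */
  const int cache_line = 64;                         /* assumed L1 line size */
  int ap;

  /* Same delta formula as quoted above.  For ap = 0 the prefetch lands
     only 4 bytes past the reference, i.e. almost always on the cache
     line the reference is touching anyway.  */
  for (ap = 0; ap < 4; ap++)
    {
      int delta = (ahead + ap * prefetch_mod) * step;
      printf ("ap=%d delta=%d -> %s\n", ap, delta,
              delta < cache_line ? "same line as the reference (likely)"
                                 : "a different line");
    }
  return 0;
}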

>yes, without unrolling, the prefetches will necessarily overlap in such a case.
>It is not quite clear that this is bad (it may well be that having 16 prefetch
>instructions is cheaper than having a cache miss every 16 iterations), but it
>is certainly plausible that it would be better to avoid this.

>Still, your patch does not really check for this situation.  A better way to handle this
>would be to have a check of the form

>  if (prefetch_mod / unroll_factor > MAGIC_CONSTANT)
>    continue;
>in schedule_prefetches.

I think the problem is that the loop body is too big. In this case, loop prefetching may not be effective.
The ideal "prefetch distance" should be the prefetch latency. However, if the body is too big (larger than
the prefetch latency), the prefetch ahead will be less than 1 (and is rounded up to 1 in the current determination of ahead).
I think basic block prefetching may be more appropriate here, because such prefetches require careful scheduling inside
the loop.
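
To show what I mean by the rounding, here is a small standalone sketch (the 200-cycle latency is just the parameter default I am assuming, and 1817 is only a stand-in for the body cost the pass actually computes with its own weights):

#include <stdio.h>

int
main (void)
{
  const unsigned prefetch_latency = 200;   /* assumed prefetch-latency value */
  const unsigned time = 1817;              /* stand-in for the loop body cost */

  /* Ceiling division: once the body cost exceeds the prefetch latency,
     ahead collapses to 1, which is what happens for this loop.  */
  unsigned ahead = (prefetch_latency + time - 1) / time;

  printf ("ahead = %u\n", ahead);          /* prints 1 */
  return 0;
}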
 
>  if (prefetch_mod / unroll_factor > MAGIC_CONSTANT)
>    continue;

Yes, it is a good idea to add this check in schedule_prefetches. I am going to do some experiments on this.
I think this MAGIC_CONSTANT should be something like 4 or 8.
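
For reference, this is the kind of cutoff I plan to experiment with. It is only a standalone sketch of the idea, not the real schedule_prefetches; with the numbers above (prefetch_mod 16, unroll factor 1) a MAGIC_CONSTANT of 4 or 8 would reject the reference:

#include <stdbool.h>
#include <stdio.h>

#define MAGIC_CONSTANT 8   /* value to be tuned experimentally */

/* The prefetch for this reference is executed once every unroll_factor
   iterations but is only useful once every prefetch_mod iterations, so
   a large prefetch_mod / unroll_factor ratio means most executions hit
   a cache line that is already covered.  */
static bool
worth_prefetching_p (unsigned prefetch_mod, unsigned unroll_factor)
{
  if (prefetch_mod / unroll_factor > MAGIC_CONSTANT)
    return false;
  return true;
}

int
main (void)
{
  printf ("%d\n", worth_prefetching_p (16, 1));   /* prints 0: skip it */
  return 0;
}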

Thanks,

Changpeng





