Patch to Avoid Bad Prefetching

Wed Jun 3 07:12:00 GMT 2009

On Wed, Jun 3, 2009 at 2:19 AM, Shobaki, Ghassan
<Ghassan.Shobaki@amd.com> wrote:
> First Heuristic:  Disable prefetching in a loop if the
> potential benefit is insignificant. Prefetching improves
> performance by overlapping cache missing memory operations
> with CPU operations. Therefore, if the loop does not have
> a significant amount of CPU operations for the machine to
> execute while waiting on cache misses, the gain from
> prefetching will be insignificant and hence unlikely to
> pay for the prefetching cost. To be precise, an upper bound
> on the benefit from prefetching can be computed by
> estimating the time needed to execute the CPU operations
> and dividing that by the time needed to execute the entire
> loop (with cache misses taken into account). However, this
> patch avoids these instruction-by-instruction calculations
> and adopts an approximation that simply looks at the ratio
> between the total instruction count and the memory reference
> count and disables prefetching if that ratio is less than a
> certain threshold (PREFETCH_MIN_INSN_TO_MEM_RATIO).

Is loop blocking already implemented in Graphite?  If so, it would be
interesting to see if you can use loop blocking with a prefetch in one
of the outer loops, e.g.:

! Original loop
REAL A(N,M), B(N,M)
DO J =1, M
  DO I = 1, N
    A(I,J) = A(I,J) + B(I,J)
  ENDDO
ENDDO

! Transformed loop after blocking and inserting prefetches

REAL A(N,M), B(N,M)
DO J = 1, M, BS
  DO I =1, N, BS
    PREFETCH(A(I,J+BS))    ! Or something like this, don't...
    PREFETCH(B(I,J+BS))    ! ...mind the details ;-)
    DO JJ = J, J+M, BS-1
      DO II = I, I+N, BS-1
        A(II,JJ) = A(II,JJ) + B(II,JJ)
      ENDDO
    ENDDO
  ENDDO
ENDDO

This way, you can raise the PREFETCH_MIN_INSN_TO_MEM_RATIO for the
inner two loops.

Ciao!
Steven