This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
Re: Patch to Avoid Bad Prefetching
- From: Steven Bosscher <stevenb dot gcc at gmail dot com>
- To: "Shobaki, Ghassan" <Ghassan dot Shobaki at amd dot com>
- Cc: Richard Guenther <richard dot guenther at gmail dot com>, Zdenek Dvorak <rakdver at kam dot mff dot cuni dot cz>, gcc-patches at gcc dot gnu dot org
- Date: Wed, 3 Jun 2009 09:11:54 +0200
- Subject: Re: Patch to Avoid Bad Prefetching
- References: <84fc9c000904150216h170269b7v70f5ae2a10110296@mail.gmail.com> <20090415155239.GA20198@kam.mff.cuni.cz> <D60441ACF713E84E9952ADA42F2E08B70149981A@ssvlexmb2.amd.com> <20090416010249.GB21250@kam.mff.cuni.cz> <D60441ACF713E84E9952ADA42F2E08B70149988B@ssvlexmb2.amd.com> <20090416052522.GB3021@kam.mff.cuni.cz> <D60441ACF713E84E9952ADA42F2E08B701499986@ssvlexmb2.amd.com> <20090416172534.GB19702@kam.mff.cuni.cz> <84fc9c000904240451g28482e85mcd202d894f29b3f@mail.gmail.com> <912DA18E911D8B418824641EF1541F3C417E84@ssvlexmb2.amd.com>
On Wed, Jun 3, 2009 at 2:19 AM, Shobaki, Ghassan
<Ghassan.Shobaki@amd.com> wrote:
> First Heuristic: Disable prefetching in a loop if the
> potential benefit is insignificant. Prefetching improves
> performance by overlapping cache missing memory operations
> with CPU operations. Therefore, if the loop does not have
> a significant amount of CPU operations for the machine to
> execute while waiting on cache misses, the gain from
> prefetching will be insignificant and hence unlikely to
> pay for the prefetching cost. To be precise, an upper bound
> on the benefit from prefetching can be computed by
> estimating the time needed to execute the CPU operations
> and dividing that by the time needed to execute the entire
> loop (with cache misses taken into account). However, this
> patch avoids these instruction-by-instruction calculations
> and adopts an approximation that simply looks at the ratio
> between the total instruction count and the memory reference
> count and disables prefetching if that ratio is less than a
> certain threshold (PREFETCH_MIN_INSN_TO_MEM_RATIO).
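For reference, the approximation described above can be sketched in C as follows. This is only an illustration of the ratio test, not code from the patch: the function name, the parameter names, and the threshold value of 3 are assumptions of mine, and the real PREFETCH_MIN_INSN_TO_MEM_RATIO value is not given in this message.

```c
#include <stdbool.h>

/* Hypothetical threshold; stands in for the patch's
   PREFETCH_MIN_INSN_TO_MEM_RATIO, whose real value is not given here.  */
#define MIN_INSN_TO_MEM_RATIO 3

/* Return true if prefetching looks worthwhile for a loop that has
   NINSNS instructions in total and MEM_REF_COUNT memory references.
   Prefetching is disabled when NINSNS / MEM_REF_COUNT is below the
   threshold; the test is written as a multiplication so that integer
   division does not truncate the ratio.  */
static bool
prefetch_is_profitable (unsigned ninsns, unsigned mem_ref_count)
{
  if (mem_ref_count == 0)
    return false;  /* No memory references, nothing to prefetch.  */

  return ninsns >= MIN_INSN_TO_MEM_RATIO * mem_ref_count;
}
```

With the assumed threshold of 3, a loop with 12 instructions and 4 memory references would pass the test, while one with 11 instructions and 4 memory references would not.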
Is loop blocking already implemented in Graphite? If so, it would be
interesting to see if you can use loop blocking with a prefetch in one
of the outer loops, e.g.:
! Original loop
REAL A(N,M), B(N,M)
DO J = 1, M
  DO I = 1, N
    A(I,J) = A(I,J) + B(I,J)
  ENDDO
ENDDO
! Transformed loop after blocking and inserting prefetches
REAL A(N,M), B(N,M)
DO J = 1, M, BS
  DO I = 1, N, BS
    PREFETCH(A(I,J+BS)) ! Or something like this, don't...
    PREFETCH(B(I,J+BS)) ! ...mind the details ;-)
    DO JJ = J, MIN(J+BS-1, M)
      DO II = I, MIN(I+BS-1, N)
        A(II,JJ) = A(II,JJ) + B(II,JJ)
      ENDDO
    ENDDO
  ENDDO
ENDDO
This way, you can raise the PREFETCH_MIN_INSN_TO_MEM_RATIO for the
inner two loops.
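The same idea can be written as a rough C sketch using GCC's `__builtin_prefetch`. The function name `blocked_add`, the column-major indexing, and the choice to prefetch only the first element of the block BS columns ahead are assumptions made for illustration. This is not what Graphite or the prefetching pass would generate.

```c
#include <stddef.h>

/* Add B to A element-wise, walking the arrays in BS-by-BS blocks.
   Both arrays are N-by-M and stored column-major, as in the Fortran
   example: element (i, j) lives at index j * n + i.  */
static void
blocked_add (double *a, const double *b, size_t n, size_t m, size_t bs)
{
  if (bs == 0)
    bs = 1;  /* Guard against a zero block size (would loop forever).  */

  for (size_t j = 0; j < m; j += bs)
    for (size_t i = 0; i < n; i += bs)
      {
        /* Prefetch the start of the block BS columns ahead, if any.
           The second argument is 1 for data that will be written and
           0 for data that is only read.  */
        if (j + bs < m)
          {
            __builtin_prefetch (&a[(j + bs) * n + i], 1);
            __builtin_prefetch (&b[(j + bs) * n + i], 0);
          }

        /* Clamp the block bounds to the array bounds.  */
        size_t jmax = j + bs < m ? j + bs : m;
        size_t imax = i + bs < n ? i + bs : n;

        for (size_t jj = j; jj < jmax; jj++)
          for (size_t ii = i; ii < imax; ii++)
            a[jj * n + ii] += b[jj * n + ii];
      }
}
```

The prefetches sit in the two outer loops only, so the inner two loops contain no prefetch instructions at all.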
Ciao!
Steven