This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.
Re: adding dependence from prefetch to load
- From: Zdenek Dvorak <rakdver at atrey dot karlin dot mff dot cuni dot cz>
- To: George Caragea <george at cs dot umd dot edu>
- Cc: gcc at gcc dot gnu dot org
- Date: Thu, 12 Apr 2007 17:46:07 +0200
- Subject: Re: adding dependence from prefetch to load
- References: <461D67CE.2080400@cs.umd.edu>
Hello,
> 2. Right now I am inserting a __builtin_prefetch(...) call immediately
> before the actual read, getting something like:
> D.1117_12 = &A[D.1101_14];
> __builtin_prefetch (D.1117_12, 0, 1);
> D.1102_16 = A[D.1101_14];
>
> However, if I enable the instruction scheduler pass, it doesn't realize
> there's a dependency between the prefetch and the load, and it actually
> moves the prefetch after the load, rendering it useless. How can I
> instruct the scheduler of this dependence?
>
> My thinking is to also specify a latency for prefetch, so that the
> scheduler will hopefully place the prefetch somewhere earlier in the
> code to partially hide this latency. Do you see anything wrong with this
> approach?
Well, it assumes that the scheduler works with a long enough lookahead to
actually be able to move the prefetch far enough; i.e., if the
architecture you work with is relatively slow in comparison with the
memory access times, this might be a feasible approach. However, on
modern machines, a miss in the L2 cache may take hundreds of cycles, and it is
not clear to me that the scheduler will be able to move the prefetch that far,
or indeed, that it would even be possible (I think you often do not
know the address far enough in advance). Also, prefetching outside of
loops generally does not appear to be all that profitable, since most of the
time is usually spent within loops.
So I would recommend first doing some analysis and measurements (say, by
introducing the prefetches by hand) to check whether this project really
has the potential to lead to significant speedups.
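To make the "by hand" experiment concrete, here is a minimal sketch of what
such a measurement candidate could look like: the same reduction loop written
once without and once with a manually inserted __builtin_prefetch. The function
names and the PREFETCH_DISTANCE value are assumptions chosen only for
illustration; a suitable distance depends on the target's memory latency and
on the cost of one loop iteration, and would have to be found by timing both
variants.

```c
#include <stddef.h>

/* Hypothetical prefetch distance in elements; an assumption to be tuned
   by measurement, not a recommended value. */
#define PREFETCH_DISTANCE 64

/* Baseline: plain reduction over the array, no prefetching. */
long sum_plain(const long *a, size_t n)
{
    long s = 0;
    for (size_t i = 0; i < n; i++)
        s += a[i];
    return s;
}

/* Same loop with a hand-inserted prefetch PREFETCH_DISTANCE elements
   ahead of the current read. The arguments 0 and 1 mean "prefetch for
   read" and "low temporal locality", as in the snippet quoted above. */
long sum_prefetch(const long *a, size_t n)
{
    long s = 0;
    for (size_t i = 0; i < n; i++) {
        /* Guard so we never form an address past the end of the array. */
        if (i + PREFETCH_DISTANCE < n)
            __builtin_prefetch(&a[i + PREFETCH_DISTANCE], 0, 1);
        s += a[i];
    }
    return s;
}
```

Both functions return the same value; only their timing on large arrays can
differ, and that difference is what the measurement would need to show.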
Zdenek