This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.



Re: prefetch revisited


> In April 2000, Jan Hubicka proposed adding prefetch support to GCC
> (http://gcc.gnu.org/ml/gcc/2000-04/msg00194.html).  This was met with
> much excitement and discussion, and Jan sent a few versions of a
> prefetch patch to gcc-patches.  The discussion died and Jan apparently
> dropped work on the patch, although he added prefetch support for SSE
> and 3dNOW! to config/i386.md.

I didn't drop it completely; in fact I plan to return to it within a
reasonable time, and I have already written some code.  The loop optimizer
needs to be revisited, and we now have new loop dependency code in
"dependence.c" that may be used for more selective prefetch generation than
my original code did.  In the meantime I got some non-loop prefetching
working.
> 
> I'd like to revisit prefetch support in GCC and start by defining an
> infrastructure that can allow various optimizations to eventually take
> advantage of the prefetch capabilities of multiple architectures.  I
> hope to use it for greedy prefetching of memory referenced by pointers,
> as described in the paper "Compiler-Based Prefetching for Recursive
> Data Structures" by Chi-Keung Luk and Todd C. Mowry, available via
> http://www.cs.cmu.edu/~tcm/Papers.html.  Jan's patch used it in loop

Yes, implementing the greedy prefetching described there should be easy
enough, and effective too.  In fact I have that article on my desk,
waiting in the "TODO" list :)

> optimizations; that area is apparently undergoing a lot of changes, so
> perhaps the people working on that would like to revisit Jan's prefetch
> work for loops.  In the meantime I'll be using his old loop optimizer
> changes to generate prefetches to let me test the underlying prefetch
> support, with machine-specific support for IA-64 and Pentium III.
> 
> A new prefetch instruction pattern can take an address operand and a
> list of options or flags indicating which kinds of prefetch support to
> use, depending on what the machine supports.  The rtl code for prefetch

As discussed, it probably makes sense to introduce a new RTL construct,
"prefetch", with a memory operand and a set of flags to distinguish the
various types of prefetch that the various instructions provide.
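
As a rough sketch of how a target could map such flags to instructions
while ignoring the ones it cannot express: the flag names, the target enum
and the function below are all hypothetical and are not GCC code.  Only the
mnemonics are real (SSE has "prefetcht0"/"prefetchnta", 3DNow! has
"prefetch"/"prefetchw").

```c
#include <stddef.h>

/* Hypothetical flag bits describing the kind of prefetch requested. */
enum prefetch_flags {
    PF_READ     = 0,       /* data will only be read */
    PF_WRITE    = 1 << 0,  /* data will be written */
    PF_TEMPORAL = 1 << 1   /* data is expected to be reused soon */
};

/* Hypothetical targets, standing in for two machine descriptions. */
enum target { TARGET_SSE, TARGET_3DNOW };

/* Pick a mnemonic for the requested prefetch.  Each target looks only
   at the flags it can express and silently ignores the rest. */
const char *prefetch_mnemonic(enum target t, unsigned flags)
{
    switch (t) {
    case TARGET_SSE:
        /* Distinguishes locality, ignores read vs. write. */
        return (flags & PF_TEMPORAL) ? "prefetcht0" : "prefetchnta";
    case TARGET_3DNOW:
        /* Distinguishes read vs. write, ignores locality. */
        return (flags & PF_WRITE) ? "prefetchw" : "prefetch";
    }
    return NULL;  /* unknown target: no prefetch available */
}
```

The point of the design is that the optimizer can always ask for the most
specific prefetch it wants, and the machine description degrades the
request to whatever the hardware offers.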

> can be recognized throughout the compiler and handled appropriately.  A
> machine description will map the options and flags to the appropriate
> instruction for that machine, ignoring the ones that aren't relevant for
> its prefetch support.  Each architecture will also define a set of
> parameters for prefetching, including the cache line size and the number
> of prefetches that can be done in parallel (as in Jan's patches).
> 
> The earlier discussions mentioned the following machines as supporting
> prefetch: Athlon, ia64, Pentium III, hppa, mips, 3dNOW!, Sparc, PowerPC,
> and Alpha.  Some of the variations of prefetch support that might be
> taken into consideration are read vs. write accesses, base update form,
> spatial and temporal locality, single vs. multiple reads, and multiple
> cache levels; some also support both faulting and non-faulting versions,
> but I assume that we can limit support to non-faulting prefetches.  Are
> there other capabilities of prefetch support to consider?  Which
> prefetch attributes are likely to be useful within GCC?
> 
> Each prefetch optimization can be controlled by a separate flag.  For
> example:
> 
> -fprefetch-loop-arrays
>       If supported for the target machine, generate prefetch
>       instructions to improve the performance of loops that access
>       large arrays.
> 
> -fprefetch-pointers
>       If supported for the target machine, generate prefetch
>       instructions to improve the performance of accesses to recursive
>       data structures.

MIPSpro defined more levels of prefetching code generation, as the compiler
may do a wonderfully bad job in some cases, but basically I think this
approach is correct.
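
To make the first option concrete, here is roughly what loop array
prefetching amounts to when written by hand.  This is only an illustration:
it assumes a compiler that provides the `__builtin_prefetch` builtin (as
later GCC releases do), and the prefetch distance of 16 elements is an
arbitrary assumed value, not a tuned one.

```c
#include <stddef.h>

/* Assumed tuning value: how many elements ahead of the current one to
   prefetch.  A real compiler would derive this from target parameters. */
#define PREFETCH_DISTANCE 16

/* Sum an array, requesting each element well before it is needed. */
long sum_array(const long *a, size_t n)
{
    long sum = 0;
    for (size_t i = 0; i < n; i++) {
        /* Second argument 0 = prefetch for reading; third argument 0 =
           low temporal locality, since each element is used once. */
        if (i + PREFETCH_DISTANCE < n)
            __builtin_prefetch(&a[i + PREFETCH_DISTANCE], 0, 0);
        sum += a[i];
    }
    return sum;
}
```

A compiler pass would also want to issue one prefetch per cache line rather
than one per element, which is where the per-target cache line size
parameter comes in.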

> 
> Am I on the right track?  I'm working on a patch as I figure out how all
> of this stuff works in GCC and I'll be asking for advice on
> implementation details later, but first I'd like to settle the wider

We can discuss it together, as I also had plans concerning this issue.

In fact I have implemented a simple greedy prefetching pass that looks for
loads of pointers that are later used as memory addresses and emits
prefetch instructions for them.  This already works relatively well; the
only problem is that I can't figure out the size of the object pointed to,
and thus I don't know how much data to fetch, but this should be doable
with Richard Kenner's memory tracking code.

The pass is based on DU/DF chains and uses profile information to avoid
unneeded prefetching.

On non-IA-64 machines one can still look at the offsets used in memory
references to get an idea of the size.
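
At the source level, the effect of such a greedy pass on a linked-list walk
could be pictured as below.  This is a hand-written sketch, not output of
the pass: the `node` type and `sum_list` are made up for the example, and
it assumes a compiler that provides the `__builtin_prefetch` builtin (as
later GCC releases do).

```c
#include <stddef.h>

/* Hypothetical recursive data structure used for illustration. */
struct node {
    struct node *next;
    int value;
};

/* Walk the list; as soon as the next pointer is loaded, prefetch the
   object it points to, so the fetch overlaps with work on this node. */
int sum_list(const struct node *p)
{
    int sum = 0;
    while (p != NULL) {
        const struct node *next = p->next;  /* load of a pointer ...     */
        __builtin_prefetch(next, 0, 3);     /* ... later used as address */
        sum += p->value;
        p = next;
    }
    return sum;
}
```

Prefetching a null `next` at the end of the list is harmless here because
the prefetch is non-faulting.  The sketch fetches only the line containing
the start of the node, which is exactly the object-size problem mentioned
above.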

Honza
> issues.
> 
> Janis

