This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.



prefetch revisited


In April 2000, Jan Hubicka proposed adding prefetch support to GCC
(http://gcc.gnu.org/ml/gcc/2000-04/msg00194.html).  This was met with
much excitement and discussion, and Jan sent a few versions of a
prefetch patch to gcc-patches.  The discussion died and Jan apparently
dropped work on the patch, although he added prefetch support for SSE
and 3DNow! to config/i386.md.

I'd like to revisit prefetch support in GCC and start by defining an
infrastructure that can allow various optimizations to eventually take
advantage of the prefetch capabilities of multiple architectures.  I
hope to use it for greedy prefetching of memory referenced by pointers,
as described in the paper "Compiler-Based Prefetching for Recursive
Data Structures" by Chi-Keung Luk and Todd C. Mowry, available via
http://www.cs.cmu.edu/~tcm/Papers.html.  Jan's patch used it in loop
optimizations; that area is apparently undergoing a lot of changes, so
perhaps the people working on that would like to revisit Jan's prefetch
work for loops.  In the meantime I'll be using his old loop optimizer
changes to generate prefetches to let me test the underlying prefetch
support, with machine-specific support for IA-64 and Pentium III.
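
To make the greedy idea concrete, here is a hand-written sketch of what
such a transformation might produce for a tree walk: on entering a node,
issue a prefetch for every pointer it holds before doing any other work.
It assumes a compiler that provides the __builtin_prefetch builtin (GCC
and Clang do today; it is not something this message defines), and the
node type and function name are made up for illustration.

```c
#include <stddef.h>

/* Hypothetical binary tree node; not taken from the paper or from GCC. */
struct node {
    int value;
    struct node *left;
    struct node *right;
};

/* Sum all values, greedily prefetching both children on entry to a node. */
long tree_sum(const struct node *n)
{
    if (n == NULL)
        return 0;

    /* These are non-faulting hints, so a null child pointer is harmless.
       Second argument 0 = read access, third argument 3 = keep in cache. */
    __builtin_prefetch(n->left, 0, 3);
    __builtin_prefetch(n->right, 0, 3);

    return n->value + tree_sum(n->left) + tree_sum(n->right);
}
```

The right child's prefetch has the whole left subtree's traversal time to
complete, which is where this scheme would be expected to help most.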

A new prefetch instruction pattern can take an address operand and a
list of options or flags indicating which kinds of prefetch support to
use, depending on what the machine supports.  The rtl code for prefetch
can be recognized throughout the compiler and handled appropriately.  A
machine description will map the options and flags to the appropriate
instruction for that machine, ignoring the ones that aren't relevant for
its prefetch support.  Each architecture will also define a set of
parameters for prefetching, including the cache line size and the number
of prefetches that can be done in parallel (as in Jan's patches).
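
As a rough sketch of how the option handling could look, the C fragment
below models the flags as a bit mask and lets each target drop the ones
it does not implement.  Every name here (the flag names, the struct, and
the function) is hypothetical, and the fragment is ordinary C rather than
GCC's actual rtl or machine-description syntax.

```c
/* Hypothetical prefetch attribute flags; names are illustrative only. */
enum prefetch_flag {
    PF_WRITE        = 1 << 0,  /* data will be written, not just read */
    PF_NON_TEMPORAL = 1 << 1,  /* data is unlikely to be reused soon */
    PF_BASE_UPDATE  = 1 << 2   /* prefetch also updates the base register */
};

/* Hypothetical per-target parameters, as described in the text above. */
struct prefetch_target {
    unsigned supported_flags;          /* mask of flags the target honors */
    unsigned cache_line_size;          /* in bytes */
    unsigned simultaneous_prefetches;  /* prefetches that can be in flight */
};

/* Keep only the requested flags that this target can express;
   the rest are silently ignored. */
unsigned effective_flags(const struct prefetch_target *t, unsigned requested)
{
    return requested & t->supported_flags;
}
```

An optimization pass would then request whatever attributes it can prove
and leave it to the target to ignore those that do not apply.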

The earlier discussions mentioned the following machines as supporting
prefetch: Athlon, IA-64, Pentium III, hppa, MIPS, 3DNow!, SPARC, PowerPC,
and Alpha.  Some of the variations of prefetch support that might be
taken into consideration are read vs. write accesses, base update form,
spatial and temporal locality, single vs. multiple reads, and multiple
cache levels; some also support both faulting and non-faulting versions,
but I assume that we can limit support to non-faulting prefetches.  Are
there other capabilities of prefetch support to consider?  Which
prefetch attributes are likely to be useful within GCC?

Each prefetch optimization can be controlled by a separate flag.  For
example:

-fprefetch-loop-arrays
      If supported for the target machine, generate prefetch
      instructions to improve the performance of loops that access
      large arrays.

-fprefetch-pointers
      If supported for the target machine, generate prefetch
      instructions to improve the performance of accesses to recursive
      data structures.
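
For the array case, the effect of a flag like -fprefetch-loop-arrays
might resemble the hand-written loop below, which requests data a fixed
number of cache lines ahead of the current element.  This is only a
sketch: it assumes a compiler that provides __builtin_prefetch (present
in current GCC and Clang, not defined by this message), and the 64-byte
line size and 4-line lookahead are placeholder values, not measured
parameters for any machine.

```c
#include <stddef.h>

#define ASSUMED_LINE_BYTES 64  /* assumption; real value is per target */
#define LINES_AHEAD        4   /* assumption; tune per target */

/* Sum an array, prefetching one cache line at a time ahead of the loop. */
double array_sum(const double *a, size_t n)
{
    const size_t per_line = ASSUMED_LINE_BYTES / sizeof(double);
    const size_t ahead = LINES_AHEAD * per_line;
    double sum = 0.0;

    for (size_t i = 0; i < n; i++) {
        /* Issue at most one prefetch per cache line, and only for
           addresses that are still inside the array. */
        if (i % per_line == 0 && i + ahead < n)
            __builtin_prefetch(&a[i + ahead], 0, 0);  /* read, low reuse */
        sum += a[i];
    }
    return sum;
}
```

A compiler doing this automatically would normally unroll the loop so
that the once-per-line test disappears from the generated code.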

Am I on the right track?  I'm working on a patch as I figure out how all
of this stuff works in GCC and I'll be asking for advice on
implementation details later, but first I'd like to settle the wider
issues.

Janis

