This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.



[PATCH] new page for Projects list: data prefetch support


I've gathered together information about data prefetch instructions on
several GCC targets and put that, plus descriptions of prefetch concepts
and some general wisdom about compiler use of data prefetch, into a new
page for the Projects list.  Gerald Pfeifer pre-approved it, so I plan
to check it in later today.

I expect this document to change a lot in the next couple of weeks as I
dig up more information for it.  Any pointers to additional targets, or
to documentation about targets whose descriptions are incomplete, will
be most welcome.

There is no ChangeLog for wwwdocs, but if there were this is what the
entry for this patch would look like:

2001-11-06  Janis Johnson  <janis187@us.ibm.com>

	* index.html: Link to a new page for data prefetch support.
	* prefetch.html: New.

Index: index.html
===================================================================
RCS file: /cvs/gcc/wwwdocs/htdocs/projects/index.html,v
retrieving revision 1.21
diff -u -r1.21 index.html
--- index.html	2001/10/02 07:23:36	1.21
+++ index.html	2001/11/06 22:36:28
@@ -19,6 +19,7 @@
 <li><a href="#value_range_propagation_pass">Value range propagation pass</a>
 <li><a href="#automaton_based_pipeline_hazard_recognizer">Automaton based pipeline hazard recognizer</a>
 <li><a href="#ast_optimizer">Tree optimization passes</a>
+<li><a href="#data_prefetch">Data prefetch support</a>
 </ul>
 <li><a href="#optimizer_inadequacies">Optimizer inadequacies</a>
 <li><a href="#ia64 projects">Projects to improve performance on IA-64</a>
@@ -133,6 +134,11 @@
 <h3><a name="ast_optimizer">Tree optimization passes</a></h3>
 A separate page describes <a href="ast-optimizer.html">abstract
 syntax tree optimization passes</a> that are being worked on.</p>
+
+<h3><a name="data_prefetch">Data prefetch support</a></h3>
+<p>A separate page describes <a href="prefetch.html">
+data prefetch support and optimizations</a> that are in development
+in the main branch.</p>
   
 <h2><a name="optimizer_inadequacies">Optimizer inadequacies</a></h2>
 <p>We also have a page detailing <a href="optimize.html">optimizer
--- /dev/null	Tue May 23 09:27:54 2000
+++ prefetch.html	Tue Nov  6 14:38:55 2001
@@ -0,0 +1,814 @@
+<html>
+
+<head>
+<title>Data Prefetch Support</title>
+</head>
+<body>
+<h1 align="center">Data Prefetch Support</h1>
+
+<h2><a name="toc">Table of Contents</a></h2>
+<ul>
+<li><a href="#intro">Introduction</a>
+<li><a href="#elements">Elements of Data Prefetch Support</a>
+  <ul>
+  <li><a href="#locality">Locality</a>
+  <li><a href="#write">Read or Write Access</a>
+  <li><a href="#size">Size of block to access</a>
+  <li><a href="#base_update">Base update</a>
+  <li><a href="#misc">Miscellaneous Features</a>
+  </ul>
+<li><a href="#rules">Guidelines for Prefetching Data</a>
+<li><a href="#targets">Data Prefetch Support on GCC Targets</a>
+  <ul>
+  <li><a href="#summary">Summary</a>
+  <li><a href="#3dnow">3DNow!</a>
+  <li><a href="#alpha">Alpha 21264</a>
+  <li><a href="#altivec">AltiVec</a>
+  <li><a href="#ia32_sse">IA-32 SSE</a>
+  <li><a href="#ia64">Itanium</a>
+  <li><a href="#mips">MIPS</a>
+  <li><a href="#mmix">MMIX</a>
+  <li><a href="#hppa">PA-RISC</a>
+  <li><a href="#powerpc">PowerPC</a>
+  <li><a href="#sh_34">SH</a>
+  <li><a href="#sparc">SPARC</a>
+  <li><a href="#xscale">XScale</a>
+  </ul>
+<li><a href="#refs">References</a>
+</ul>
+
+<h2><a name="intro">Introduction</a></h2>
+
+<p>The framework for data prefetch in GCC will support capabilities
+of a variety of targets.  Optimizations within GCC that involve prefetching
+data can pass relevant information to the target-specific prefetch
+support, which can either take advantage of it or ignore it. The
+information here about data prefetch support in GCC targets was
+gathered as input for determining the operands to GCC's
+<code>prefetch</code> RTL pattern.</p>
+
+<p>The following data prefetch projects are currently planned:
+<ul>
+<li>Janis Johnson is defining a prefetch RTL pattern and adding support
+for it for ia64 and variants of i386.
+<li>Janis Johnson will implement a generic <code>__builtin_prefetch</code>,
+which will do nothing on targets that do not support prefetch or for
+which prefetch support has not yet been added to GCC.
+<li>Jan Hubicka plans to update his work to prefetch arrays in loops,
+for which he submitted a preliminary patch in May 2000.  This optimization
+will be controlled by an option, perhaps called
+<code>-fprefetch-array-loops</code>.
+<li>Jan Hubicka, perhaps with help from Janis Johnson, plans to support
+greedy prefetch of data referenced by pointer variables.  This will be
+controlled by an option, perhaps called <code>-fprefetch-pointers</code>.
+</ul>
+
+<p>Possibilities for other work include:
+<ul>
+<li>Prefetch support for additional targets, patterned after the support
+for ia64 and i386.
+<li>Running benchmarks and analyzing results on various targets to validate
+prefetch optimization heuristics.
+<li>Using profile information to guide prefetching of data.
+<li>Other optimizations.
+</ul>
+
+<p>This document is a work in progress.  Please copy any comments about
+it to <a href="mailto:janis187@us.ibm.com">Janis Johnson,
+&lt;janis187@us.ibm.com&gt;</a>.
+
+<h2><a name="elements">Elements of Data Prefetch Support</a></h2>
+
+<p>Data prefetch, or cache management, instructions allow a compiler
+or an assembly language programmer to minimize cache-miss latency
+by moving data into a cache before it is accessed.
+Data prefetch instructions are generally treated as hints;
+they affect the performance but not the functionality of software in
+which they are used. There are some prefetch instructions that cause
+faults when the address to prefetch is invalid or not cacheable, but
+those instructions are not covered here.</p>
+
+<h3><a name="locality">Locality</a></h3>
+
+<p>Data prefetch instructions often include information about the
+<em>locality</em> of expected accesses to prefetched memory.  Such
+hints can be used by the implementation to move the data into the
+cache level where it will do the most good, or the least harm.
+Prefetched data in the same cache line as other data likely to be
+accessed soon, such as neighboring array elements, has
+<em>spatial locality</em>.
+Data with <em>temporal locality</em>, or <em>persistence</em>, is expected
+to be accessed multiple times and so should be left in a cache when it is
+prefetched so it will continue to be readily accessible.
+Accesses to data with no temporal locality are <em>transient</em>; the data
+is unlikely to be accessed multiple times and, if possible, should not be
+left in a cache where it would displace other data that might be needed soon.
+
+<p>Some data prefetch instructions allow specifying in which level of
+the cache the data should be left.</p>
+
+<p>Locality hints determined in GCC optimization passes can be ignored in
+the machine description for targets that do not support them.</p>
+
+<h3><a name="write">Read or Write Access</a></h3>
+
+<p>Some data prefetch instructions make a distinction between memory
+which is expected to be read and memory which is expected to be written.
+When data is to be written, a prefetch instruction can move a block
+into the cache so that the expected store will be to the cache.
+Prefetch for write generally brings the data into the cache in an
+exclusive or modified state.
+</p>
+
+<p>A prefetch for data to be written can usually be replaced with a
+prefetch for data to be read; this is what happens on implementations
+that define both kinds of instructions but do not support prefetch for
+writes.</p>
+
+<h3><a name="size">Size of block to access</a></h3>
+
+<p>The amount of data accessed by a data prefetch instruction is
+usually a cache line, whose size is usually implementation specific,
+but is sometimes a specified number of bytes.</p>
+
+<h3><a name="base_update">Base update</a></h3>
+
+<p>At least one target's data prefetch instructions have a
+<em>base update</em> form, which modifies the prefetch address after
+the prefetch.  Base update, or pre/post increment, is also supported
+on load and store instructions for some targets, and this could be
+taken into consideration in code that uses data prefetch.</p>
+
+<h3><a name="misc">Miscellaneous Features</a></h3>
+
+<p>Some prefetch instructions have requirements about address alignment.
+These can be handled in the machine description; optimization passes
+do not need to know about them.</p>
+
+<p>Optimizations will need information about various implementation
+dependent parameters of data prefetch support, including:</p>
+<ul>
+<li>number of simultaneous prefetch operations
+<li>number of bytes prefetched
+</ul>
+
+<h2><a name="rules">Guidelines for Prefetching Data</a></h2>
+
+<p>Prefetch timing is important.  The data should be in the cache
+by the time it is accessed, but without a delay that would allow
+other data to displace it before it is used.
+</p>
+
+<p>Using prefetches that are too speculative can have negative effects,
+because there are costs associated with data prefetch instructions.
+These include wasting bandwidth, kicking other data out of the cache and
+causing additional conflict misses, and increasing code size, which
+can bump useful instructions out of the instruction cache.
+</p>
+
+<p>Similarly, prefetching data that is already in the cache increases
+overhead without providing any benefit.  Data might already be in the
+cache if it is in the same cache line as data already prefetched
+(spatial locality), or if the data has been used recently (temporal
+locality).
+</p>
+
+<p>On some (but not all) targets it makes sense to combine prefetching
+arrays in loops with loop unrolling.</p>
+
+<h2><a name="targets">Data Prefetch Support on GCC Targets</a></h2>
+
+<p>Variants of prefetch commands that fault are not included here.
+Some implementations of these architectures recognize data prefetch
+instructions but treat them as <code>nop</code> instructions.
+They are generally ignored for pages that are not cacheable.
+The exception to this is prefetch instructions with base update forms,
+for which the base address is updated even if the addressed memory
+cannot be prefetched.</p>
+
+<p>The descriptions that follow are meant to describe the basic
+functionality of data prefetch instructions.  For complete information
+about data prefetch support on a particular processor, refer to the
+technical documentation for that processor; the references provide a
+starting point for that information.</p>
+
+<h3><a name="summary">Summary</a></h3>
+
+<table border=1 cellspacing=0 cellpadding=5>
+<tr>
+  <th>Target</th>
+  <th>Prefetch amount</th>
+  <th>Read/write</th>
+  <th>Locality hints</th>
+  <th>Other features to consider</th>
+</tr>
+<tr>
+  <td>3DNow!</td>
+  <td>cache line; at least 32 bytes</td>
+  <td>yes</td>
+  <td>&nbsp;</td>
+  <td>&nbsp;</td>
+</tr>
+<tr>
+  <td>Alpha 21264</td>
+  <td>cache line</td>
+  <td>yes</td>
+  <td>separate instruction for transient loads</td>
+  <td>&nbsp;</td>
+</tr>
+<tr>
+  <td>AltiVec</td>
+  <td>specified unit size, count, stride</td>
+  <td>yes</td>
+  <td>temporal locality</td>
+  <td>prefetch instruction must specify one of four touch streams</td>
+</tr>
+<tr>
+  <td>IA-32 SSE</td>
+  <td>cache line; at least 32 bytes</td>
+  <td>no</td>
+  <td>temporal locality and cache level</td>
+  <td>&nbsp;</td>
+</tr>
+<tr>
+  <td>Itanium</td>
+  <td>cache line; at least 32 bytes</td>
+  <td>yes</td>
+  <td>temporal locality and cache level</td>
+  <td>base update form with implicit prefetch; cache control hints on
+      load and store instructions</td>
+</tr>
+<tr>
+  <td>MIPS</td>
+  <td>cache line</td>
+  <td>yes</td>
+  <td>temporal locality (streamed or retained)</td>
+  <td>&nbsp;</td>
+</tr>
+<tr>
+  <td>MMIX</td>
+  <td>specified number of bytes</td>
+  <td>yes</td>
+  <td>&nbsp;</td>
+  <td>&nbsp;</td>
+</tr>
+<tr>
+  <td>PA-RISC</td>
+  <td>cache line</td>
+  <td>yes</td>
+  <td>spatial locality</td>
+  <td>cache control hints on load and store instructions; pre/post increment
+      (base update) forms of some load and store instructions</td>
+</tr>
+<tr>
+  <td>PowerPC</td>
+  <td>cache line</td>
+  <td>yes</td>
+  <td>&nbsp;</td>
+  <td>&nbsp;</td>
+</tr>
+<tr>
+  <td>SH</td>
+  <td>cache line; 16 bytes for SH-3, 32 bytes for SH-4</td>
+  <td>no</td>
+  <td>&nbsp;</td>
+  <td>&nbsp;</td>
+</tr>
+<tr>
+  <td>SPARC</td>
+  <td>cache line</td>
+  <td>yes</td>
+  <td>maybe; what does "one or several" mean?</td>
+  <td>&nbsp;</td>
+</tr>
+<tr>
+  <td>XScale</td>
+  <td>cache line; 32 bytes</td>
+  <td>no</td>
+  <td>&nbsp;</td>
+  <td>&nbsp;</td>
+</tr>
+</table>
+
+<h3><a name="3dnow">3DNow!</a></h3>
+
+<p>The 3DNow! technology from AMD extends the x86 instruction set, primarily
+to support floating point computations.  Processors that support this
+technology include Athlon, K6-2, and K6-III.</p>
+
+<p>The instructions <code>PREFETCH</code> and <code>PREFETCHW</code>
+prefetch a processor cache line into the L1 data cache.
+The first prepares for a read of the data, and the second prepares
+for a write.</p>
+
+<p>There are no alignment restrictions on the address.  The size of the
+fetched line is implementation dependent, but at least 32 bytes.</p>
+
+<p>The Athlon processor supports <code>PREFETCHW</code>, but the K6-2 and
+K6-III processors treat it the same as <code>PREFETCH</code>.
+Future AMD K86 processors might extend the <code>PREFETCH</code>
+instruction format.</p>
+
+<h3><a name="alpha">Alpha 21264</a></h3>
+
+<p>Load instructions with a destination of register <code>R31</code>
+or <code>F31</code> prefetch the cache line containing the addressed data.
+Instruction <code>LDS</code> with a destination of register <code>F31</code>
+prefetches for a store.</p>
+
+<table border=1 cellspacing=0 cellpadding=5>
+<tr>
+  <td><code>LDBU</code>, <code>LDF</code>, <code>LDG</code>, <code>LDL</code>,
+      <code>LDT</code>, <code>LDWU</code></td>
+  <td>Normal cache line prefetches.</td>
+</tr>
+<tr>
+  <td><code>LDS</code></td>
+  <td>Prefetch with modify intent.</td>
+</tr>
+<tr>
+  <td><code>LDQ</code></td>
+  <td>Prefetch, evict next; no temporal locality.</td>
+</tr>
+</table>
+
+<p>NOTE: Table A-2, <em>Architecture Instructions</em>, 
+in Appendix A, <em>Alpha Instruction Set</em> of [2]
+includes instructions <code>FETCH</code> (prefetch data) and
+<code>FETCH_M</code> (prefetch data, modify intent).
+Where are these documented?
+</p>
+
+<h3><a name="altivec">AltiVec</a></h3>
+
+<p>AltiVec has prefetch instructions for use with regular (non-vector) code[3].
+These are the instructions:</p>
+
+<table border=1 cellspacing=0 cellpadding=5>
+<tr>
+  <td><code>dst</code></td>
+  <td>(Data Stream Touch); data marked as most recently used
+      (temporal locality)</td>
+</tr>
+<tr>
+  <td><code>dstst</code></td>
+  <td>(Data Stream Touch for Store); data marked as most recently used
+      (temporal locality)</td>
+</tr>
+<tr>
+  <td><code>dstt</code></td>
+  <td>(Data Stream Touch Transient); data marked as least recently used
+      (no temporal locality)</td>
+</tr>
+<tr>
+  <td><code>dststt</code></td>
+  <td>(Data Stream Touch Transient for Store); data marked as least
+      recently used (no temporal locality)</td>
+</tr>
+</table>
+
+<p>These instructions all operate on a <em>data stream</em>, which
+consists of:</p>
+
+<table border=1 cellspacing=0 cellpadding=5>
+<tr>
+  <td>EA</td>
+  <td>the effective address of the first unit in the sequence;
+      there are no alignment restrictions</td>
+</tr>
+<tr>
+  <td>unit size</td>
+  <td>the number of quad words <em>(16 bytes?)</em>
+      in each unit; between 0 and 31</td>
+</tr>
+<tr>
+  <td>count</td>
+  <td>the number of units in the sequence; between 0 and 255</td>
+</tr>
+<tr>
+  <td>stride</td>
+  <td>the number of bytes between the effective address of one unit
+      and the effective address of the next unit in the sequence; this can be
+      negative, but should not be smaller than 16 bytes
+      <em>(does this imply that this is a base update form?)</em></td>
+</tr>
+</table>
+
+<p>A prefetch instruction specifies one of four touch streams, each of
+which can prefetch up to 128K bytes, 12K bytes in a contiguous block.
+If GCC supports prefetch on AltiVec, it will need to keep track of which
+touch streams are in use.</p>
+
+<p>The instructions <code>lvxl</code> (Load Vector Indexed LRU) and
+<code>stvxl</code> (Store Vector Indexed LRU) indicate that an access
+is likely to be the final one to a cache block and that the address
+should be treated as least recently used, to allow other data to
+replace it in the cache.
+
+<p>The differences between AltiVec's cache control instructions and 
+the PowerPC instructions <code>dcbt</code> and <code>dcbtst</code> are
+discussed in section 5.2.1.7 of [3].</p>
+
+<h3><a name="ia32_sse">IA-32 SSE</a></h3>
+
+<p>The IA-32 Streaming SIMD Extension (SSE) instructions are used on several
+platforms, including the Pentium III and IA-32 support on Itanium.
+The SSE prefetch instructions are included in
+the AMD extensions to 3DNow! and MMX used for x86-64.
+
+<p>The SSE <code>prefetch</code> instruction has the following variants:
+
+<table border=1 cellspacing=0 cellpadding=5>
+<tr>
+  <td><code>prefetcht0</code></td>
+  <td>Temporal data; prefetch data into all cache levels.</td>
+</tr>
+<tr>
+  <td><code>prefetcht1</code></td>
+  <td>Temporal with respect to first level cache;
+      prefetch data in all cache levels except 0th cache level.</td>
+</tr>
+<tr>
+  <td><code>prefetcht2</code></td>
+  <td>Temporal with respect to second level cache; prefetch data in
+      all cache levels, except 0th and 1st cache levels.</td>
+</tr>
+<tr>
+  <td><code>prefetchnta</code></td>
+  <td>Non-temporal with respect to all cache levels; prefetch data into
+      non-temporal cache structure, with minimal cache pollution.</td>
+</tr>
+</table>
+
+<p>There are no alignment requirements for the address.  The size of the
+line prefetched is implementation dependent, but a minimum of 32 bytes.</p>
+
+<h3><a name="ia64">Itanium</a></h3>
+
+<p>The <code>lfetch</code> (Line Prefetch) instruction has versions for
+read and write prefetches, and an optional modifier to specify the
+locality of the memory access and the cache level to which the data
+would best be allocated.</p>
+
+<p>The possible values for the locality hint are:</p>
+
+<table border=1 cellspacing=0 cellpadding=5>
+<tr>
+  <td>none</td>
+  <td>Temporal locality for cache level 1 and higher (all levels).</td>
+</tr>
+<tr>
+  <td><code>nt1</code></td>
+  <td>No temporal locality for level 1, temporal for level 2 and higher.</td>
+</tr>
+<tr>
+  <td><code>nt2</code></td>
+  <td>No temporal locality for level 2, temporal for levels above 2.</td>
+</tr>
+<tr>
+  <td><code>nta</code></td>
+  <td>No temporal locality, all levels</td>
+</tr>
+</table>
+
+<p>There are two base update forms of <code>lfetch</code>, which increment
+the register containing the address and then implicitly prefetch the new
+address, as well as the original address.  The increment value is either
+in a second general register or is an immediate value.</p>
+
+<p>Line size is implementation dependent; it is a power of 2, at
+least 32.</p>
+
+<p>Load and store instructions can also be used to prefetch data.
+The base update forms of these instructions imply a prefetch, and
+have a completer that specifies the locality of the memory access.</p>
+
+<h3><a name="mips">MIPS</a></h3>
+
+<p>The <code>PREF</code> (Prefetch) instruction, supported by MIPS32
+and MIPS64, takes a hint with one of the following values:</p>
+
+<table border=1 cellspacing=0 cellpadding=5>
+<tr>
+  <td><code>load</code></td>
+  <td>data is expected to be read, not modified</td>
+</tr>
+<tr>
+  <td><code>store</code></td>
+  <td>data is expected to be stored or modified</td>
+</tr>
+<tr>
+  <td><code>load_streamed</code></td>
+  <td>data is expected to be read but not reused</td>
+</tr>
+<tr>
+  <td><code>store_streamed</code></td>
+  <td>data is expected to be stored but not reused</td>
+</tr>
+<tr>
+  <td><code>load_retained</code></td>
+  <td>data is expected to be read and reused extensively</td>
+</tr>
+<tr>
+  <td><code>store_retained</code></td>
+  <td>data is expected to be stored and reused extensively</td>
+</tr>
+<tr>
+  <td><code>writeback_invalidate</code></td>
+  <td>data is no longer expected to be used</td>
+</tr>
+<tr>
+  <td><code>PrepareForStore</code></td>
+  <td>prepare the cache for writing an entire line</td>
+</tr>
+</table>
+
+<p>The "streamed" versions place the prefetched data into the cache in
+such a way that it will not displace data prefetched as "retained".
+The "retained" versions place the data in the cache so that it will not
+be displaced by data prefetched as "streamed."</p>
+
+<p>The prefetch moves a block of data into the cache.  The size is
+implementation specific.
+
+<p>There are no alignment restrictions.</p>
+
+<p>The <code>PREFX</code> (Prefetch Indexed) instruction, supported by MIPS64,
+differs in the addressing mode and is for use with floating point data.</p>
+
+<h3><a name="mmix">MMIX</a></h3>
+
+<p>MMIX has the following data prefetch instructions:</p>
+
+<table border=1 cellspacing=0 cellpadding=5>
+<tr>
+  <td><code>PRELD</code></td>
+  <td>preload a specified number of bytes of data</td>
+</tr>
+<tr>
+  <td><code>PRELDI</code></td>
+  <td>preload data immediate</td>
+</tr>
+<tr>
+  <td><code>PREST</code></td>
+  <td>prestore (prefetch for write) a specified number of bytes of data</td>
+</tr>
+<tr>
+  <td><code>PRESTI</code></td>
+  <td>prestore data immediate</td>
+</tr>
+</table>
+
+<p>There are also load and store instructions, <code>LDUNC</code> and
+<code>STUNC</code>, which request that the data not be cached because
+it is unlikely to be accessed again soon.</p>
+
+<h3><a name="hppa">PA-RISC</a></h3>
+
+<p>A normal load to register <code>GR0</code> prefetches data.
+The data prefetch instructions are:
+<table border=1 cellspacing=0 cellpadding=5>
+<tr><td><code>LDW</code></td><td>Prefetch cache line for read.</td></tr>
+<tr><td><code>LDD</code></td><td>Prefetch cache line for write.</td></tr>
+</table>
+
+<p>Prefetch and cache control are also supported for accesses of semaphores.
+</p>
+
+<p>Some load and store instructions modify the base register, providing
+either pre-increment or post-increment, and some provide a cache control
+hint; a load instruction can specify spatial locality, and
+a store instruction can specify block copy or spatial locality.
+The spatial locality hint implies that there is poor temporal locality
+and that the prefetch should not displace existing data in the cache.
+The block copy hint indicates that the program is likely to store a
+full cache line of data.</p>
+
+<p>There are no alignment requirements on the address of prefetched data;
+the low order part of the address is ignored.
+</p>
+
+<h3><a name="powerpc">PowerPC</a></h3>
+
+<p>The instruction <code>dcbt</code> (Data Cache Block Touch) is used for
+an address expected to be used by a load, and <code>dcbtst</code> (Data
+Cache Block Touch for Store) is used for an address expected to be used
+for a store.</p>
+
+<p>There are no alignment restrictions on the address of the data to
+prefetch.</p>
+
+<h3><a name="sh_34">SH</a></h3>
+
+<p>The SH-3 and SH-4 processors provide the <code>PREF</code> (Prefetch
+Data to the Cache) instruction.</p>
+
+<p>For the SH-3, the address should be on a longword
+<em>(how many bytes is that?)</em>
+boundary.  The number of bytes prefetched is 16.[14]
+For the SH-4, the instruction moves 32 bytes of data starting at a 32-byte
+boundary into the operand cache.[15]</p>
+
+<h3><a name="sparc">SPARC</a></h3>
+
+<p>SPARC v9 supports the <code>PREFETCH</code> (Prefetch Data) and
+<code>PREFETCHA</code> (Prefetch Data from Alternate Space)
+instructions[16,17,18], which have the following variants:</p>
+
+<ul>
+<li>prefetch for several reads
+<li>prefetch for several writes
+<li>prefetch for one read
+<li>prefetch for one write
+<li>prefetch page
+</ul>
+
+<p>Of these, UltraSPARC-II supports only the first two, and UltraSPARC-IIi
+supports them all.
+UltraSPARC-I does not implement these instructions.</p>
+
+<p>There are no alignment restrictions on the address to prefetch.</p>
+
+<p><em>QUESTION:</em>
+What does it mean to prefetch for several reads or writes?
+Does that specify how many bytes to prefetch, or is it a locality hint?
+Does it mean several reads from the same address (temporal locality) or
+reads of several consecutive units (spatial locality)?
+I need to see more complete documentation of these instructions,
+probably Appendix A of <em>Sparc Architecture Manual, Version 9</em>.</p>
+
+<h3><a name="xscale">XScale</a></h3>
+
+<p>The Intel XScale processor includes ARM's DSP-enhanced instructions,
+including the <code>PLD</code> (Preload) instruction.
+This instruction prefetches the 32-byte cache line that includes
+the specified data address.</p>
+
+<p>NOTE: More investigation is necessary; [23] has an example that
+implies that base update might be available.</p>
+
+<h2><a name="refs">References</a></h2>
+
+<p>These references need cleanup and should actually be used in the text
+above that uses the information.  Many of the links will likely be out
+of date soon, but they'll stay here until the initial rush of prefetch
+work is done.</p>
+
+<p>References to cache control instructions for specific architectures:</p>
+
+<p><a name="ref_1">[1]</a>
+<em>3DNow![tm] Technology Manual</em>, AMD;
+<a href="http://www.amd.com/us-en/assets/content_type_white_papers_and_tech_docs/21928.pdf">
+http://www.amd.com/us-en/assets/content_type_white_papers_and_tech_docs/21928.pdf</a>.</p>
+
+<p><a name="ref_2">[2]</a>
+<em>Alpha 21264 Hardware Reference Manual</em>, July 1999,
+a large PDF file with a link from
+<a href="http://www.support.compaq.com/alpha-tools/documentation/current/chips-doc.html">
+http://www.support.compaq.com/alpha-tools/documentation/current/chips-doc.html</a>;
+see section 2.6.</p>
+
+<p><a name="ref_3">[3]</a>
+<em>AltiVec Technology Programming Environment Manual</em>, 11/1998, Rev. 0.1;
+a large PDF file with a link from
+<a href="http://www.altivec.org/tech_specifications/show_techpdf.cfm">
+http://www.altivec.org/tech_specifications/show_techpdf.cfm</a>.
+Page 5-9 has usage recommendations.</p>
+
+<p><a name="ref_4">[4]</a>
+<em>AMD Extensions to the 3DNow![tm] and MMX[tm] Instruction Sets</em>, AMD,
+Publication 22466D, March 2000;
+<a href="http://www.amd.com/us-en/assets/content_type_white_papers_and_tech_docs/22466.pdf">
+http://www.amd.com/us-en/assets/content_type_white_papers_and_tech_docs/22466.pdf</a>.</p>
+
+<p><a name="ref_5">[5]</a>
+<em>The IA-32 Intel Architecture Software Developer's Manual, Volume 2:
+Instruction Set Reference</em>;
+a large PDF file with a link from
+<a href="http://developer.intel.com/design/PentiumIII/manuals/">
+http://developer.intel.com/design/PentiumIII/manuals/</a>.</p>
+
+<p><a name="ref_6">[6]</a>
+<em>Intel Itanium[tm] Architecture Software Developer's Manual Vol. 1
+rev. 1.1: Application Architecture</em>;
+a large PDF file with a link from
+<a href="http://developer.intel.com/design/itanium/manuals/index.htm">
+http://developer.intel.com/design/itanium/manuals/index.htm</a>.</p>
+
+<p><a name="ref_7">[7]</a>
+<em>Intel Itanium[tm] Architecture Software Developer's Manual Vol. 3
+rev. 1.1: Instruction Set Reference</em>;
+a large PDF file with a link from
+<a href="http://developer.intel.com/design/itanium/manuals/index.htm">
+http://developer.intel.com/design/itanium/manuals/index.htm</a>.</p>
+
+<p><a name="ref_8">[8]</a>
+<em>MIPS32[tm] Architecture for Programmers; Volume II: The MIPS32[tm]
+Instruction Set</em>, MIPS Technologies, Document Number MD00086,
+Revision 0.95, March 12, 2001;
+search from <a href="http://www.mips.com">http://www.mips.com</a>.</p>
+
+<p><a name="ref_9">[9]</a>
+<em>MIPS64[tm] Architecture for Programmers; Volume II: The MIPS64[tm]
+Instruction Set</em>, MIPS Technologies, Document Number MD00087,
+Revision 0.95, March 12, 2001;
+search from <a href="http://www.mips.com">http://www.mips.com</a>.</p>
+
+<p><a name="ref_10">[10]</a>
+<em>MMIX Op Codes</em>, Don Knuth;
+<a href="http://www-cs-faculty.stanford.edu/~knuth/mmop.html">
+http://www-cs-faculty.stanford.edu/~knuth/mmop.html</a>.</p>
+
+<p><a name="ref_11">[11]</a>
+<em>The Art of Computer Programming, Fascicle 1: MMIX</em>, Don Knuth,
+Addison Wesley Longman, 2001;
+<a href="http://www-cs-faculty.stanford.edu/~knuth/fasc1.ps.gz">
+http://www-cs-faculty.stanford.edu/~knuth/fasc1.ps.gz</a>.</p>
+
+<p><a name="ref_12">[12]</a>
+<em>PA-RISC Instruction Set Architecture</em>,
+<a href="http://dsportal.eservices.hp.com/dspp/tech/tech_TechDocumentDetailPage_IDX/1,1701,959,00.html">
+http://dsportal.eservices.hp.com/dspp/tech/tech_TechDocumentDetailPage_IDX/1,1701,959,00.html</a>;
+see <em>Memory Reference Instructions</em> in Chapter 6.</p>
+
+<p><a name="ref_13">[13]</a>
+<em>PowerPC Microprocessor 32-bit Family: The Programming Environments</em>,
+page 5-8; a large PostScript file with a link from
+<a href="http://www-3.ibm.com/chips/techlib/techlib.nsf/products/PowerPC_EM603e_Microprocessor">
+http://www-3.ibm.com/chips/techlib/techlib.nsf/products/PowerPC_EM603e_Microprocessor</a>.</p>
+
+<p><a name="ref_14">[14]</a>
+<em>SuperH[tm] RISC Engine SH-3/SH-3E/SH3-DSP Programming Manual</em>,
+ADE-602-096B, Rev. 3.0, 9/25/00, Hitachi, Ltd.;
+a large PDF file with a link from
+<a href="http://www.hitachisemiconductor.com/sic/jsp/japan/eng/products/mpumcu/32bit/superh.html">
+http://www.hitachisemiconductor.com/sic/jsp/japan/eng/products/mpumcu/32bit/superh.html</a>.</p>
+
+<p><a name="ref_15">[15]</a>
+<em>SuperH[tm] RISC Engine SH-4 Programming Manual</em>,
+ADE-602-156D, Rev. 5.0, 4/19/2001, Hitachi, Ltd.;
+a large PDF file with a link from
+<a href="http://www.hitachisemiconductor.com/sic/jsp/japan/eng/products/mpumcu/32bit/superh.html">
+http://www.hitachisemiconductor.com/sic/jsp/japan/eng/products/mpumcu/32bit/superh.html</a>.</p>
+
+<p><a name="ref_16">[16]</a>
+<em>UltraSPARC[tm] User's Manual</em>,
+Sun Microsystems, Part No: 802-7720-02, July 1997, pages 36-37;
+a large PDF file with a link from
+<a href="http://www.sun.com/microelectronics/manuals/#processors">
+http://www.sun.com/microelectronics/manuals/#processors</a>.</p>
+
+<p><a name="ref_17">[17]</a>
+<em>UltraSPARC[tm]-II High Performance 64-bit RISC Processor</em>,
+Sun Microelectronics Application Notes,
+<a href="http://www.sun.com/microelectronics/appnotes/802-7254-01/">
+http://www.sun.com/microelectronics/appnotes/802-7254-01/</a>;
+see section 5.0, "Software Prefetch and Multiple-Outstanding Misses".</p>
+
+<p><a name="ref_18">[18]</a>
+<em>UltraSPARC[tm]-IIi User's Manual</em>,
+Sun Microsystems, Part No: 805-0087-01, 1997;
+a large PDF file with a link from
+<a href="http://www.sun.com/microelectronics/manuals/#processors">
+http://www.sun.com/microelectronics/manuals/#processors</a>.</p>
+
+<p>References to uses of data prefetch instructions:</p>
+
+<p><a name="ref_19">[19]</a>
+<em>Optimizing 3DNow! Real-Time Graphics</em>, Dr. Dobb's Journal August 2000,
+Max I. Fomitchev;
+<a href="http://www.ddj.com/articles/2000/0008/0008c/0008c.htm?topic=graphics">
+http://www.ddj.com/articles/2000/0008/0008c/0008c.htm?topic=graphics</a>.</p>
+
+<p><a name="ref_20">[20]</a>
+<em>An Overview of the Intel IA-64 Compiler</em>,
+Carole Dulong, Rakesh Krishnaiyer, Dattatraya Kulkarni, Daniel Lavery,
+Wei Li, John Ng, and David Sehr, all of Microcomputer Software Laboratory,
+Intel Corporation, <em>Intel Technology Journal</em>, 4th quarter 1999;
+<a href="http://developer.intel.com/technology/itj/q41999/articles/art_1h.htm">
+http://developer.intel.com/technology/itj/q41999/articles/art_1h.htm</a>.</p>
+
+<p><a name="ref_21">[21]</a>
+<em>UltraSPARC[tm]-II Enhancements: Support for Software Controlled
+Prefetch</em>, Sun Microsystems, WPR-0002;
+<a href="http://www.sun.com/microelectronics/whitepapers/wpr-0002/">
+http://www.sun.com/microelectronics/whitepapers/wpr-0002/</a>.</p>
+
+<p><a name="ref_22">[22]</a>
+<em>Compiler-Based Prefetching for Recursive Data Structures</em>,
+Chi-Keung Luk and Todd C. Mowry, linked from
+<a href="http://www.cs.cmu.edu/~tcm/Papers.html">
+http://www.cs.cmu.edu/~tcm/Papers.html</a>.
+That location also has links to several other papers about data prefetch
+by Todd C. Mowry.</p>
+
+<p><a name="ref_23">[23]</a>
+<em>Intel(r) XScale[tm] Core Developer's Manual</em>, December 2000;
+a large PDF file with a link from
+<a href="http://developer.intel.com/design/intelxscale/">
+http://developer.intel.com/design/intelxscale/</a>;
+click <em>Manual</em>;
+section A.4.4 is "Prefetch Considerations" in the Optimization Guide.</p>
+
+</body>
+</html>
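
For anyone who wants a feel for how the generic builtin described in the
new page might be used, here is a minimal sketch.  It assumes the
three-argument form of __builtin_prefetch (address, read/write flag,
locality hint) found in later GCC releases; the function name
sum_with_prefetch and the PREFETCH_DISTANCE value are made up for
illustration and are not tuned for any target.

```c
#include <stddef.h>

/* Hypothetical tuning value: how many elements ahead to prefetch.
   A real value would depend on cache line size and memory latency. */
#define PREFETCH_DISTANCE 16

/* Sum an array while prefetching elements that will be needed soon.
   Second argument 0 = prefetch for read; third argument 0 = no
   temporal locality, since each element is used only once. */
long sum_with_prefetch(const long *a, size_t n)
{
    long sum = 0;
    size_t i;

    for (i = 0; i < n; i++) {
        /* Stay inside the array; a prefetch is only a hint, but there
           is no benefit in requesting data that will never be read. */
        if (i + PREFETCH_DISTANCE < n)
            __builtin_prefetch(&a[i + PREFETCH_DISTANCE], 0, 0);
        sum += a[i];
    }
    return sum;
}
```

The prefetch is only a hint, so the function returns the same sum whether
or not the target acts on it.  Prefetching on every iteration also
illustrates the overhead concern in the guidelines section: neighboring
elements usually share a cache line, which is one reason combining
prefetch with loop unrolling can make sense.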

