This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [tree-ssa] Minor update for project web page - vectorization


> Could you prepare a patch for that?
Sure. Here it is - a suggestion for a small update to index.html, and the
vectorization html file, both inlined and as an attachment).

thanks,
dorit

Index: index.html
===================================================================
RCS file: /cvsroot/gcc/wwwdocs/htdocs/projects/tree-ssa/index.html,v
retrieving revision 1.21
diff -c -3 -p -r1.21 index.html
*** index.html    9 Sep 2003 16:34:06 -0000     1.21
--- index.html    9 Sep 2003 17:38:10 -0000
*************** different passes are welcome.</p>
*** 411,416 ****
--- 411,419 ----
      <a href="http://icps.u-strasbg.fr/cgi-bin/temporal/temporal.cgi";>
      translator</a>
      for Fortran.</dd>
+ <dt><i><a href="vect_plan.html">Vectorization</a></i></dt>
+ <dd>This pass should vectorize certain loops, using classic data dependence
+ based approach.</dd>
  </dl>

New file: vect_plan.html
==================================================================
<!doctype html public "-//w3c//dtd html 4.0 transitional//en">
<html>
<head>
   <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
   <meta name="GENERATOR" content="Mozilla/4.7 [en] (WinNT; I) [Netscape]">
   <title>Vectorization in GCC</title>
</head>
<body text="#000000" bgcolor="#FFCC99" link="#0000EE" vlink="#551A8B" alink="#FF0000">
&nbsp;
<br><font color="#3333FF"><font size=+1>Auto-vectorization in GCC -</font></font>
<br><font color="#3333FF"><font size=+1>Preliminary High level plan of
implementation</font></font>
<p><font color="#000000">This document outlines a high level plan for the
implementation of auto-vectorization in the tree-ssa branch of gcc. The
main goals of this document are to:</font>
<ul>
<li>
<font color="#000000"><b>Facilitate contributing to the different vectorization-related
components.</b> Vectorization consists of many independent components.
While some basic infrastructure will be required by all vectorization "clients",
there are many directions by which this basic infrastructure can be enhanced.
The decision on which directions to pursue first is in many cases guided
by the specific properties of a given target architecture and target application
domain. Therefore, it is important to keep developers informed about the
implementation status of these different directions, and allow them to
push those directions that are more important for their targets (but possibly
too far down the vectorizer's todo list).</font></li>

<li>
<font color="#000000"><b>Increase synchronization and minimize work duplication.</b>
Vectorization consists of, or can benefit from, components that are used
by other compiler optimizations; some of these components are already under
development. The hope is that exposing the list of components that the
vectorizer may need to interact with, would help to increase the synchronization
between the vectorizer and other components and assist in minimizing work
duplication.</font></li>

<li>
<font color="#000000"><b>Expose this work.</b> Allow early exposure of
this work, in order to be able to get feedback. tree-ssa developers are
encouraged to comment on the infrastructure that is available (or is being
or will be developed) in the tree-ssa and it's implications on the items
listed here.</font></li>
</ul>
<font color="#000000">This document tries to serve the above goals by
<b>(a)</b>
providing a list all the issues that vectorization may need to handle,
<b>(b)</b>
breaking the implementation into independent tasks as much as possible,
and <b>(c)</b> proposing a gradual implementation that consists of a very
limited vectorizer as a first step, which will be gradually extend in different
directions.</font>
<p><font color="#009900"><font size=+1>General Vectorization Scheme</font></font>
<p><font color="#000000">The table below outlines the high-level vectorization
scheme along with a proposal for an implementation scheme, as follows:</font>
<ul>
<li>
<font color="#000000">The first column ("<b>vectorization driver</b>")
lists the tasks that the vectorizer must consist of. It briefly describes
the expected functionality of each task.</font></li>

<li>
<font color="#000000">The second column ("<b>basic-vectorizer</b>") describes
a proposal for a basic vectorizer that provides minimal support for each
of these tasks, listing the restrictions that will be imposed by the basic
vectorizer on candidate loops. Loops that are considered vectorizable by
the basic-vectorizer are of the form:</font></li>

<br><font color="#000000">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
<i>for(i=0; i&lt;N; i++){</i></font>
<br><i><font color="#000000">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
a[i] = b[i] + c[i];</font></i>
<br><i><font color="#000000">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
}</font></i>
<li>
<font color="#000000">The third column ("<b>enhancements</b>") lists possible
directions for extending the capabilities of the basic vectorizer .</font></li>
</ul>
<font color="#000000">Following the table is a complete list of these enhancement,
that provides some more detail. Some of these enhancements are aimed at
improving the efficiency of the vector code that is being generated. Other
enhancements aim to broaden the range of loops that are amenable for vectorization.
One way to do it is to extend the vectorizer to handle additional loop
forms (for example, extend the vectorizer to handle loops that contain
reduction, type casting, etc.). Another way is to transform loops to confirm
to the set of loops that the vectorizer can handle. For example, for the
case of conditional code - a <b>preprocessing phase</b> will detect 'if-then-else'
constructs that can be collapsed into an operation (or a straight line
sequence of operations) that is supported by the target. Indeed, an important
set of the enhancements listed below fall into this category (<a href="#Idiom recognition">idioms
recognition</a>, <a href="#Access Pattern">access patterns analysis</a>);
This is because the general principle we are trying to follow is to keep
the actual code transformation part of the vectorizer as simple as possible:
a simple scan of straight-line code, and a 1-1 replacement of each scalar
operation with the equivalent vector operation. To support this scheme,
new "virtual" scalar operations will need to be introduced (probably as
built-in functions) by the preprocessing phase, to be later replaced by
actual vector operations. In the tree level, vector operations are represented
using built-in functions, unless it is decided that it is desirable to
introduce special tree expressions to represent them.</font>
<br>&nbsp;
<table BORDER COLS=3 WIDTH="98%" >
<tr ALIGN=LEFT VALIGN=TOP>
<td ALIGN=LEFT VALIGN=TOP WIDTH="33%"><b>vectorization driver</b></td>

<td WIDTH="33%"><b><font color="#000000">basic vectorizer</font></b></td>

<td WIDTH="33%"><b><font color="#000000">enhancements</font></b></td>
</tr>

<tr ALIGN=LEFT VALIGN=TOP>
<td><font color="#3333FF">analyze_loop_CFG(loop).&nbsp;</font>
<br><font color="#000000">Checks the <a href="#loop CFG">control flow properties
of the loop</a> (number of basic-blocks it consists of, nesting, single
entry/exit, etc), in order to determine whether the control flow of the
loop falls within the range of loop forms that are supported by this vectorizer.</font></td>

<td>
<ul>
<li>
<font color="#000000">inner-most (single nest) loops</font></li>

<li>
<font color="#000000">single basic block loops (no if-then-else constructs,
etc)&nbsp;</font></li>

<li>
<font color="#000000">other restrictions (single successor/predecessor
pre-heated/latch block)</font></li>
</ul>
</td>

<td>
<ul>
<li>
<font color="#000000">use <a href="#Idiom recognition">idiom recognition</a>
to collapse if-then-else constructs into a single operation</font></li>

<li>
<font color="#000000">handle <a href="#Idiom recognition">conditional execution&nbsp;</a></font></li>

<li>
<font color="#000000">the above need to be aware of <a href="#Machine">target
specific vector capabilities</a>.</font></li>

<li>
<font color="#000000">extend the <a href="#loop forms">range of loop forms</a>
that can be&nbsp; vectorized, with respect to their CFG.</font></li>
</ul>
</td>
</tr>

<tr ALIGN=LEFT VALIGN=TOP>
<td><font color="#3333FF">analyze_loop_index_and_bound(loop)</font><font color="#000000">.
Analyzes the loop termination condition to determine the loop bound and
properties of the loop index (its bounds and step). The functionality of
this utility should be largely provided by the information computed by
the <a href="#Induction Variable">Induction Variable Analyzer</a>).</font></td>

<td>
<ul>
<li>
<font color="#000000">handle simple normalized loops (loop index is a trivial
IV with step 1 etc), with a simple termination condition.</font></li>

<li>
<font color="#000000">loop-bound known at compile time</font></li>

<li>
<font color="#000000">loop-bound >= vector_size and loop-bound % vector_size
= 0</font></li>
</ul>
</td>

<td>
<ul>
<li>
<font color="#000000">relax the restrictions on the <a href="#loop forms">loop&nbsp;
bound</a> and <a href="#Access Pattern">loop index</a> imposed by the basic
vectorizer</font></li>
</ul>
</td>
</tr>

<tr ALIGN=LEFT VALIGN=TOP>
<td><font color="#3333FF">analyze_loop_stmts(loop-stmts).</font>
<br><font color="#000000">Scan the loop statements and check whether there
are any statements that prohibit vectorization (function calls, statements
that don't have a <a href="#mapping">mapping</a> to a built-in vector function,
etc)</font></td>

<td>
<ul>
<li>
<font color="#000000">no support for scalar expansion</font></li>

<li>
<font color="#000000">no mixture of data types (all statements operate
on the same data types)</font></li>

<li>
<font color="#000000">possibly other restrictions</font></li>
</ul>
</td>

<td>
<ul>
<li>
<font color="#000000">handle <a href="#computations">scalar expansion</a></font></li>

<li>
<font color="#000000">handle computations with <a href="#computations">mixed
data types</a>.&nbsp;</font></li>
</ul>
</td>
</tr>

<tr ALIGN=LEFT VALIGN=TOP>
<td><font color="#3333FF">analyze_access_pattern(loop-mem-refs)</font>
<br><font color="#000000">Analyze the memory references in the loop, and
<a href="#Access Pattern">classify
them according to the access pattern</a> that they exhibit.&nbsp;</font></td>

<td>
<ul>
<li>
<font color="#000000">support only memory accesses which are array references
(no pointers...)</font></li>

<li>
<font color="#000000">support only consecutive (unit stride) access pattern</font></li>
</ul>
</td>

<td>
<ul>
<li>
extend the access pattern analyzer to handle pointer references</li>

<li>
improve the <a href="#pointer aliasing">memory aliasing</a> capabilities</li>

<li>
handle non consecutive access patterns <a href="#Machine">if supported
by the target</a></li>
</ul>
</td>
</tr>

<tr ALIGN=LEFT VALIGN=TOP>
<td><font color="#3333FF">analyze_alignment(loop-mem-refs)</font>
<br><font color="#000000">Analyze the alignment of all the memory references
in the loop. For each memory reference, record its misalignment amount,
if it can be resolved at compile time.</font></td>

<td>
<ul>
<li>
misalignment amount for all memory references is known at compile time</li>

<li>
misalignment = 0 (all references are aligned)</li>
</ul>
</td>

<td>
<ul>
<li>
<a href="#Alignment">handle unaligned accesses</a></li>
</ul>
</td>
</tr>

<tr ALIGN=LEFT VALIGN=TOP>
<td><font color="#3333FF">analyze_loop_carried_dependences(loop)</font>
<br><font color="#000000">Build the loop dependence graph (for scalar and
array references); Detect Strongly Connected Components in the graph(statements
that are involved in a dependence cycle); Perform a topological sort on
the reduced graph (in which each SCC is represented by a single node);
Only singleton SCCs w/o self dependences can be vectorized. If other SCCs
are present - loop transformations are required.</font></td>

<td ALIGN=LEFT VALIGN=TOP>
<ul>
<li>
<font color="#000000">handle only loops that consist of singleton SCCs
w/o self deps.</font></li>

<li>
the only scalar loop-carried dependencies allowed are of IVs which are
used in array references or for the loop index (i.e, reduction is not supported).</li>

<li>
use simplest dependence tests .</li>
</ul>
</td>

<td>
<ul>
<li>
<font color="#000000">Use more <u><a href="#Dependence">advanced dependence
tests</a></u></font></li>

<li>
<font color="#000000">Try to <a href="#Dependence">break cycles in the
dependence graph</a> of the loop by performing <a href="#Loop transform">loop
transformations</a> (or <a href="#Idiom recognition">detecting reduction</a>).</font></li>
</ul>
</td>
</tr>

<tr ALIGN=LEFT VALIGN=TOP>
<td><font color="#3333FF">estimate_vectorization_profitability(loop)</font>
<br>At this point, it has been determined that the loop is vectorizable.
It remains to decide whether it is indeed profitable to vectorize it.</td>

<td>
<ul>
<li>
Vectorize all loops which loop bound >= MIN_LIMIT (?)</li>
</ul>
</td>

<td>
<ul>
<li>
develop a <a href="#Cost">cost model</a></li>
</ul>
</td>
</tr>

<tr ALIGN=LEFT VALIGN=TOP>
<td><font color="#3333FF">vectorize_loop(loop)</font>
<br>Replace the scalar statements with the corresponding built-in vector
functions; Also change the loop bound accordingly.</td>

<td>
<ul>
<li>
Use a <a href="#mapping">scalar-to-vector mapping</a></li>
</ul>
</td>

<td>
<ul>
<li>
<a href="#Other Optimizations.">misc optimizations</a></li>

<li>
<a href="#Other Issues">misc issues</a> to consider</li>
</ul>
</td>
</tr>
</table>

<p><font color="#009900"><font size=+1>List of Vectorization Related Tasks</font></font>
<p>The following is a list of independent directions by which the basic
vectorizer can be enhanced. It should be possible to for different people
to work on different items from this list. Some of these items maybe already
under development, or maybe already supported.
<ul>
<li>
<a NAME="loop CFG"></a><b><font color="#FF0000">Loop detection and loop
CFG analysis.</font></b></li>

<br><font color="#000000">Detect loops, and record some basic control flow
information about them (number of basic blocks, loop latch and pre-header,
etc.); Also develop utilities to support CFG transformations on loops.</font>
<p><font color="#000000"><b>Status: </b>Loop detection and control flow
analysis is already supported (cfgloop.c, cfgloopanal.c); tools to perform
loop transformation (e.g, duplicate a loop body for code versioning) -
to be revisited when gimple representation stabilizes.</font><br>
<BR>
<li>
<a NAME="Machine"></a><b><font color="#FF0000">Machine Modeling.</font></b></li>

<br><font color="#000000">Expose the required machine dependent information
to the tree-level. A first basic implementation will not deal with issues
that require machine dependent knowledge; for such an implementation, knowledge
of the size of the vector would suffice. As the capabilities of the vectorizer
are extended, it will be required to inform the vectorizer of the capabilities
available in the architecture (for example, mechanisms to deal with misalignment,
conditional execution for vector instructions, etc.).</font>
<p><font color="#000000"><b>Status:</b> N/A</font><br>
<BR>
<li>
<a NAME="mapping"></a><b><font color="#FF0000">Scalar-to-Vector Mapping.</font></b></li>

<br><font color="#000000">Provide a mapping of scalar statements to the
corresponding built-in vector functions.</font>
<p><font color="#000000"><b>Status:</b> The general SIMD support in gcc
that emulates vector function built-ins for architectures without vector/SIMD
extensions may provide some initial support already?</font><br>
<BR>
<li>
<a NAME="Cost"></a><b><font color="#FF0000">Cost Model.</font></b></li>

<br><font color="#000000">There is an overhead associated with vectorization
- moving data in to/out of vector registers before/after the vectorized
loop, aligning of data accesses, etc. It is required to incorporate a cost
model into the machine description in order to allow the vectorizer to
evaluate whether it is worth while to vectorize a given loop. One can also
consider using run time tests to decide which version of the loop to execute
(scalar or vectorized).</font>
<p><font color="#000000"><b>Status:</b> N/A</font>
<br>&nbsp;
<li>
<a NAME="Induction Variable"></a><b><font color="#FF0000">Induction Variable
Analysis.</font></b></li>

<br><font color="#000000">Used by the vectorizer to detect loop bound,
analyze access patterns and analyze data dependences between array references.
One option is that the dependence tests would be designed deal with array
references that are expressed in terms of a linear function of the iteration
counter (in this case, vectorization will also benefit from optimizations
like induction variable substitution and replacement of auxiliary IVs with
linear function of the loop index). A dependence tester that is based on
IVs represented in this form would analyze each subscript of each array
reference, and apply the appropriate dependence test (SIV, ZIV, MIV etc.,
see <a href="#Dependence">dependence testing</a>). Alternatively, an induction
variable evolution analyzer could provide a different implementation to
the dependence tester.</font>
<p><font color="#000000"><b>Status:</b> Induction Variable Analysis is
under development (?). In addition, recurrence-chains based induction variable
evolution analyzer is under development (?).</font>
<br>&nbsp;
<li>
<a NAME="Dependence"></a><b><font color="#FF0000">Dependence Testing.</font></b></li>

<br><font color="#000000">Following the classic dependence-based approach
for vectorization as described in&nbsp;</font> <a href="#kenedy-book">#kenedy-book</a>,
apply dependence tests to pairs of array references in each loop nest,
and analyze the resulting dependence graph. We will s<font color="#000000">tart
from a dependence analyzer that relies on the array references being expressed
in terms of a linear function of the loop index, apply the simplest dependence
tests to all
</font>pairs of memory read/write and write/write, and g<font color="#000000">radually
extend its capabilities.The scheme below follows the algorithm described
in&nbsp;</font> <a href="#tseng">#tseng:</a>
<ul>
<li>
<font color="#000000">Partition the array subscripts into seperable sets
(subscripts which index does not occur in other subscripts)</font></li>

<li>
<font color="#000000">Apply the simplest tests for separable subscripts
(srong SIV (single index variable) tests).</font></li>

<li>
<font color="#000000">Incorporate more advanced dependence tests; First,
tests for separable subscripts - e.g., weak SIV tests, MIV (Multiple Index
Variable) tests, followed by tests for coupled subscripts, possibly up
to integer linear programming like Fourier-Motzkin elimination.</font></li>

<li>
<font color="#000000">Compute dependence distance, and prune dependences
which distance > vector_size.</font></li>

<li>
<font color="#000000">Try to handle cycles in the dependence graph of the
loop (by performing loop distribution, etc.).</font></li>

<li>
<font color="#000000">Generalize the dependence tester to nested loops</font></li>
</ul>
An alternative implementation may be provided by an IV evolution analyzer.
<p><b>Status:</b> N/A
<ul>&nbsp;</ul>

<li>
<a NAME="Access Pattern"></a><b><font color="#FF0000">Access Pattern Analysis.</font></b></li>

<br><font color="#000000">The memory architecture usually allows only restricted
accesses to data in memory; one of the restrictions is that the accessed
data must be consecutive in memory. Any other accesses (strided for example)
require to perform special permutation of the data in order to pack the
data elements in the right order into a vector register. Support for different
access patterns consists of the following stages:</font>
<ul>
<li>
<font color="#000000">Access pattern analysis - Classify the access pattern
of each array reference. Like the dependence analyzer, the Access pattern
analyzer also relies on the array references being expressed in terms of
a linear function of the loop index.</font></li>

<li>
<font color="#000000">Trivially handle consecutive (unit stride) access
patterns (a[i])</font></li>

<li>
<font color="#000000">Handle strided access patterns (a[2*i]). The stride
2 acceess pattern appears in computations on complex numbers, where the
real and imaginary parts are interleaved in the input/output array.</font></li>

<li>
<font color="#000000">Handle other types of access patterns?</font></li>

<li>
<font color="#000000">Support pointer arithmetic</font></li>
</ul>

<p><br>In order to facilitate a scheme that trivially replaces each operation
with its equivalent SIMD instruction, a <b>preprocessing pass</b> may need
to take place in order to convert the scalar operations into a proper (scalar)
form (a similar issue come up in the <a href="#Idiom recognition">"flattening"
of conditional operations</a> into virtual scalar operation); A trivial
example is to convert 'a[i]=b[i]' into the sequence 't=b[i]; a[i]=t' which
will be later replaced with a vector load and a vector store. Another example
comes up when expressing strided data accesses; in this case, address computation
operations are generated. For example, the following code sequence is generated
in order to represent the expression '<i>a[2*i+1] = b[i]</i>':
<ol>
<li>
<i>T.1 = i * 2;</i></li>

<li>
<i>T.2 = T.1 + 1;</i></li>

<li>
<i>a[T.2] = b[i];</i></li>
</ol>
These operations (1 and 2 in the example) determine the data permutation
that will be performed, however, they do not need to be directly vectorized.
It maybe required to mark such code as "related to computation of array
reference X" in order to indicate that no vector code needs to be generated
for it; instead, we may want&nbsp; to consider collapsing this code into
the relevant array reference, similarly to forward substitution (but this
would probably mean breaking the gimple convensions?).
<p><b>Status:</b> <font color="#000000">This analysis relies on the way
array references and their related address computation is represented in
the gimple trees. It also relies on the ways IVs are expressed.</font>
<br>&nbsp;
<li>
<a NAME="computations"></a><b><font color="#FF0000">Extend the range of
supportable operations.</font></b></li>

<br><font color="#000000">At first, the only computations that will be
vectorized are those for which the vectorization process consists of trivially
replacing each scalar operation in the loop with it's vector counterpart.
This includes simple loads, stores and arithmetic operations that operate
on the same data type. Some computations require extra code to be generated
in order to vectorize; These include:</font>
<ul>
<li>
<font color="#000000">Computations with mixed types; these require proper
promotion/demotion between vectors of different sizes.</font></li>

<li>
<font color="#000000">Computations that involve loop invariants (a[i] =
N) and involves scalar expansion.</font></li>

<li>
<font color="#000000">Computations that involve induction variables (a[i]
= i); involves scalar expansion, and proper initialization and update code.</font></li>
</ul>

<p><br><font color="#000000"><b>Status:</b>N/A</font><!C-It may be required for each array reference,
to mark all related code that is used to compute the address (the array
subscript). This code will not be vectorized, but will determine what kind
of permutation will be required to support the data access.>
<br>&nbsp;
<li>
<a NAME="Alignment"></a><b><font color="#FF0000">Alignment.</font></b></li>

<br><font color="#000000">The memory architecture usually allows only restricted
accesses to data in memory; one of the restrictions is that data accesses
need to be properly aligned on a certain boundary. Even if the architecture
supports unaligned accesses, these are usually much more costly than aligned
accesses. The work on alignment consists of several stages:</font>
<ul>
<li>
<font color="#000000">Analysis - compute the misalignment properties of
each memory access.</font></li>

<li>
<font color="#000000">Handling of aligned memory accesses only (do not
attempt to vectorize loops that contain unaligned accesses)</font></li>

<li>
<font color="#000000">If it is impossible to determine at compile time
whether the memory access is aligned, create two versions of the loop and
use a runtime test to decide which version to execute - the original scalar
version (if the data access is not aligned), or the vectorized version
(if the access is aligned).</font></li>

<li>
<font color="#000000">Develop optimizations that increase the alignment
of data accesses (static loop peeling, dynamic loop peeling, etc)</font></li>

<li>
<font color="#000000">Vectorize unaligned accesses. There are different
ways to do that, depending on whether the target supports unaligned accesses,
and also depending on what we want to implement at the tree level, and
what we want to leave for the RTL level to handle. For example, one could
generate the required code to support a mis-aligned memory access (for
miss-aligned loads - generate an additional load and the required merging
code). Alternatively, the tree level could just generate a "VEC_UNALIGNED_LOAD"
insn, that will be translated later in the RTL level into the appropriate
RTL sequence that handles the unaligned load (in a target dependent way).
This means however that optimizations that try to reduce the merging overhead
cannot take place in the tree level, where it is probably easier to do
than at the RTL level.</font></li>

<br>&nbsp;</ul>
<font color="#000000"><b>Status:</b> N/A</font>
<br>&nbsp;
<li>
<a NAME="Idiom recognition"></a><b><font color="#FF0000">Idiom Recognition
and Conditional Execution.</font></b></li>

<br><font color="#000000">It is often the case that complicated computations
can be reduced into a simpler, straight-line sequnce of operation. These
operations may not be directly supported in a scalar form, but are supported
by the target in a vector form. Such cases include:</font>
<ul>
<li>
<font color="#000000">Detect and support reduction operations:</font></li>

<ul>
<li>
<font color="#000000">Detect different kinds of reduction (summation, min/max,
min/max with index, etc.); once detected, these idioms may be replaced
with a virtual scalar operation (to facilitate straight forward vector
code generation).</font></li>

<li>
<font color="#000000">Vectorizing the reduction requires generating epilog
code after the loop. First - generate a scalar code sequence to compute
the epilog. Later, try to make use of target dependent mechanisms to compute
the epilog of the reduction more efficiently.</font></li>
</ul>

<li>
<font color="#000000">Support conditional vector execution, using masked
vector operations, or predicated vector operations, or vector select operations,
according to what is available in the specific target.</font></li>

<li>
<font color="#000000">Detect special patterns that are supported by the
architecture; This may involve collapsing multi-block constructs (such
as if-then-else) into a single vectorizable operation. For example, in
the following code sequence (taken from the SPEC-int benchmark gzip):<br>
<i>&nbsp;&nbsp;&nbsp; for (n = 0; n &lt; HASH_SIZE; n++) {</i></font></li>

<br><i><font color="#000000">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
m = head[n];</font></i>
<br><i><font color="#000000">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
head[n] = (Pos)(m >= 32768 ? m-32768 : 0);</font></i>
<br><i><font color="#000000">&nbsp;&nbsp;&nbsp; }</font></i>
<br><font color="#000000">the 'select' expression can be collpased into
a 'subtract and saturate' operation (see <a href
="http://gcc.gnu.org/ml/gcc/2003-07/msg01355.html";>http://gcc.gnu.org/ml/gcc/2003-07/msg01355.html</a>);</font>
<li>
<font color="#000000">Detect saturation idioms, and map them to the appropriate
vector built-in function.</font></li>

<br>&nbsp;</ul>
<font color="#000000"><b>Status:</b> N/A</font>
<br>&nbsp;
<li>
<a NAME="loop forms"></a><b><font color="#FF0000">Handle Advanced Loop
Forms.</font></b></li>

<ul>
<li>
<font color="#000000">General loop bound (unknown, or doesn't divide by
the vector size)</font></li>

<li>
<font color="#000000">More complex forms of loop termination condition
and loop index update</font></li>

<li>
<font color="#000000">Outer-loop vectorization (unroll and jam)</font></li>

<li>
<font color="#000000">Relax other CFG restrictions?</font></li>

<br>&nbsp;</ul>
<font color="#000000"><b>Status:</b> N/A</font>
<br>&nbsp;
<li>
<a NAME="pointer aliasing"></a><b><font color="#FF0000">Handle Pointer
Aliasing.</font></b></li>

<ul>
<li>
<font color="#000000">Develop strong anti-aliasing analysis</font></li>

<li>
<font color="#000000">Generate run-time tests for cases where memory anti-aliasing
cannot
be resolved at compile time</font></li>

<li>
<font color="#000000">Support user hints?</font></li>

<br>&nbsp;</ul>
<font color="#000000"><b>Status:</b> Some aliasing infrastructure is available.
points-to analysis also being developed (?).</font>
<br>&nbsp;
<li>
<a NAME="Loop transform"></a><b><font color="#FF0000">Loop Transformations
to Increase Vectorizability of Loops.</font></b></li>

<br><font color="#000000">These include:</font>
<ul>
<li>
<font color="#000000">loop interchange, and other unimodular transformations</font></li>

<li>
<font color="#000000">scalar expansion</font></li>

<li>
<font color="#000000">loop distribution</font></li>

<li>
<font color="#000000">collapsing tightly nested loops to a single loop</font></li>

<br>&nbsp;</ul>
<font color="#000000"><b>Status</b>:</font>
<br>&nbsp;
<li>
<a NAME="Other Optimizations."></a><b><font color="#FF0000">Other Optimizations.</font></b></li>

<ul>
<li>
<font color="#000000">Exploit data reuse (a la "Compiler-Controlled Caching
in Superword Register Files for Multimedia Extension Architectures" by
Shin, Chame and Hall); Mostly relies on unroll &amp; jam having been applied.</font></li>

<li>
<font color="#000000">Vectorize loops that can't be vectorized using the
classic vectorizer (until the proper loop transformations are developed)
by applying SLP vectorization (a la "Exploiting Superword Level Parallelism
with Multimedia Instruction Sets" by Amarasinghe and Larsen). This scheme
can potentially more easily vectorize partially vectorizable loops or loops
that are already unrolled in the source code. It is possible to implement
SLP vectorization either in the tree level or at the RTL level as a complementary
approach to classic loop vectorization.</font></li>

<br>&nbsp;</ul>
<font color="#000000"><b>Status</b>:</font>
<br>&nbsp;
<li>
<a NAME="Other Issues"></a><b><font color="#FF0000">Other Issues to consider.</font></b></li>

<ul>
<li>
<font color="#000000">Consider using user hints for different purposes
(aliasing, alignment, profitability of vectorizing a loop...)</font></li>

<li>
<font color="#000000">Consider which of the items listed above should take
place in the tree level, and which in the RTL level. For example, anything
that could affect the decision whether to vectorize&nbsp; a loop, should
take place at the tree level (in order to avoid having to undo transformations);
Also, many of the things that can take place at the RTL level maybe simpler
to implement at the tree level. However, this would mean that a lot of
machine dependent stuff will have to be done at the tree level.</font></li>

<li>
<font color="#000000">Represent the vector operations as actual tree expression
rather than built-in functions (?)</font></li>

<li>
<font color="#000000">Support C based languages that have SIMD or vector
extensions (?)</font></li>

<br>&nbsp;</ul>
<font color="#000000"><b>Status</b>:</font></ul>

<p><br><font color="#009900"><font size=+1>(Incomplete) Reference list</font></font>
<ol>
<li>
<a NAME="kenedy-book"></a><font color="#000000">"Optimizing Compilers for
Modern Architectures - A dependence based approach", Randy Allen &amp;
Ken Kennedy, Morgan Kaufmann Publishers, San Francisco, San Diego, New
York&nbsp; (2001)</font></li>

<li>
<a NAME="tseng"></a><font color="#000000">"Practical Dependence Testing",
Gina Goff, Ken Kennedy, Chau-Wen Tseng, CRPC+TR 90103, November 1990</font></li>
</ol>

<hr><i><font color="#000000">Last updated on Thu Sep 4 2003</font></i>
</body>
</html>


(See attached file: vect_plan)





                                                                                                                                     
                      Diego Novillo                                                                                                  
                      <dnovillo@redhat.        To:       Dorit Naishlos/Haifa/IBM@IBMIL                                              
                      com>                     cc:       "gcc@gcc.gnu.org" <gcc@gcc.gnu.org>, "gcc-patches@gcc.gnu.org"              
                                                <gcc-patches@gcc.gnu.org>                                                            
                      09/09/2003 16:44         Subject:  Re: [tree-ssa] Minor update for project web page - vectorization            
                                                                                                                                     




On Mon, 2003-09-08 at 09:18, Dorit Naishlos wrote:

> This draft is currently linked through my old university account; I hope
it
> would be possible to include it in the gcc web pages.
>
Absolutely.  Thanks for the thorough description!  I think it would be
best if we added it as a new html file linked through the main Tree SSA
page.  Could you prepare a patch for that?


Thanks.  Diego.


Attachment: vect_plan
Description: Binary data


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]