This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.
Index Nav: | [Date Index] [Subject Index] [Author Index] [Thread Index] | |
---|---|---|
Message Nav: | [Date Prev] [Date Next] | [Thread Prev] [Thread Next] |
Other format: | [Raw text] |
> Could you prepare a patch for that? Sure. Here it is - a suggestion for a small update to index.html, and the vectorization html file, both inlined and as an attachment). thanks, dorit Index: index.html =================================================================== RCS file: /cvsroot/gcc/wwwdocs/htdocs/projects/tree-ssa/index.html,v retrieving revision 1.21 diff -c -3 -p -r1.21 index.html *** index.html 9 Sep 2003 16:34:06 -0000 1.21 --- index.html 9 Sep 2003 17:38:10 -0000 *************** different passes are welcome.</p> *** 411,416 **** --- 411,419 ---- <a href="http://icps.u-strasbg.fr/cgi-bin/temporal/temporal.cgi"> translator</a> for Fortran.</dd> + <dt><i><a href="vect_plan.html">Vectorization</a></i></dt> + <dd>This pass should vectorize certain loops, using classic data dependence + based approach.</dd> </dl> New file: vect_plan.html ================================================================== <!doctype html public "-//w3c//dtd html 4.0 transitional//en"> <html> <head> <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1"> <meta name="GENERATOR" content="Mozilla/4.7 [en] (WinNT; I) [Netscape]"> <title>Vectorization in GCC</title> </head> <body text="#000000" bgcolor="#FFCC99" link="#0000EE" vlink="#551A8B" alink="#FF0000"> <br><font color="#3333FF"><font size=+1>Auto-vectorization in GCC -</font></font> <br><font color="#3333FF"><font size=+1>Preliminary High level plan of implementation</font></font> <p><font color="#000000">This document outlines a high level plan for the implementation of auto-vectorization in the tree-ssa branch of gcc. The main goals of this document are to:</font> <ul> <li> <font color="#000000"><b>Facilitate contributing to the different vectorization-related components.</b> Vectorization consists of many independent components. While some basic infrastructure will be required by all vectorization "clients", there are many directions by which this basic infrastructure can be enhanced. The decision on which directions to pursue first is in many cases guided by the specific properties of a given target architecture and target application domain. Therefore, it is important to keep developers informed about the implementation status of these different directions, and allow them to push those directions that are more important for their targets (but possibly too far down the vectorizer's todo list).</font></li> <li> <font color="#000000"><b>Increase synchronization and minimize work duplication.</b> Vectorization consists of, or can benefit from, components that are used by other compiler optimizations; some of these components are already under development. The hope is that exposing the list of components that the vectorizer may need to interact with, would help to increase the synchronization between the vectorizer and other components and assist in minimizing work duplication.</font></li> <li> <font color="#000000"><b>Expose this work.</b> Allow early exposure of this work, in order to be able to get feedback. tree-ssa developers are encouraged to comment on the infrastructure that is available (or is being or will be developed) in the tree-ssa and it's implications on the items listed here.</font></li> </ul> <font color="#000000">This document tries to serve the above goals by <b>(a)</b> providing a list all the issues that vectorization may need to handle, <b>(b)</b> breaking the implementation into independent tasks as much as possible, and <b>(c)</b> proposing a gradual implementation that consists of a very limited vectorizer as a first step, which will be gradually extend in different directions.</font> <p><font color="#009900"><font size=+1>General Vectorization Scheme</font></font> <p><font color="#000000">The table below outlines the high-level vectorization scheme along with a proposal for an implementation scheme, as follows:</font> <ul> <li> <font color="#000000">The first column ("<b>vectorization driver</b>") lists the tasks that the vectorizer must consist of. It briefly describes the expected functionality of each task.</font></li> <li> <font color="#000000">The second column ("<b>basic-vectorizer</b>") describes a proposal for a basic vectorizer that provides minimal support for each of these tasks, listing the restrictions that will be imposed by the basic vectorizer on candidate loops. Loops that are considered vectorizable by the basic-vectorizer are of the form:</font></li> <br><font color="#000000"> <i>for(i=0; i<N; i++){</i></font> <br><i><font color="#000000"> a[i] = b[i] + c[i];</font></i> <br><i><font color="#000000"> }</font></i> <li> <font color="#000000">The third column ("<b>enhancements</b>") lists possible directions for extending the capabilities of the basic vectorizer .</font></li> </ul> <font color="#000000">Following the table is a complete list of these enhancement, that provides some more detail. Some of these enhancements are aimed at improving the efficiency of the vector code that is being generated. Other enhancements aim to broaden the range of loops that are amenable for vectorization. One way to do it is to extend the vectorizer to handle additional loop forms (for example, extend the vectorizer to handle loops that contain reduction, type casting, etc.). Another way is to transform loops to confirm to the set of loops that the vectorizer can handle. For example, for the case of conditional code - a <b>preprocessing phase</b> will detect 'if-then-else' constructs that can be collapsed into an operation (or a straight line sequence of operations) that is supported by the target. Indeed, an important set of the enhancements listed below fall into this category (<a href="#Idiom recognition">idioms recognition</a>, <a href="#Access Pattern">access patterns analysis</a>); This is because the general principle we are trying to follow is to keep the actual code transformation part of the vectorizer as simple as possible: a simple scan of straight-line code, and a 1-1 replacement of each scalar operation with the equivalent vector operation. To support this scheme, new "virtual" scalar operations will need to be introduced (probably as built-in functions) by the preprocessing phase, to be later replaced by actual vector operations. In the tree level, vector operations are represented using built-in functions, unless it is decided that it is desirable to introduce special tree expressions to represent them.</font> <br> <table BORDER COLS=3 WIDTH="98%" > <tr ALIGN=LEFT VALIGN=TOP> <td ALIGN=LEFT VALIGN=TOP WIDTH="33%"><b>vectorization driver</b></td> <td WIDTH="33%"><b><font color="#000000">basic vectorizer</font></b></td> <td WIDTH="33%"><b><font color="#000000">enhancements</font></b></td> </tr> <tr ALIGN=LEFT VALIGN=TOP> <td><font color="#3333FF">analyze_loop_CFG(loop). </font> <br><font color="#000000">Checks the <a href="#loop CFG">control flow properties of the loop</a> (number of basic-blocks it consists of, nesting, single entry/exit, etc), in order to determine whether the control flow of the loop falls within the range of loop forms that are supported by this vectorizer.</font></td> <td> <ul> <li> <font color="#000000">inner-most (single nest) loops</font></li> <li> <font color="#000000">single basic block loops (no if-then-else constructs, etc) </font></li> <li> <font color="#000000">other restrictions (single successor/predecessor pre-heated/latch block)</font></li> </ul> </td> <td> <ul> <li> <font color="#000000">use <a href="#Idiom recognition">idiom recognition</a> to collapse if-then-else constructs into a single operation</font></li> <li> <font color="#000000">handle <a href="#Idiom recognition">conditional execution </a></font></li> <li> <font color="#000000">the above need to be aware of <a href="#Machine">target specific vector capabilities</a>.</font></li> <li> <font color="#000000">extend the <a href="#loop forms">range of loop forms</a> that can be vectorized, with respect to their CFG.</font></li> </ul> </td> </tr> <tr ALIGN=LEFT VALIGN=TOP> <td><font color="#3333FF">analyze_loop_index_and_bound(loop)</font><font color="#000000">. Analyzes the loop termination condition to determine the loop bound and properties of the loop index (its bounds and step). The functionality of this utility should be largely provided by the information computed by the <a href="#Induction Variable">Induction Variable Analyzer</a>).</font></td> <td> <ul> <li> <font color="#000000">handle simple normalized loops (loop index is a trivial IV with step 1 etc), with a simple termination condition.</font></li> <li> <font color="#000000">loop-bound known at compile time</font></li> <li> <font color="#000000">loop-bound >= vector_size and loop-bound % vector_size = 0</font></li> </ul> </td> <td> <ul> <li> <font color="#000000">relax the restrictions on the <a href="#loop forms">loop bound</a> and <a href="#Access Pattern">loop index</a> imposed by the basic vectorizer</font></li> </ul> </td> </tr> <tr ALIGN=LEFT VALIGN=TOP> <td><font color="#3333FF">analyze_loop_stmts(loop-stmts).</font> <br><font color="#000000">Scan the loop statements and check whether there are any statements that prohibit vectorization (function calls, statements that don't have a <a href="#mapping">mapping</a> to a built-in vector function, etc)</font></td> <td> <ul> <li> <font color="#000000">no support for scalar expansion</font></li> <li> <font color="#000000">no mixture of data types (all statements operate on the same data types)</font></li> <li> <font color="#000000">possibly other restrictions</font></li> </ul> </td> <td> <ul> <li> <font color="#000000">handle <a href="#computations">scalar expansion</a></font></li> <li> <font color="#000000">handle computations with <a href="#computations">mixed data types</a>. </font></li> </ul> </td> </tr> <tr ALIGN=LEFT VALIGN=TOP> <td><font color="#3333FF">analyze_access_pattern(loop-mem-refs)</font> <br><font color="#000000">Analyze the memory references in the loop, and <a href="#Access Pattern">classify them according to the access pattern</a> that they exhibit. </font></td> <td> <ul> <li> <font color="#000000">support only memory accesses which are array references (no pointers...)</font></li> <li> <font color="#000000">support only consecutive (unit stride) access pattern</font></li> </ul> </td> <td> <ul> <li> extend the access pattern analyzer to handle pointer references</li> <li> improve the <a href="#pointer aliasing">memory aliasing</a> capabilities</li> <li> handle non consecutive access patterns <a href="#Machine">if supported by the target</a></li> </ul> </td> </tr> <tr ALIGN=LEFT VALIGN=TOP> <td><font color="#3333FF">analyze_alignment(loop-mem-refs)</font> <br><font color="#000000">Analyze the alignment of all the memory references in the loop. For each memory reference, record its misalignment amount, if it can be resolved at compile time.</font></td> <td> <ul> <li> misalignment amount for all memory references is known at compile time</li> <li> misalignment = 0 (all references are aligned)</li> </ul> </td> <td> <ul> <li> <a href="#Alignment">handle unaligned accesses</a></li> </ul> </td> </tr> <tr ALIGN=LEFT VALIGN=TOP> <td><font color="#3333FF">analyze_loop_carried_dependences(loop)</font> <br><font color="#000000">Build the loop dependence graph (for scalar and array references); Detect Strongly Connected Components in the graph(statements that are involved in a dependence cycle); Perform a topological sort on the reduced graph (in which each SCC is represented by a single node); Only singleton SCCs w/o self dependences can be vectorized. If other SCCs are present - loop transformations are required.</font></td> <td ALIGN=LEFT VALIGN=TOP> <ul> <li> <font color="#000000">handle only loops that consist of singleton SCCs w/o self deps.</font></li> <li> the only scalar loop-carried dependencies allowed are of IVs which are used in array references or for the loop index (i.e, reduction is not supported).</li> <li> use simplest dependence tests .</li> </ul> </td> <td> <ul> <li> <font color="#000000">Use more <u><a href="#Dependence">advanced dependence tests</a></u></font></li> <li> <font color="#000000">Try to <a href="#Dependence">break cycles in the dependence graph</a> of the loop by performing <a href="#Loop transform">loop transformations</a> (or <a href="#Idiom recognition">detecting reduction</a>).</font></li> </ul> </td> </tr> <tr ALIGN=LEFT VALIGN=TOP> <td><font color="#3333FF">estimate_vectorization_profitability(loop)</font> <br>At this point, it has been determined that the loop is vectorizable. It remains to decide whether it is indeed profitable to vectorize it.</td> <td> <ul> <li> Vectorize all loops which loop bound >= MIN_LIMIT (?)</li> </ul> </td> <td> <ul> <li> develop a <a href="#Cost">cost model</a></li> </ul> </td> </tr> <tr ALIGN=LEFT VALIGN=TOP> <td><font color="#3333FF">vectorize_loop(loop)</font> <br>Replace the scalar statements with the corresponding built-in vector functions; Also change the loop bound accordingly.</td> <td> <ul> <li> Use a <a href="#mapping">scalar-to-vector mapping</a></li> </ul> </td> <td> <ul> <li> <a href="#Other Optimizations.">misc optimizations</a></li> <li> <a href="#Other Issues">misc issues</a> to consider</li> </ul> </td> </tr> </table> <p><font color="#009900"><font size=+1>List of Vectorization Related Tasks</font></font> <p>The following is a list of independent directions by which the basic vectorizer can be enhanced. It should be possible to for different people to work on different items from this list. Some of these items maybe already under development, or maybe already supported. <ul> <li> <a NAME="loop CFG"></a><b><font color="#FF0000">Loop detection and loop CFG analysis.</font></b></li> <br><font color="#000000">Detect loops, and record some basic control flow information about them (number of basic blocks, loop latch and pre-header, etc.); Also develop utilities to support CFG transformations on loops.</font> <p><font color="#000000"><b>Status: </b>Loop detection and control flow analysis is already supported (cfgloop.c, cfgloopanal.c); tools to perform loop transformation (e.g, duplicate a loop body for code versioning) - to be revisited when gimple representation stabilizes.</font><br> <BR> <li> <a NAME="Machine"></a><b><font color="#FF0000">Machine Modeling.</font></b></li> <br><font color="#000000">Expose the required machine dependent information to the tree-level. A first basic implementation will not deal with issues that require machine dependent knowledge; for such an implementation, knowledge of the size of the vector would suffice. As the capabilities of the vectorizer are extended, it will be required to inform the vectorizer of the capabilities available in the architecture (for example, mechanisms to deal with misalignment, conditional execution for vector instructions, etc.).</font> <p><font color="#000000"><b>Status:</b> N/A</font><br> <BR> <li> <a NAME="mapping"></a><b><font color="#FF0000">Scalar-to-Vector Mapping.</font></b></li> <br><font color="#000000">Provide a mapping of scalar statements to the corresponding built-in vector functions.</font> <p><font color="#000000"><b>Status:</b> The general SIMD support in gcc that emulates vector function built-ins for architectures without vector/SIMD extensions may provide some initial support already?</font><br> <BR> <li> <a NAME="Cost"></a><b><font color="#FF0000">Cost Model.</font></b></li> <br><font color="#000000">There is an overhead associated with vectorization - moving data in to/out of vector registers before/after the vectorized loop, aligning of data accesses, etc. It is required to incorporate a cost model into the machine description in order to allow the vectorizer to evaluate whether it is worth while to vectorize a given loop. One can also consider using run time tests to decide which version of the loop to execute (scalar or vectorized).</font> <p><font color="#000000"><b>Status:</b> N/A</font> <br> <li> <a NAME="Induction Variable"></a><b><font color="#FF0000">Induction Variable Analysis.</font></b></li> <br><font color="#000000">Used by the vectorizer to detect loop bound, analyze access patterns and analyze data dependences between array references. One option is that the dependence tests would be designed deal with array references that are expressed in terms of a linear function of the iteration counter (in this case, vectorization will also benefit from optimizations like induction variable substitution and replacement of auxiliary IVs with linear function of the loop index). A dependence tester that is based on IVs represented in this form would analyze each subscript of each array reference, and apply the appropriate dependence test (SIV, ZIV, MIV etc., see <a href="#Dependence">dependence testing</a>). Alternatively, an induction variable evolution analyzer could provide a different implementation to the dependence tester.</font> <p><font color="#000000"><b>Status:</b> Induction Variable Analysis is under development (?). In addition, recurrence-chains based induction variable evolution analyzer is under development (?).</font> <br> <li> <a NAME="Dependence"></a><b><font color="#FF0000">Dependence Testing.</font></b></li> <br><font color="#000000">Following the classic dependence-based approach for vectorization as described in </font> <a href="#kenedy-book">#kenedy-book</a>, apply dependence tests to pairs of array references in each loop nest, and analyze the resulting dependence graph. We will s<font color="#000000">tart from a dependence analyzer that relies on the array references being expressed in terms of a linear function of the loop index, apply the simplest dependence tests to all </font>pairs of memory read/write and write/write, and g<font color="#000000">radually extend its capabilities.The scheme below follows the algorithm described in </font> <a href="#tseng">#tseng:</a> <ul> <li> <font color="#000000">Partition the array subscripts into seperable sets (subscripts which index does not occur in other subscripts)</font></li> <li> <font color="#000000">Apply the simplest tests for separable subscripts (srong SIV (single index variable) tests).</font></li> <li> <font color="#000000">Incorporate more advanced dependence tests; First, tests for separable subscripts - e.g., weak SIV tests, MIV (Multiple Index Variable) tests, followed by tests for coupled subscripts, possibly up to integer linear programming like Fourier-Motzkin elimination.</font></li> <li> <font color="#000000">Compute dependence distance, and prune dependences which distance > vector_size.</font></li> <li> <font color="#000000">Try to handle cycles in the dependence graph of the loop (by performing loop distribution, etc.).</font></li> <li> <font color="#000000">Generalize the dependence tester to nested loops</font></li> </ul> An alternative implementation may be provided by an IV evolution analyzer. <p><b>Status:</b> N/A <ul> </ul> <li> <a NAME="Access Pattern"></a><b><font color="#FF0000">Access Pattern Analysis.</font></b></li> <br><font color="#000000">The memory architecture usually allows only restricted accesses to data in memory; one of the restrictions is that the accessed data must be consecutive in memory. Any other accesses (strided for example) require to perform special permutation of the data in order to pack the data elements in the right order into a vector register. Support for different access patterns consists of the following stages:</font> <ul> <li> <font color="#000000">Access pattern analysis - Classify the access pattern of each array reference. Like the dependence analyzer, the Access pattern analyzer also relies on the array references being expressed in terms of a linear function of the loop index.</font></li> <li> <font color="#000000">Trivially handle consecutive (unit stride) access patterns (a[i])</font></li> <li> <font color="#000000">Handle strided access patterns (a[2*i]). The stride 2 acceess pattern appears in computations on complex numbers, where the real and imaginary parts are interleaved in the input/output array.</font></li> <li> <font color="#000000">Handle other types of access patterns?</font></li> <li> <font color="#000000">Support pointer arithmetic</font></li> </ul> <p><br>In order to facilitate a scheme that trivially replaces each operation with its equivalent SIMD instruction, a <b>preprocessing pass</b> may need to take place in order to convert the scalar operations into a proper (scalar) form (a similar issue come up in the <a href="#Idiom recognition">"flattening" of conditional operations</a> into virtual scalar operation); A trivial example is to convert 'a[i]=b[i]' into the sequence 't=b[i]; a[i]=t' which will be later replaced with a vector load and a vector store. Another example comes up when expressing strided data accesses; in this case, address computation operations are generated. For example, the following code sequence is generated in order to represent the expression '<i>a[2*i+1] = b[i]</i>': <ol> <li> <i>T.1 = i * 2;</i></li> <li> <i>T.2 = T.1 + 1;</i></li> <li> <i>a[T.2] = b[i];</i></li> </ol> These operations (1 and 2 in the example) determine the data permutation that will be performed, however, they do not need to be directly vectorized. It maybe required to mark such code as "related to computation of array reference X" in order to indicate that no vector code needs to be generated for it; instead, we may want to consider collapsing this code into the relevant array reference, similarly to forward substitution (but this would probably mean breaking the gimple convensions?). <p><b>Status:</b> <font color="#000000">This analysis relies on the way array references and their related address computation is represented in the gimple trees. It also relies on the ways IVs are expressed.</font> <br> <li> <a NAME="computations"></a><b><font color="#FF0000">Extend the range of supportable operations.</font></b></li> <br><font color="#000000">At first, the only computations that will be vectorized are those for which the vectorization process consists of trivially replacing each scalar operation in the loop with it's vector counterpart. This includes simple loads, stores and arithmetic operations that operate on the same data type. Some computations require extra code to be generated in order to vectorize; These include:</font> <ul> <li> <font color="#000000">Computations with mixed types; these require proper promotion/demotion between vectors of different sizes.</font></li> <li> <font color="#000000">Computations that involve loop invariants (a[i] = N) and involves scalar expansion.</font></li> <li> <font color="#000000">Computations that involve induction variables (a[i] = i); involves scalar expansion, and proper initialization and update code.</font></li> </ul> <p><br><font color="#000000"><b>Status:</b>N/A</font><!C-It may be required for each array reference, to mark all related code that is used to compute the address (the array subscript). This code will not be vectorized, but will determine what kind of permutation will be required to support the data access.> <br> <li> <a NAME="Alignment"></a><b><font color="#FF0000">Alignment.</font></b></li> <br><font color="#000000">The memory architecture usually allows only restricted accesses to data in memory; one of the restrictions is that data accesses need to be properly aligned on a certain boundary. Even if the architecture supports unaligned accesses, these are usually much more costly than aligned accesses. The work on alignment consists of several stages:</font> <ul> <li> <font color="#000000">Analysis - compute the misalignment properties of each memory access.</font></li> <li> <font color="#000000">Handling of aligned memory accesses only (do not attempt to vectorize loops that contain unaligned accesses)</font></li> <li> <font color="#000000">If it is impossible to determine at compile time whether the memory access is aligned, create two versions of the loop and use a runtime test to decide which version to execute - the original scalar version (if the data access is not aligned), or the vectorized version (if the access is aligned).</font></li> <li> <font color="#000000">Develop optimizations that increase the alignment of data accesses (static loop peeling, dynamic loop peeling, etc)</font></li> <li> <font color="#000000">Vectorize unaligned accesses. There are different ways to do that, depending on whether the target supports unaligned accesses, and also depending on what we want to implement at the tree level, and what we want to leave for the RTL level to handle. For example, one could generate the required code to support a mis-aligned memory access (for miss-aligned loads - generate an additional load and the required merging code). Alternatively, the tree level could just generate a "VEC_UNALIGNED_LOAD" insn, that will be translated later in the RTL level into the appropriate RTL sequence that handles the unaligned load (in a target dependent way). This means however that optimizations that try to reduce the merging overhead cannot take place in the tree level, where it is probably easier to do than at the RTL level.</font></li> <br> </ul> <font color="#000000"><b>Status:</b> N/A</font> <br> <li> <a NAME="Idiom recognition"></a><b><font color="#FF0000">Idiom Recognition and Conditional Execution.</font></b></li> <br><font color="#000000">It is often the case that complicated computations can be reduced into a simpler, straight-line sequnce of operation. These operations may not be directly supported in a scalar form, but are supported by the target in a vector form. Such cases include:</font> <ul> <li> <font color="#000000">Detect and support reduction operations:</font></li> <ul> <li> <font color="#000000">Detect different kinds of reduction (summation, min/max, min/max with index, etc.); once detected, these idioms may be replaced with a virtual scalar operation (to facilitate straight forward vector code generation).</font></li> <li> <font color="#000000">Vectorizing the reduction requires generating epilog code after the loop. First - generate a scalar code sequence to compute the epilog. Later, try to make use of target dependent mechanisms to compute the epilog of the reduction more efficiently.</font></li> </ul> <li> <font color="#000000">Support conditional vector execution, using masked vector operations, or predicated vector operations, or vector select operations, according to what is available in the specific target.</font></li> <li> <font color="#000000">Detect special patterns that are supported by the architecture; This may involve collapsing multi-block constructs (such as if-then-else) into a single vectorizable operation. For example, in the following code sequence (taken from the SPEC-int benchmark gzip):<br> <i> for (n = 0; n < HASH_SIZE; n++) {</i></font></li> <br><i><font color="#000000"> m = head[n];</font></i> <br><i><font color="#000000"> head[n] = (Pos)(m >= 32768 ? m-32768 : 0);</font></i> <br><i><font color="#000000"> }</font></i> <br><font color="#000000">the 'select' expression can be collpased into a 'subtract and saturate' operation (see <a href ="http://gcc.gnu.org/ml/gcc/2003-07/msg01355.html">http://gcc.gnu.org/ml/gcc/2003-07/msg01355.html</a>);</font> <li> <font color="#000000">Detect saturation idioms, and map them to the appropriate vector built-in function.</font></li> <br> </ul> <font color="#000000"><b>Status:</b> N/A</font> <br> <li> <a NAME="loop forms"></a><b><font color="#FF0000">Handle Advanced Loop Forms.</font></b></li> <ul> <li> <font color="#000000">General loop bound (unknown, or doesn't divide by the vector size)</font></li> <li> <font color="#000000">More complex forms of loop termination condition and loop index update</font></li> <li> <font color="#000000">Outer-loop vectorization (unroll and jam)</font></li> <li> <font color="#000000">Relax other CFG restrictions?</font></li> <br> </ul> <font color="#000000"><b>Status:</b> N/A</font> <br> <li> <a NAME="pointer aliasing"></a><b><font color="#FF0000">Handle Pointer Aliasing.</font></b></li> <ul> <li> <font color="#000000">Develop strong anti-aliasing analysis</font></li> <li> <font color="#000000">Generate run-time tests for cases where memory anti-aliasing cannot be resolved at compile time</font></li> <li> <font color="#000000">Support user hints?</font></li> <br> </ul> <font color="#000000"><b>Status:</b> Some aliasing infrastructure is available. points-to analysis also being developed (?).</font> <br> <li> <a NAME="Loop transform"></a><b><font color="#FF0000">Loop Transformations to Increase Vectorizability of Loops.</font></b></li> <br><font color="#000000">These include:</font> <ul> <li> <font color="#000000">loop interchange, and other unimodular transformations</font></li> <li> <font color="#000000">scalar expansion</font></li> <li> <font color="#000000">loop distribution</font></li> <li> <font color="#000000">collapsing tightly nested loops to a single loop</font></li> <br> </ul> <font color="#000000"><b>Status</b>:</font> <br> <li> <a NAME="Other Optimizations."></a><b><font color="#FF0000">Other Optimizations.</font></b></li> <ul> <li> <font color="#000000">Exploit data reuse (a la "Compiler-Controlled Caching in Superword Register Files for Multimedia Extension Architectures" by Shin, Chame and Hall); Mostly relies on unroll & jam having been applied.</font></li> <li> <font color="#000000">Vectorize loops that can't be vectorized using the classic vectorizer (until the proper loop transformations are developed) by applying SLP vectorization (a la "Exploiting Superword Level Parallelism with Multimedia Instruction Sets" by Amarasinghe and Larsen). This scheme can potentially more easily vectorize partially vectorizable loops or loops that are already unrolled in the source code. It is possible to implement SLP vectorization either in the tree level or at the RTL level as a complementary approach to classic loop vectorization.</font></li> <br> </ul> <font color="#000000"><b>Status</b>:</font> <br> <li> <a NAME="Other Issues"></a><b><font color="#FF0000">Other Issues to consider.</font></b></li> <ul> <li> <font color="#000000">Consider using user hints for different purposes (aliasing, alignment, profitability of vectorizing a loop...)</font></li> <li> <font color="#000000">Consider which of the items listed above should take place in the tree level, and which in the RTL level. For example, anything that could affect the decision whether to vectorize a loop, should take place at the tree level (in order to avoid having to undo transformations); Also, many of the things that can take place at the RTL level maybe simpler to implement at the tree level. However, this would mean that a lot of machine dependent stuff will have to be done at the tree level.</font></li> <li> <font color="#000000">Represent the vector operations as actual tree expression rather than built-in functions (?)</font></li> <li> <font color="#000000">Support C based languages that have SIMD or vector extensions (?)</font></li> <br> </ul> <font color="#000000"><b>Status</b>:</font></ul> <p><br><font color="#009900"><font size=+1>(Incomplete) Reference list</font></font> <ol> <li> <a NAME="kenedy-book"></a><font color="#000000">"Optimizing Compilers for Modern Architectures - A dependence based approach", Randy Allen & Ken Kennedy, Morgan Kaufmann Publishers, San Francisco, San Diego, New York (2001)</font></li> <li> <a NAME="tseng"></a><font color="#000000">"Practical Dependence Testing", Gina Goff, Ken Kennedy, Chau-Wen Tseng, CRPC+TR 90103, November 1990</font></li> </ol> <hr><i><font color="#000000">Last updated on Thu Sep 4 2003</font></i> </body> </html> (See attached file: vect_plan) Diego Novillo <dnovillo@redhat. To: Dorit Naishlos/Haifa/IBM@IBMIL com> cc: "gcc@gcc.gnu.org" <gcc@gcc.gnu.org>, "gcc-patches@gcc.gnu.org" <gcc-patches@gcc.gnu.org> 09/09/2003 16:44 Subject: Re: [tree-ssa] Minor update for project web page - vectorization On Mon, 2003-09-08 at 09:18, Dorit Naishlos wrote: > This draft is currently linked through my old university account; I hope it > would be possible to include it in the gcc web pages. > Absolutely. Thanks for the thorough description! I think it would be best if we added it as a new html file linked through the main Tree SSA page. Could you prepare a patch for that? Thanks. Diego.
Attachment:
vect_plan
Description: Binary data
Index Nav: | [Date Index] [Subject Index] [Author Index] [Thread Index] | |
---|---|---|
Message Nav: | [Date Prev] [Date Next] | [Thread Prev] [Thread Next] |