The polyhedral representation enabled by the Graphite project is currently limited to Static Control Parts (SCoPs). This page gives some directions on how to extend the representation for complete code. Not only the representation is essential, but also the ability to generate efficient code.
This page is in progress of being converted from the following document: PolyhedralModelExtensions.pdf
Contents
The basis
- Iteration domain representation with split domain and scheduling matrix, according to [VCP07].
- Efficient code generation according to [QRW00].
- Analysis of expressions through induction variable analysis according to [VEG01]. In this text I use the term induction variable, rather than chain of recurrences.
- Ranges of other variables (data-dependent variables) can be queried with range propagation [Har77].
Non-manifest expressions
The iteration domains in the polyhedral model are polyhedra (hence the name): they are defined by a set of equations where each equation is an affine function of domain dimensions. Such a constraint can only be generated for loops where the bounds are statically known (as a function of outer loop iterators). Such a loop is called manifest. In addition, array index expressions are represented as affine mappings of the domain dimensions to the array dimensions. These limitations can be overcome by extending the model to non-manifest, non-affine expressions.
We'll call the non-manifest, non-affine sub-expression the data-dependent part.
Representation of non-manifest or non-affine expressions
- If the data-dependent part remains constant over the modelled piece of code, treat it as a parameter.
- cfr. iterator, but implicit equality between different statements
- link to expression producing the parameter; in Graphite, this is always an SSA scalar due to three-operand notation. The write(s) to that variable can be considered independently.
- If the data-dependent part changes, treat it as an uninterpreted function symbol [PW95]. Cfr. parameter, except
- it depends on the iterators (cfr. access mapping, but always identity map)
- only equal between statements if the iterator dependence is also the same
Implicit (control) dependence on producing statement => must be included in dependence analysis. Thanks to SSA, there can be no anti-dependence from within the statement to the control expression itself.
- Probably best to do code hoisting of the expression first!
- Probably a good idea to convert the control variable to DSA, otherwise not much transformation freedom (e.g. no strip-mining of loop with data-dependent bound...)
Data-dependent index expressions
Most analyses (e.g. dependence analysis) can either project away the parameters, or exploit equality between parameters.
Data-dependent conditions
Equivalent to if-conversion [AKPW83].
In code generation, generate conditions for data-dependent constraints as soon as all loops on which the data-dependent parts depend have been generated.
Data-dependent loop bounds
No issues for modelling, it's even possible to leave out the bound completely. The problem is in code generation: you don't want to generate loops without upper bound (i.e. endless loops).
If range propagation yields worst-case bounds for the control variable, these can be used as loop bounds when generating code. However, could lead to many empty iterations.
Technique proposed in [Gei97]:
Observe that when scanning, the (upper) bound of a loop n is found as the maximum of some affine expressions. This maximum contains e.g. a parameter p(i0, in+1, ...) where i0 is an iterator which is already generated and in+1 is an iterator that has not yet been generated. The maximum value for the parameter is then actually max {p(i0, in+1, ...) | in+1 in D(i0, ...)}, i.e. the maximum over all inner loop iterations.
- If the parameter is not dependent on anything within the loop, this maximum can be computed before the loop.
- If it is dependent on a statement within the loop, the maximum can be computed iteratively. It should then be updated every time a parameter value becomes known.
- In short, an extra statement has to be generated to compute the maximum. This statement should be scheduled as soon as the parameter value becomes known.
- Lower bounds are tricky! Is it possible that the expression computing the lower bound is shifted within the loop that depends on it? This should be studied in more detail.
- A possible work-around is shifting the loop (i.e. all statements within it, regardless of whether they have this same lower bound) so that the lower bound becomes zero. However, what happens if there are two statements with different data-dependent lower bounds?
- An alternative which is certainly always correct, is adding an extra statement that updates the extreme bounds just before the data-dependent loop is entered. This creates a dependency across all outer loops, unless it is modelled as a reduction. Also the code generator should remove the statement again if it turns out not to be needed.
While loops
A while loop is any loop for which no controlling iterator can be found, i.e. it contains no induction variable or none of the induction variables appear in the exit condition.
[Gei97] again proposes a technique for while loops. It involves generating an artificial iterator which starts at zero. Its upper bound depends on itself: it is updated inside the loop. This creates a loop carried dependence, which is preferably modelled by explicitly adding a dependence rather than extracting it from an extra statement.
Irregular control flow
Irregular control flow mainly appears due to breaks, gotos and short-circuit operators in the loop control expressions. [EH94] proposes control flow restructuring, which can convert any control flow into a tree of nested loops. It involves the creation of additional variables, but these are anyway already present in the three-address SSA representation. It also adds many conditions to the code. However, the extra conditions don't need to be added explicitly, they are only used to construct the polyhedral constraints.
During code generation these extra conditions do appear. It should be possible to remove them through a combination of common subexpression elimination and control-flow optimizations.
Non-array data
Array accesses are represented by index mappings. Pointers are already converted into iterator-dependent expressions [VEG01].
We can distinguish between array data, scalar data and aliased data.
- Array data is mostly accessed through index expressions. It is represented by an array with index mappings.
- Aliased data is mostly accessed through aliases (VUSE, V_MAY_DEF, V_MUST_DEF). It should not be represented as accesses with index mappings; instead, it is more efficient to represent the SSA dependencies directly as polyhedral dependencies (see e.g. [VJBC07] for how to represent phi-nodes polyhedrally; there are probably better references but I don't know them). Don't forget anti-dependencies!
- Scalar data, like aliased data, should be represented directly out of the SSA dependencies. Alternatively, they may be represented by arrays (one for each SSA variable) with unity mapping on the write and inverse-dependency mappings on the reads.
Accesses to array data which are not through index expressions are represented by data-dependent (sub)expressions.
Data layout transformations are difficult for non-array data, because also the declaration has to be modified. ArnoutVandecappelle therefore thinks it is better not to perform data layout transformations inside the polyhedral model, but on some other representation.
Reductions
Loop transformations are sometimes unnecessarily constrained by reduction operators. This can be overcome during the dependency analysis: if an assignment is a reduction (i.e. of the form a = a op b; , where op is an associative operator like + and there is no risk of overflow), the corresponding loop-carried dependence can be removed and replaced directly by a dependence to the final result. However, the code generator must also take this into account.
Efficient polyhedral representation
With all these extensions, the polyhedral model will grow very large. An efficient representation is therefore required.
- The number of dimensions turns out not to be so critical. That's good, since we add a lot of data-dependent dimensions.
- The number of statements is very important: every statement has an iteration domain which has to be stored, and more statements means more dependencies to be checked. The number of statement grows large due to three-address notation and due to limiting iteration domains to convex polyhedra.
- Rather than splitting statements to form convex polyhedra, allow non-convex iteration domains (i.e. unions of convex polyhedra).
- During code generation, it has to be decided if the union is split or not. This is similar to the separation decision [QRW00], which also converts a union of polyhedra (due to different statement) into disjoint convex polyhedra.
If a union is not split, a condition with an or (||) will appear inside the loop.
The number of constraints is very important. This can grow very large, mainly because of the DNF representation of negations. These lead to large sets of convex polyhedra. Perhaps it's better to represent the domain as a union of polyhedra from which a union of polyhedra is subtracted, i.e. D = P \ N, where D is the iteration domain, P is the positive domain and N is the negative domain, and P and N are both unions of polyhedra.
This allows efficient representation both of not (!) expressions and else branches of conditions.
- How to implement feasibility check, Farkas lemma, projections?
- Dependencies have the most complex polyhedral representation.
- Trade-off with accuracy: keep a few extreme dependence vectors rather than the exact domain.
- Can still keep the accurate representation, but only use it for a final check of final refinement of the transformation.
References
- [AKPW83]
- J. R. Allen, Ken Kennedy, Carrie Porterfield, and Joe Warren. Conversion of control dependence to data dependence. In POPL '83: Proceedings of the 10th ACM SIGACT- SIGPLAN symposium on Principles of programming languages, pages 177- 189, New York, NY, USA, 1983. ACM.
- [DGCDM97]
- Eddy De Greef, Francky Catthoor, and Hugo De Man. Memory size reduction through storage order optimization for embedded parallel multimedia applications. Parallel Computing, special issue on "Parallel Processing in Multi-media" (ed. A.Krikelis), Elsevier, 23(12), December 1997.
- [EH94]
- Ana M. Erosa and Laurie J. Hendren. Taming Control Flow: A Structured Approach to Eliminating GOTO Statements. In Proc. 1994 Intnl. Conf. on Computer Languages (ICCL), pages 229- 240, May 1994.
- [Gei97]
- Max Geigl. Parallelization of loop nests with general bounds in the polyhedron model. Master's thesis, University Passau, Germany, March 1997.
- [GVB+06]
- Sylvain Girbal, Nicolas Vasilache, Cédric Bastoul, Albert Cohen, David Parello, Marc Sigler, and Olivier Temam. Semi- automatic composition of loop transformations for deep parallelism and memory hierarchies. Int. J. Parallel Program. , 34(3):261- 317, 2006.
- [Har77]
- W. Harrison. Compiler analysis of the value ranges for variables. IEEE Trans. Software Eng., SE-3:243- 250, May 1977.
- [NNS94]
- Chris J. Newburn, Derek B. Noonburg, and John Paul Shen. A PDG-based tool and its use in analyzing program control dependences. In PACT '94: Proceedings of the IFIP WG10.3 Working Conference on Parallel Architectures and Compilation Techniques, pages 157- 168, Amsterdam, The Netherlands, The Netherlands, 1994. North- Holland Publishing Co.
- [PNDN99]
- Preeti Ranjan Panda, Hiroshi Nakamura, Nikil D. Dutt, and Alexandru Nicolau. Augmenting loop tiling with data alignment for improved cache performance. IEEE Trans. Comput. , 48(2):142- 149, 1999.
- [PSC+06]
- Sebastian Pop, Georges-André Silber, Albert Cohen, Cédric Bastoul, Sylvain Girbal, and Nicolas Vasilache. Graphite: Polyhedral analyses and optimizations for gcc. In GNU Compilers Collection Developers Summit , Ottawa, Canada, June 28- 30 2006.
- [PW95]
- W. Pugh and D. Wonnacott. Nonlinear array dependence analysis. In B. K. Szymanski and B. Sinharoy, editors, Languages, Compilers and Run-Time Systems for Scalable Computers, pages 1-14. Kluwer Academic Publishers, Boston, 1995.
- [QRW00]
- Fabien Quilleré, Sanjay Rajopadhye, and Doran Wilde. Generation of efficient nested loops from polyhedra. Int. J. Parallel Program. , 28(5):469- 498, 2000.
- [Rau92]
- B. Ramakrishna Rau. Data flow and dependence analysis for instruction level parallelism. In Proceedings of the Fourth International Workshop on Languages and Compilers for Parallel Computing , pages 236- 250, London, UK, 1992. Springer- Verlag.
- [VBJC03]
- Sven Verdoolaege, Maurice Bruynooghe, Gerda Janssens, and Francky Catthoor. Multi- dimensional incremental loop fusion for data locality. In Proc. Int. Conf. on Application- specific Systems, Architectures and Processors (ASAP), pages 17- 27, The Hague, The Netherlands, June 2003.
- [VCP07]
- Nicolas Vasilache, Albert Cohen, and Louis-Noël Pouchet. Automatic correction of loop transformations. In Proc. 16th Intnl. Conf. on Parallel Architecture and Compilation Techniques (PACT 2007) , pages 292- 304, Brasov, Romania, 15- 19 September 2007.
- [VEG01]
- Robert A. Van Engelen and Kyle A. Gallivan. An efficient algorithm for pointer- to- array access conversion for compiling and optimizing DSP applications. In Innovative Architecture for Future Generation High- Performance Processors and Systems, pages 80- 89, Maui, Hawaii, 2001.
- [VJB+03]
- Peter Vanbroekhoven, Gerda Janssens, Maurice Bruynooghe, Henk Corporaal, and Francky Catthoor. Advanced copy propagation for arrays. In Proceedings of the 2003 ACM SIGPLAN Conference on Languages, Compilers and Tools for Embedded Systems (LCTES'03), pages 24- 33. ACM SIGPLAN, ACM Press, 2003.
- [VJBC07]
- Peter Vanbroekhoven, Gerda Janssens, Maurice Bruynooghe, and Francky Catthoor. A practical dynamic single assignment transformation. ACM Transactions on Design Automation of Electronic Systems, accepted 2007.