Middle-End Arrays
- why only Fortran?
- present the full semantics with the least amount of IL elements
- handle variable-sized arrays by extending the VLA representation already present in Gimple
- modelling tensor calculations
- array copy propagation can be done in the polyhedral model to remove array copies.
- Middle-End Array Expressions
- Inspired by last year's talk "premature or inevitable"
- move some of the front-end optimizations to the middle end
- make decisions on function inlining in the middle end (right now in the front end)
- more opts - CSE and simplification of expressions (not done in the front end)
- high-level loop transformations, data dependence analysis.
- Applies to Fortran 90+ array expressions; middle-end arrays need to be multi-dimensional and variable-sized.
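As a rough illustration of what is at stake (a sketch, not GCC code): the front end currently scalarizes a Fortran 90 array expression such as `A = B + C` into an explicit loop nest, which is why middle-end arrays would need to be multi-dimensional and variable-sized to carry the whole expression further down for CSE and loop transformations. In Python pseudocode:

```python
def scalarize_add(b, c):
    """Explicit loop nest the front end would emit for the Fortran
    array expression  A = B + C  (rank 2, variable-sized).
    Keeping A = B + C as a single middle-end array expression would
    instead expose it whole to later optimizations.
    Illustrative sketch only; names are made up."""
    n, m = len(b), len(b[0])          # bounds known only at run time
    a = [[0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            a[i][j] = b[i][j] + c[i][j]
    return a
```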
Streamization with graphite
Are we going to generate the parallel code directly? No. Generate call-backs from the polyhedral representation to the vectorizer and parallelizer when translating back to Gimple. Do we need a separate vectorizer pass after PCP to get the vectorization info from Graphite?
- heuristics for PCP will differ depending on whether we do vectorization and parallelization or not.
- choosing heuristics - one for autopar, the other for locality improvement and interaction with the vectorizer. Based on these two heuristics we need to decide whether to do loop tiling, interchange, etc.
- how to choose tile sizes - right now focus on rectangular tiles only...
- how to couple this with other transformations ??
- the initial aim is to generate sequential code
- which loops will be parallelized - select whether to parallelize a loop while interchanging and improving locality; both of these optimizations should be correlated.
- tiling - how important is it?
- tiling and prefetching interaction is not yet solved
- we want tiling heuristics. how do we interchange non-perfectly nested loops in gcc (?)
- interchange is more important than tiling (graphite priority), because it will help vectorization and prefetching.
- interchange is representative of many other transforms; the first step is to make loop interchange robust.
- the polyhedral model is a good way to avoid a lot of corner-case problems when modelling whether interchange is profitable... model the cost collectively rather than the cost of interchanging every statement separately.
- blockwise interchange.
- in some cases you need to distribute before you can interchange.
- can we use the dependence analysis of Cedric's CLAN interface to improve this? yes... (it uses PIP (Parametric Integer Programming)), so that would bring PIP inside Graphite.
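A concrete way to state the interchange-legality question discussed above (the textbook polyhedral-model test, not Graphite's actual implementation; function names are made up): a loop permutation is legal iff every dependence distance vector stays lexicographically positive after permutation.

```python
def lex_positive(v):
    """A distance vector is lexicographically positive when its first
    nonzero component is > 0 (the all-zero vector is also fine: the
    dependence stays within a single iteration)."""
    for x in v:
        if x != 0:
            return x > 0
    return True

def interchange_legal(distance_vectors, perm):
    """Permuting the loops of a nest according to `perm` is legal iff
    every permuted dependence distance vector is still lexicographically
    positive. Hypothetical helper, not Graphite's API."""
    return all(lex_positive([v[p] for p in perm]) for v in distance_vectors)
```

For example, a dependence with distance vector (1, -1) forbids swapping the two loops (the permuted vector (-1, 1) is lexicographically negative), while (0, 1) allows it.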
IPA optimizations and graphite (by Honza)
- function at a time, unit at a time, link time, whole program (different scopes for IPA)
- need to handle large programs; the time and memory complexity of whole-program optimization should be similar to single-file optimization.
- having incremental compilation in the long term.
- stable format of object files so that libraries can be shipped.
- compilation stage includes local optimizations and simple IPA optimizations.
- at the IPA stage we should not touch function bodies
- compile time: summary generation, summary writing
- link time: summary read and IPO, optimization summary write
- parallel link time: optimization summary read, transformation
- small IPA pass
- graphite needs: the IPA graph, and to decide which functions you want to apply transformations to...
- being able to inline if you know additional transformations are possible...
- tracking equalities and inequalities is useful for CLooG (to see whether two parameters are equal or not). This might be easier to implement than value range propagation.
- array region analysis - extend IPA PTA locations to represent subarrays using polyhedra; needs (IP) value range info to form the polyhedra
- future:
- IP profile propagation
- IP alignment tracking
- data structure changes: matrix reorg, splitting up structure to help vect
- graphite:
- scalar and SSA names will be privatized.
- inlining focused on loop nests is important
- alias analysis improvements
- symbolic analysis of scalars. (relation between parameters)
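The point above that tracking parameter equalities might be easier to implement than value range propagation can be sketched with a plain union-find (illustrative names only, not GCC's API):

```python
class ParamEqualities:
    """Track 'parameter a == parameter b' facts with union-find, so a
    client such as CLooG can ask whether two loop-bound parameters are
    provably equal. Illustrative sketch, not GCC code."""

    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def record_equal(self, a, b):
        # Merge the equivalence classes of a and b.
        self.parent[self.find(a)] = self.find(b)

    def equal(self, a, b):
        return self.find(a) == self.find(b)
```

Equality facts propagate transitively (recording n == m and m == k makes n and k compare equal), which is exactly the kind of symbolic relation between parameters mentioned above, without the cost of full value ranges.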
Testing
memory usage statistics: PolyLib vs. PPL.
Next Steps
- get identity transformations working (4.4) - basic block copying should work to get this (update_ssa)
- stabilize the code - make loop interchange and blocking work (4.4)
- get rid of linear lambda (4.5)
- representing scalars as zero-dimensional arrays; reductions (4.5)
- start PCP by writing some functions that are fundamentally modular; graphite.h basically carries the hypothesis that it does not depend on tree-ssa. (4.5)
- distributing CLooG, cleaning up the code for cloog-ppl, removing the PolyLib matrices from gcc (4.4)
- creating tarballs for CLooG and PPL (4.4)
- need to build a committee around CLooG.
- dependence analysis (4.5) - proposing to use PPL for array dependence analysis; need to agree on the dependence graph (same as the data dependence graph) - Konrad (Sebastian will talk to him). But this would bind CLooG to PPL, so use CLooG's abstraction; CLooG needs to be extended to do this. It can also be avoided by just using PPL directly for dependence analysis.
- extend PPL to implement parametric solvers (right now PipLib does that) (?) - ask Roberto how important this is. (4.5)
- implement region based scope detection (4.5)
- privatization of scalars. (4.5)
- cost model - heuristics for interchange - Harsha (4.5)
- implementation of PCP - Jan Sjodin. (4.5)
- modify cloog to consider only one type of constraint (4.5)
- loop distribution (4.5) -Sebastian
- loop fusion (4.5) - Dwarak - loop bounds need to be identical, data dependences must allow it, same context but slightly different domains
- unrolling is a CLooG issue; rerolling - Cedric
- loop composition can be done after loop transformations are done.
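The fusion conditions noted above (identical loop bounds, same enclosing context, no fusion-preventing dependence) can be sketched as a legality check; the dict layout and function name here are hypothetical, not GCC structures:

```python
def can_fuse(loop_a, loop_b, fusion_preventing_dep):
    """Fuse two adjacent loops only when their bounds match, they live
    in the same enclosing context, and no data dependence between the
    two bodies would be violated by fusion. Sketch of the conditions
    from the notes, not GCC's implementation."""
    return (loop_a["bounds"] == loop_b["bounds"]
            and loop_a["context"] == loop_b["context"]
            and not fusion_preventing_dep)
```

A loop over `0..n` fuses with another loop over `0..n` in the same context, but not with one over `0..m`, and never across a fusion-preventing dependence.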