Graphite phone call from 2009_02_11
Attendees: Kenny, David, Tobias, Christophe, Sebastian, Konrad, Harsha, Albert
Discussed topics:
- Performance of loop blocking and interchange on SPEC benchmarks: some kernels can benefit from enabling these transforms.
- Vectorization is too fragile and does not work on the code generated by Graphite: all the vect-* testcases fail on the graphite branch, where -fgraphite-identity is on by default at -O2 and above. Konrad reported that he had already seen this when he worked on the interaction of Graphite and the vectorizer at IBM Haifa. One possible reason is that code generation introduces too many basic blocks in the loop body, and the vectorizer does not work on loops with more than 3 basic blocks. Either the vectorizer should be improved, or the code should be put in the normal form that the vectorizer expects. CFG cleanup is not enough.
- Conversion to PPL: the conversion of iteration domains is under way; Tobias and Sebastian committed a first part of the conversion. Data dependences are next to be converted to PPL data structures.
- Loop transform interface:
- the implementation of the classic loop transforms should be on top of the imperative syntax tree, as on PCP's trees. Feautrier's static schedules are Dewey numbers that stand for the syntax tree: the two representations are equivalent. When one speaks about loop transformations, one wants to speak about loops and sequences of statements, not about nesting levels and integer Dewey numbers.
- scattering functions define code transformations in the polyhedral model. The scattering functions encode the static schedule with a rational number, i.e. the execution time. As this execution time is not a discrete quantity (unlike the Dewey numbers), it is easier to reschedule statements: it is always possible to split a rational interval without having to reschedule all the statements. The difficulty of scheduling with Dewey numbers disappears if one uses their tree representation and then converts the tree back to Dewey numbers: this amounts to working on the imperative syntax tree of the program.
- Albert proposed to look at: "POET: Parameterized Optimizations for Empirical Tuning" by Qing Yi, Keith Seymour, Haihang You, Richard Vuduc, Dan Quinlan. They define a non-polyhedral loop nest optimization framework. Qing Yi is from the University of Texas at San Antonio, TX.
- Some people asked for the link to Pluto, "an automatic parallelizer and locality optimizer for multicores".
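The contrast between Dewey numbers and rational execution dates discussed in this topic can be sketched as follows. This is only an illustration, not Graphite or PCP code: the tuple-based tree encoding, the function names `dewey_numbers` and `between`, and the statement names are all hypothetical, and the loop iterators that Feautrier's schedules interleave with the static positions are left out for brevity.

```python
from fractions import Fraction

# Hypothetical toy syntax tree: a loop is ("loop", name, [children]),
# a statement is a plain string.
def dewey_numbers(body, prefix=()):
    """Map each statement to its Dewey number, i.e. the path of child
    positions from the root of the syntax tree down to the statement."""
    numbers = {}
    for position, node in enumerate(body):
        path = prefix + (position,)
        if isinstance(node, tuple):      # a loop: recurse into its body
            numbers.update(dewey_numbers(node[2], path))
        else:                            # a statement: record its path
            numbers[node] = path
    return numbers

def between(lo, hi):
    """Return a rational date strictly between two rational dates."""
    return (lo + hi) / 2

program = [
    "S1",
    ("loop", "i", ["S2", ("loop", "j", ["S3"]), "S4"]),
]

# Dewey numbers read off the tree.
before = dewey_numbers(program)

# Insert a statement between S2 and the j loop by editing the tree,
# then convert the tree back to Dewey numbers. The integer positions
# of the j loop and of S4 shift, but the tree edit itself is local.
program[1][2].insert(1, "S_new")
after = dewey_numbers(program)

# With rational dates (assumed here: one date per statement inside the
# i loop body), the same insertion only needs a fresh date in the gap:
# no existing statement is rescheduled.
dates = {"S2": Fraction(0), "S3": Fraction(1), "S4": Fraction(2)}
dates["S_new"] = between(dates["S2"], dates["S3"])
order = sorted(dates, key=dates.get)

print(before)
print(after)
print(order)
```

Working on the tree and regenerating the Dewey numbers keeps the edit local for the person writing the transform, which is the point made above; the rational dates reach the same result by splitting an interval instead.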
- Polyhedral transforms:
- computing the scattering functions for an objective function that could be "max of data reuse" or "max parallelism"
- random selection of code transforms: one can randomly select the scattering functions, as done by Louis-Noel Pouchet in