Automatic parallelization in Graphite
In GCC there already exists an auto-parallelization pass (tree-parloops.c), which is base on the lambda framework originally developed by Sebastian. Since Lambda framework is limited to some cases (e.g. triangle loops, loops with 'if' conditions), Graphite was developed to handle the loops that lambda was not able to handle .
Dependency checking in Graphite is based on polyhedral model, we have got a comparison about the dependency checking algorithm between polyhedral and lambda framework, you can find it in AutoparRelated with title 'Data dependency algorithms under polyhedral model vs. normal algorithms'.
First step of autopar in Graphite has been done. You can trigger it by 2 flags -floop-parallelize-all -ftree-parallelize-loops=4. Both of them is needed, the first flag will trigger Graphite pass to mark loops that can be parallel and the second flag will trigger the code generation part.
Now Autopar in Graphite can handle loops:
* Common loops without dependency. e.g.
* Loops with if statement. e.g.
* Some of the triangle loops.
We can handle some part of the triangle loops. e.g.
But there are limitations:
* Only parallel the innermost.
- We can mark all loops (from inner to outer) as parallel with dependency checking in Graphite, but due to the limitation in code generation part, we only can generate code for the innermost. Some of the test results for autopar shows that it is not a good idea to parallelize the innermost.
You can find some data shows these test results in AutoparRelated with title 'Autopar's Performance'. Currently, Razya is working on the code generation for outer loop.
* We can not handle all the triangle loops.
- We can handle some of the triangle loops, but we can not handle all. The failing is in the SCoP detection part, with error 'number of iterations unknown'. We have looked into this problem and found the reason it can't calculate the number of iterations is because the CFG in the SCoP. For a detailed discussion about this part, you can refer to
* Missing reduction support.
This part will include some of the code in Graphite branch. The code mentioned below is in Graphite branch revision r150787.
Autopar in Graphite is mainly divided into two parts: Dependency checking and code generation. Dependency checking is done in Graphite pass for every loop (if this loop can be parallel, we set loop->can_be_parallel to true. Function mark_loops_parallel in graphite.c did this) and code generation is done in autopar pass (tree-parloops.c), we didn't directly call the code generation function because we may want some inter-pass optimization.
The testsuites related to autopar in Graphite is in $SRCDIR/libgomp/testsuite/libgomp.graphite/force-parallel-*.c.
There are still some intersesting open projects in autopar.
* Heuristics for automatic parallelization.
Now we use none (little) strategy to determine if we would like to parallelize the loops without dependency (we nearly parallelize every loop without dependency), what we would like here is a good strategy to determine if we should parallelize this loop considering performance reasons. This is still at the very beginning but interesting(important if we want autopar to be used in real code).
* Parallelize non innermost loops.
This is mainly a code generation problem in autopar. Patches exist from Razya, but they have to be tested and bootstrapped.
* Handle reductions in graphite
We have to ignore or reorder dependencies depending if the operations used are associative.
Like the addition (But there are exceptions like floating point addition)
a + (b + c) = (a + b) + c
* Advanced automatic parallelization
The most interesting point.
Not only detect loops without dependencies, but reorder the code to expose such loops. There exist some external projects working on this, however gcc does not yet support this. For detailed information about this part, you could refer to: http://groups.google.de/group/gcc-graphite/browse_thread/thread/515d77136ec2f765#