Automatic parallelization in Graphite

In GCC there already exists an auto-parallelization pass (tree-parloops.c), which is base on the lambda framework originally developed by Sebastian. Since Lambda framework is limited to some cases (e.g. triangle loops, loops with 'if' conditions), Graphite was developed to handle the loops that lambda was not able to handle .

Dependency checking in Graphite is based on polyhedral model, we have got a comparison about the dependency checking algorithm between polyhedral and lambda framework, you can find it in AutoparRelated with title 'Data dependency algorithms under polyhedral model vs. normal algorithms'.

Current Status

First step of autopar in Graphite has been done. You can trigger it by 2 flags -floop-parallelize-all -ftree-parallelize-loops=4. Both of them is needed, the first flag will trigger Graphite pass to mark loops that can be parallel and the second flag will trigger the code generation part.

Now Autopar in Graphite can handle loops:

* Common loops without dependency. e.g.

   1 for (i = 0; i < N; i++) /* This loop can be parallel but due to the limitation
   2                           in code generation part, it will not be parallel.  */
   3  for (j = 0; j < N; j++) /* This loop will be parallel.  */
   4     A[i][j] = i + j + 3;

* Loops with if statement. e.g.

   1 #define N 10000
   2 #define T 1000
   3   for (i = 0; i < N; i++)
   4     {
   5       if (i < T)
   6         /* loop i1: carried no dependency. Will be parallel.  */
   7         A[i] = A[i+T];
   8       else
   9         /* loop i2: carried dependency. Won't be parallel.  */
  10         A[i] = A[i+T+1];
  11     }

* Some of the triangle loops.

We can handle some part of the triangle loops. e.g.

   1   for (i = 1; i < N; i++)
   2     for (j = 1; j < i; j++)
   3       A[j+N] = A[j] + j;

But there are limitations:

* Only parallel the innermost.

* We can not handle all the triangle loops.

* Missing reduction support.

More details

This part will include some of the code in Graphite branch. The code mentioned below is in Graphite branch revision r150787.

Autopar in Graphite is mainly divided into two parts: Dependency checking and code generation. Dependency checking is done in Graphite pass for every loop (if this loop can be parallel, we set loop->can_be_parallel to true. Function mark_loops_parallel in graphite.c did this) and code generation is done in autopar pass (tree-parloops.c), we didn't directly call the code generation function because we may want some inter-pass optimization.

The testsuites related to autopar in Graphite is in $SRCDIR/libgomp/testsuite/libgomp.graphite/force-parallel-*.c.

Open projects

There are still some intersesting open projects in autopar.

* Heuristics for automatic parallelization.

Now we use none (little) strategy to determine if we would like to parallelize the loops without dependency (we nearly parallelize every loop without dependency), what we would like here is a good strategy to determine if we should parallelize this loop considering performance reasons. This is still at the very beginning but interesting(important if we want autopar to be used in real code).

* Parallelize non innermost loops.

This is mainly a code generation problem in autopar. Patches exist from Razya, but they have to be tested and bootstrapped.

* Handle reductions in graphite

We have to ignore or reorder dependencies depending if the operations used are associative.

Like the addition (But there are exceptions like floating point addition)

a + (b + c) = (a + b) + c

* Advanced automatic parallelization

The most interesting point.

Not only detect loops without dependencies, but reorder the code to expose such loops. There exist some external projects working on this, however gcc does not yet support this. For detailed information about this part, you could refer to:

None: Graphite/Parallelization (last edited 2009-08-17 06:57:52 by 121)