[RFC, PR66873] Use graphite for parloops

Mon Jul 27 05:41:00 GMT 2015

On Sun, Jul 26, 2015 at 4:21 PM, Tom de Vries <Tom_deVries@mentor.com> wrote:
> I wrote an equivalent test-case in C:
> ...
> $ cat src/gcc/testsuite/gcc.dg/autopar/outer-7.c
> /* { dg-do compile } */
> /* { dg-options "-O2 -ftree-parallelize-loops=2 -fdump-tree-parloops-details
> -fdump-tree-optimized" } */
>
> void abort (void);
>
> #define N 500
>
> int
> main (void)
> {
>   int i, j;
>   int x[N][N];
>   int *y = &x[0][0];
>
>   for (i = 0; i < N; i++)
>     for (j = 0; j < N; j++)
>       /* y[i * N + j] == x[i][j].  */
>       y[i * N + j] = i + j + 3;
>
>   for (i = 0; i < N; i++)
>     for (j = 0; j < N; j++)
>       if (x[i][j] != i + j + 3)
>         abort ();
>
>   return 0;
> }
>
> /* Check that outer loop is parallelized.  */
> /* { dg-final { scan-tree-dump-times "parallelizing outer loop" 1 "parloops"
> } } */
> /* { dg-final { scan-tree-dump-times "loopfn" 4 "optimized" } } */
> ...
>
> With -fno-tree-loop-ivcanon to keep original iteration order we get:
> ...
> #(Data Ref:
> #  bb: 4
> #  stmt: *_15 = _17;
> #  ref: *_15;
> #  base_object: MEM[(int *)&x];
> #  Access function 0: {{0B, +, 2000}_1, +, 4}_4
> #)
> #(Data Ref:
> #  bb: 4
> #  stmt: *_15 = _17;
> #  ref: *_15;
> #  base_object: MEM[(int *)&x];
> #  Access function 0: {{0B, +, 2000}_1, +, 4}_4
> #)
>   access_fn_A: {{0B, +, 2000}_1, +, 4}_4
>   access_fn_B: {{0B, +, 2000}_1, +, 4}_4
>
>  (subscript
>   iterations_that_access_an_element_twice_in_A: [0]
>   last_conflict: scev_not_known
>   iterations_that_access_an_element_twice_in_B: [0]
>   last_conflict: scev_not_known
>   (Subscript distance: 0 ))
>   inner loop index: 0
>   loop nest: (1 4 )
>   distance_vector:   0   0
>   distance_vector:   1 -500
>   direction_vector:     =    =
>   direction_vector:     +    -
> )
>   FAILED: data dependencies exist across iterations
> ...
>
> If we replace the y[i * N + j] with x[i][j] we get instead:
> ...
> #(Data Ref:
> #  bb: 4
> #  stmt: x[i_7][j_8] = _12;
> #  ref: x[i_7][j_8];
> #  base_object: x;
> #  Access function 0: {0, +, 1}_4
> #  Access function 1: {0, +, 1}_1
> #)
> #(Data Ref:
> #  bb: 4
> #  stmt: x[i_7][j_8] = _12;
> #  ref: x[i_7][j_8];
> #  base_object: x;
> #  Access function 0: {0, +, 1}_4
> #  Access function 1: {0, +, 1}_1
> #)
>   access_fn_A: {0, +, 1}_4
>   access_fn_B: {0, +, 1}_4
>
>  (subscript
>   iterations_that_access_an_element_twice_in_A: [0]
>   last_conflict: scev_not_known
>   iterations_that_access_an_element_twice_in_B: [0]
>   last_conflict: scev_not_known
>   (Subscript distance: 0 ))
>   access_fn_A: {0, +, 1}_1
>   access_fn_B: {0, +, 1}_1
>
>  (subscript
>   iterations_that_access_an_element_twice_in_A: [0]
>   last_conflict: scev_not_known
>   iterations_that_access_an_element_twice_in_B: [0]
>   last_conflict: scev_not_known
>   (Subscript distance: 0 ))
>   inner loop index: 0
>   loop nest: (1 4 )
>   distance_vector:   0   0
>   direction_vector:     =    =
> )
>   SUCCESS: may be parallelized
> parallelizing outer loop 8
> ...

It looks like a delinearization pass could help reconstruct a two
dimension array reference, and make the Banerjee dependence test
succeed.
Note that Graphite works in this case just because the loop bounds are
statically defined: N is 500.  Now if you have N passed in as a
function parameter, Graphite would also fail to compute the
dependence, as it cannot represent "i * N", so we would also need the
delinearization pass for Graphite.
Here is a bug that I recently opened for that:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66981

Sebastian