[RFC, PR66873] Use graphite for parloops
Sebastian Pop
sebpop@gmail.com
Mon Jul 27 05:41:00 GMT 2015
On Sun, Jul 26, 2015 at 4:21 PM, Tom de Vries <Tom_deVries@mentor.com> wrote:
> I wrote an equivalent test-case in C:
> ...
> $ cat src/gcc/testsuite/gcc.dg/autopar/outer-7.c
> /* { dg-do compile } */
> /* { dg-options "-O2 -ftree-parallelize-loops=2 -fdump-tree-parloops-details
> -fdump-tree-optimized" } */
>
> void abort (void);
>
> #define N 500
>
> int
> main (void)
> {
> int i, j;
> int x[N][N];
> int *y = &x[0][0];
>
> for (i = 0; i < N; i++)
> for (j = 0; j < N; j++)
> /* y[i * N + j] == x[i][j]. */
> y[i * N + j] = i + j + 3;
>
> for (i = 0; i < N; i++)
> for (j = 0; j < N; j++)
> if (x[i][j] != i + j + 3)
> abort ();
>
> return 0;
> }
>
> /* Check that outer loop is parallelized. */
> /* { dg-final { scan-tree-dump-times "parallelizing outer loop" 1 "parloops"
> } } */
> /* { dg-final { scan-tree-dump-times "loopfn" 4 "optimized" } } */
> ...
>
> With -fno-tree-loop-ivcanon to keep original iteration order we get:
> ...
> #(Data Ref:
> # bb: 4
> # stmt: *_15 = _17;
> # ref: *_15;
> # base_object: MEM[(int *)&x];
> # Access function 0: {{0B, +, 2000}_1, +, 4}_4
> #)
> #(Data Ref:
> # bb: 4
> # stmt: *_15 = _17;
> # ref: *_15;
> # base_object: MEM[(int *)&x];
> # Access function 0: {{0B, +, 2000}_1, +, 4}_4
> #)
> access_fn_A: {{0B, +, 2000}_1, +, 4}_4
> access_fn_B: {{0B, +, 2000}_1, +, 4}_4
>
> (subscript
> iterations_that_access_an_element_twice_in_A: [0]
> last_conflict: scev_not_known
> iterations_that_access_an_element_twice_in_B: [0]
> last_conflict: scev_not_known
> (Subscript distance: 0 ))
> inner loop index: 0
> loop nest: (1 4 )
> distance_vector: 0 0
> distance_vector: 1 -500
> direction_vector: = =
> direction_vector: + -
> )
> FAILED: data dependencies exist across iterations
> ...
>
> If we replace the y[i * N + j] with x[i][j] we get instead:
> ...
> #(Data Ref:
> # bb: 4
> # stmt: x[i_7][j_8] = _12;
> # ref: x[i_7][j_8];
> # base_object: x;
> # Access function 0: {0, +, 1}_4
> # Access function 1: {0, +, 1}_1
> #)
> #(Data Ref:
> # bb: 4
> # stmt: x[i_7][j_8] = _12;
> # ref: x[i_7][j_8];
> # base_object: x;
> # Access function 0: {0, +, 1}_4
> # Access function 1: {0, +, 1}_1
> #)
> access_fn_A: {0, +, 1}_4
> access_fn_B: {0, +, 1}_4
>
> (subscript
> iterations_that_access_an_element_twice_in_A: [0]
> last_conflict: scev_not_known
> iterations_that_access_an_element_twice_in_B: [0]
> last_conflict: scev_not_known
> (Subscript distance: 0 ))
> access_fn_A: {0, +, 1}_1
> access_fn_B: {0, +, 1}_1
>
> (subscript
> iterations_that_access_an_element_twice_in_A: [0]
> last_conflict: scev_not_known
> iterations_that_access_an_element_twice_in_B: [0]
> last_conflict: scev_not_known
> (Subscript distance: 0 ))
> inner loop index: 0
> loop nest: (1 4 )
> distance_vector: 0 0
> direction_vector: = =
> )
> SUCCESS: may be parallelized
> parallelizing outer loop 8
> ...
It looks like a delinearization pass could help reconstruct a two
dimension array reference, and make the Banerjee dependence test
succeed.
Note that Graphite works in this case just because the loop bounds are
statically defined: N is 500. Now if you have N passed in as a
function parameter, Graphite would also fail to compute the
dependence, as it cannot represent "i * N", so we would also need the
delinearization pass for Graphite.
Here is a bug that I recently opened for that:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66981
Sebastian
More information about the Gcc-patches
mailing list