[GSoC] checking for the loop parallelism

Sun Aug 3 14:21:00 GMT 2014

On 03/08/2014 16:05, Roman Gareev wrote:
>> This looks very similar to what we reported to the isl mailing list. It is
>> definitely not the best test case for the parallelism patch. In fact, I
>> doubt this requires the parallelism test at all.
>
> I've found out, that Graphite generates the expected code using the
> separate option for all dimensions:
>
> {
>    for (int c1 = 0; c1 < min(n, mid); c1 += 1) {
>      S_6(c1);
>      S_8(c1);
>    }
>    for (int c1 = max(mid, 0); c1 < n; c1 += 1) {
>      S_7(c1);
>      S_8(c1);
>    }
> }

Good.

> However, there are problems in vectorization of this code (I've
> attached .vect files generated by versions, which use CLooG and ISL).
> It seems that it is related to inappropriate type sizes. Do you know
> anything about it?

No idea here. This is a question for the vectorizer people. I propose to 
focus on correctness first. After this is finished, we can have a look 
at the performance of the generated code as well as a look at possible 
bad interactions with the vectorizer.

>> You should have a look at the following test case for openmp tests:
>> libgomp/testsuite/libgomp.graphite
>
> Graphite successfully passes all the tests from
> libgomp/testsuite/libgomp.graphite except graphite-isl-ast-to-gimple.c
> and graphite-poly.h.
>
> If we consider the code generated by ISL and by CLooG from
> force-parallel-5.c, we'll see, that they are similar:
>
> the code generated by ISL:
>
> for (int c1 = 0; c1 <= 1499; c1 += 1)
>    S_3(c1);
>
> for (int c1 = 0; c1 <= 497; c1 += 1)
>    for (int c3 = 0; c3 <= c1; c3 += 1)
>      S_7(c1, c3);
>
> the code generated by ClooG:
>
>
> for (scat_1=0;scat_1<=1499;scat_1++) {
>    (scat_1);
> }
>
> for (scat_1=0;scat_1<=497;scat_1++) {
>    for (scat_3=0;scat_3<=scat_1;scat_3++) {
>      (scat_1,scat_3);
>    }
> }
>
> the source code:
>
> void abort (void);
>
> #define N 500
>
> void foo(void)
> {
>    int i,j;
>    int A[3*N], B[3*N];
>
>    for (i = 0; i < 3*N; i++)
>      B[i] = A[i] = i;
>
>    for (i = 1; i < N; i++)
>      for (j = 1; j < i; j++)
>        /* This loop carried no dependency, it fails
> at code generation part.*/
>        A[j+N] = A[j] + j;
>
>    for (i = 1; i < N; i++)
>      for (j = 1; j < i; j++)
>        if (A[j+N] != B[j] + j)
> abort();
> }
>
> int main(void)
> {
>    foo();
>
>    return 0;
> }
>
> The previous implementation of dependence analysis marks â€œfor
> (scat_1=0;scat_1<=497;scat_1++) {â€œ as parallelizable. However, the
> current dependence analysis finds must_waw dependence:
>
> { [i0] -> [1 + i0] : i0 >= 1 and i0 <= 496 }

Those waw dependences seem to be correct. Should even the previous 
analysis only mark the j-loop as parallel?

> (schedule:
>
> { S_7[i0, i1] -> [i0] : exists (e0 = [(i0)/4294967296]: i0 >= 0 and i0
> <= 497 and i1 >= 0 and 4294967296e0 <= i0 and 4294967296e0 >=
> -4294967295 + i0 and 4294967296e0 <= i0 - i1 and i1 <= 498) }
>
> dependence:
>
> { S_7[i0, i1] -> S_7[1 + i0, i1] : exists (e0 = [(1 + i0)/4294967296],
> e1 = [(i0)/4294967296]: i1 >= 0 and i1 <= 498 and i0 >= 0 and i0 <=
> 496 and 4294967296e0 <= 1 + i0 - i1 and 4294967296e0 <= 1 + i0 and
> 4294967296e0 >= -4294967294 + i0 and i1 <= i0 and 4294967296e1 <= i0 -
> i1 and 4294967296e1 >= -4294967295 + i0 and 4294967296e1 <= i0) })
>
> Could you please advise me what can be improved in the the current analysis?
>
>> Why did you copy this code, instead of exposing the carries_deps functions
>> from graphite-dependences?
>
> If I'm not mistaken, the carries_deps functions can't be called in
> graphite-isl-ast-to-gimple.c. Should we add its declaration to
> graphite-poly.h to allow this?

Yes.

Tobias