This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [GSoC] checking for the loop parallelism


On 03/08/2014 16:05, Roman Gareev wrote:
This looks very similar to what we reported to the isl mailing list. It is
definitely not the best test case for the parallelism patch. In fact, I
doubt this requires the parallelism test at all.

I've found out, that Graphite generates the expected code using the
separate option for all dimensions:

{
   for (int c1 = 0; c1 < min(n, mid); c1 += 1) {
     S_6(c1);
     S_8(c1);
   }
   for (int c1 = max(mid, 0); c1 < n; c1 += 1) {
     S_7(c1);
     S_8(c1);
   }
}

Good.

However, there are problems in vectorization of this code (I've
attached .vect files generated by versions, which use CLooG and ISL).
It seems that it is related to inappropriate type sizes. Do you know
anything about it?

No idea here. This is a question for the vectorizer people. I propose to focus on correctness first. After this is finished, we can have a look at the performance of the generated code as well as a look at possible bad interactions with the vectorizer.

You should have a look at the following test case for openmp tests:
libgomp/testsuite/libgomp.graphite

Graphite successfully passes all the tests from
libgomp/testsuite/libgomp.graphite except graphite-isl-ast-to-gimple.c
and graphite-poly.h.

If we consider the code generated by ISL and by CLooG from
force-parallel-5.c, we'll see, that they are similar:

the code generated by ISL:

for (int c1 = 0; c1 <= 1499; c1 += 1)
   S_3(c1);

for (int c1 = 0; c1 <= 497; c1 += 1)
   for (int c3 = 0; c3 <= c1; c3 += 1)
     S_7(c1, c3);

the code generated by ClooG:


for (scat_1=0;scat_1<=1499;scat_1++) {
   (scat_1);
}

for (scat_1=0;scat_1<=497;scat_1++) {
   for (scat_3=0;scat_3<=scat_1;scat_3++) {
     (scat_1,scat_3);
   }
}

the source code:

void abort (void);

#define N 500

void foo(void)
{
   int i,j;
   int A[3*N], B[3*N];

   for (i = 0; i < 3*N; i++)
     B[i] = A[i] = i;

   for (i = 1; i < N; i++)
     for (j = 1; j < i; j++)
       /* This loop carried no dependency, it fails
at code generation part.*/
       A[j+N] = A[j] + j;

   for (i = 1; i < N; i++)
     for (j = 1; j < i; j++)
       if (A[j+N] != B[j] + j)
abort();
}

int main(void)
{
   foo();

   return 0;
}

The previous implementation of dependence analysis marks âfor
(scat_1=0;scat_1<=497;scat_1++) {â as parallelizable. However, the
current dependence analysis finds must_waw dependence:

{ [i0] -> [1 + i0] : i0 >= 1 and i0 <= 496 }

Those waw dependences seem to be correct. Should even the previous analysis only mark the j-loop as parallel?

(schedule:

{ S_7[i0, i1] -> [i0] : exists (e0 = [(i0)/4294967296]: i0 >= 0 and i0
<= 497 and i1 >= 0 and 4294967296e0 <= i0 and 4294967296e0 >=
-4294967295 + i0 and 4294967296e0 <= i0 - i1 and i1 <= 498) }

dependence:

{ S_7[i0, i1] -> S_7[1 + i0, i1] : exists (e0 = [(1 + i0)/4294967296],
e1 = [(i0)/4294967296]: i1 >= 0 and i1 <= 498 and i0 >= 0 and i0 <=
496 and 4294967296e0 <= 1 + i0 - i1 and 4294967296e0 <= 1 + i0 and
4294967296e0 >= -4294967294 + i0 and i1 <= i0 and 4294967296e1 <= i0 -
i1 and 4294967296e1 >= -4294967295 + i0 and 4294967296e1 <= i0) })

Could you please advise me what can be improved in the the current analysis?

Why did you copy this code, instead of exposing the carries_deps functions
from graphite-dependences?

If I'm not mistaken, the carries_deps functions can't be called in
graphite-isl-ast-to-gimple.c. Should we add its declaration to
graphite-poly.h to allow this?

Yes.

Tobias


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]