This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [PATCH 0/2] Loop distribution for memset zero


On Sun, Aug 1, 2010 at 07:19, Richard Guenther
<richard.guenther@gmail.com> wrote:
> Hm. ?I don't like inflation of command-line arguments too much, but

Ok, so for now let's have only -ftree-loop-distribute-patterns.

> See other responses. ?Can we detect for eample daxpy?
>
> for (i=0; i<n; ++i)
> ?dy[i] = dy[i] + da * dx[i];
>

Yes we could detect these patterns.

> ? ?In principle all the blas routines have one destination, so we'd need
> to distribute all stores, like with regular loop distribution but then
> after analyzing the partitions and detecting which ones we recognize
> we need to fuse the unhandled parts together again. ?Can we do
> that from inside the loop distribution machinery?
>

The loop distribution does not fuse back the partitions if, due to
other dependences we have to pull in the same partition more data
references than in the analysis part, and so the kernel to be code
generated is not exactly the one for which we have the lib function.

I think it would not be difficult to implement the fusion of the
partitions that do not match the patterns with the default partition.

> Ok, so we'd need to do the pattern recognition before distributing
> the loop?

We have to detect the pattern both before we create the partitions, as
this would create the initial root of the partition, and then after
the partition is created by aggregation of other data references, we
have to run the pattern matching again to make sure the pattern
matches again.

> ?But we need to make sure that the partition only contains
> side-effects the replacement function has. ?Consider
>
> ?for (i = 0; i < n ; ++i)
> ? {
> ? ? dx[i] = i;
> ? ? dy[i] = dy[i] + da * dx[i];
> ? }
>
> will loop distribution include the assignment to dx[i] in the partition
> if the worklist contains dy[i]?

I think that in this case you would get two partitions, because there
is no SCC in the data dep graph that would require the write to dx to
be in the same partition as the write to dy, and the distribution
would lead to:

for (i = 0; i < n ; ++i)
  dx[i] = i;

for (i = 0; i < n ; ++i)
  dy[i] = dy[i] + da * dx[i];

Sebastian


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]