This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
Re: [PATCH PR81740]Enforce dependence check for outer loop vectorization
Hi,
On Mon, 18 Dec 2017, Richard Biener wrote:
> where *unroll is similar to *max_vf I think. dist_v[0] is the innermost loop.
[0] is always outermost loop.
> The vectorizer does way more complicated things and only looks at the
> distance with respect to the outer loop as far as I can see which can be
> negative.
>
> Not sure if fusion and vectorizer "interleaving" makes a difference
> here. I think the idea was that when interleaving stmt-by-stmt then
> forward dependences would be preserved and thus we don't need to check
> the inner loop dependences. speaking with "forward vs. backward"
> dependences again, not distances...
>
> This also means that unroll-and-jam could be enhanced to "interleave"
> stmts and thus cover more cases?
>
> Again, I hope Micha can have a look here...
Haven't yet looked at the patch, but some comments anyway:
fusion and interleaving interact in the following way in outer loop
vectorization, conceptually:
* (1) the outer loop is unrolled
* (2) the inner loops are fused
* (3) the (now single) inner body is rescheduled/shuffled/interleaved.
(1) is always okay. But (2) and (3) as individual transformations must
both be checked for validity. If fusion isn't possible the whole
transformation is invalid, and if interleaving isn't possible the same is
true. In the specific example:
for (b = 4; b >= 0; b--)
for (c = 0; c <= 6; c++)
t = a[c][b + 1]; // S1
a[c + 1][b + 2] = t; // S2
it's already the fusion step that's invalid. There's a
dependence between S1 and S2, e.g. for (b,c) = (4,1) comes-before (3,0)
with S1(4,1) reading a[1][5] and S2(3,0) writing a[1][5]. So a
write-after-read. After fusing:
for (c = 0; c <= 6; c++)
{
t = a[c][5]; // S1
a[c + 1][6] = t;
t = a[c][4];
a[c + 1][5] = t; // S2
a[c + 1][4] = a[c][3];
a[c + 1][3] = a[c][2];
}
here we have at iterations (c) = (0) comes-before (1), at S2(0) writing
a[1][5] and S1(1) writing a[1][5]. I.e. now it's a read-after-write (the
write in iteration 0 overwrites the value that is going to be read at
iteration 1, which wasn't the case in the original loop). The dependence
switched direction --> invalid.
The simple interleaving of statements can't rectify this.
Interleaving is an inner-body reordering but the brokenness comes from a
cross-iteration ordering.
This example can be unroll-jammed or outer-loop vectorized if one of the
two loops is reversed. Let's say we reverse the inner loop, so that it
runs in the same direction as the outer loop (reversal is possible here).
It'd then be something like:
for (c = 6; c >= 0; c--)
{
t = a[c][5]; // S1
a[c + 1][6] = t;
t = a[c][4];
a[c + 1][5] = t; // S2
a[c + 1][4] = a[c][3];
a[c + 1][3] = a[c][2];
}
The dependence between S1/S2 would still be a write-after-read, and all
would be well. This reversal of the inner loop can partly be simulated by
not only interleaving the inner insns, but by also _reodering_ them. But
AFAIK the vectorizer doesn't do this?
Ciao,
Michael.