This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
Re: [PATCH PR81740]Enforce dependence check for outer loop vectorization
On Mon, Dec 18, 2017 at 2:35 PM, Michael Matz <matz@suse.de> wrote:
> Hi,
>
> On Mon, 18 Dec 2017, Richard Biener wrote:
>
>> where *unroll is similar to *max_vf I think. dist_v[0] is the innermost loop.
>
> [0] is always outermost loop.
>
>> The vectorizer does way more complicated things and only looks at the
>> distance with respect to the outer loop as far as I can see which can be
>> negative.
>>
>> Not sure if fusion and vectorizer "interleaving" makes a difference
>> here. I think the idea was that when interleaving stmt-by-stmt then
>> forward dependences would be preserved and thus we don't need to check
>> the inner loop dependences. speaking with "forward vs. backward"
>> dependences again, not distances...
>>
>> This also means that unroll-and-jam could be enhanced to "interleave"
>> stmts and thus cover more cases?
>>
>> Again, I hope Micha can have a look here...
>
> Haven't yet looked at the patch, but some comments anyway:
>
> fusion and interleaving interact in the following way in outer loop
> vectorization, conceptually:
> * (1) the outer loop is unrolled
> * (2) the inner loops are fused
> * (3) the (now single) inner body is rescheduled/shuffled/interleaved.
>
Thanks Michael for explaining issue clearer, this is what I meant. As
for PR60276, I think it's actually the other side of the problem,
which only relates to dependence validity of interleaving.
Thanks,
bin
> (1) is always okay. But (2) and (3) as individual transformations must
> both be checked for validity. If fusion isn't possible the whole
> transformation is invalid, and if interleaving isn't possible the same is
> true. In the specific example:
>
> for (b = 4; b >= 0; b--)
> for (c = 0; c <= 6; c++)
> t = a[c][b + 1]; // S1
> a[c + 1][b + 2] = t; // S2
>
> it's already the fusion step that's invalid. There's a
> dependence between S1 and S2, e.g. for (b,c) = (4,1) comes-before (3,0)
> with S1(4,1) reading a[1][5] and S2(3,0) writing a[1][5]. So a
> write-after-read. After fusing:
>
> for (c = 0; c <= 6; c++)
> {
> t = a[c][5]; // S1
> a[c + 1][6] = t;
> t = a[c][4];
> a[c + 1][5] = t; // S2
> a[c + 1][4] = a[c][3];
> a[c + 1][3] = a[c][2];
> }
>
> here we have at iterations (c) = (0) comes-before (1), at S2(0) writing
> a[1][5] and S1(1) writing a[1][5]. I.e. now it's a read-after-write (the
> write in iteration 0 overwrites the value that is going to be read at
> iteration 1, which wasn't the case in the original loop). The dependence
> switched direction --> invalid.
>
> The simple interleaving of statements can't rectify this.
> Interleaving is an inner-body reordering but the brokenness comes from a
> cross-iteration ordering.
>
> This example can be unroll-jammed or outer-loop vectorized if one of the
> two loops is reversed. Let's say we reverse the inner loop, so that it
> runs in the same direction as the outer loop (reversal is possible here).
>
> It'd then be something like:
>
> for (c = 6; c >= 0; c--)
> {
> t = a[c][5]; // S1
> a[c + 1][6] = t;
> t = a[c][4];
> a[c + 1][5] = t; // S2
> a[c + 1][4] = a[c][3];
> a[c + 1][3] = a[c][2];
> }
>
> The dependence between S1/S2 would still be a write-after-read, and all
> would be well. This reversal of the inner loop can partly be simulated by
> not only interleaving the inner insns, but by also _reodering_ them. But
> AFAIK the vectorizer doesn't do this?
>
>
> Ciao,
> Michael.