[Bug tree-optimization/81740] [6/7/8 Regression] wrong code at -O3 in both 32-bit and 64-bit modes on x86_64-linux-gnu

amker at gcc dot gnu.org gcc-bugzilla@gcc.gnu.org
Thu Dec 14 15:09:00 GMT 2017


https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81740

--- Comment #4 from amker at gcc dot gnu.org ---
(In reply to Jakub Jelinek from comment #3)
> Testcase modified for the testsuite:
> 
> int a[8][10] = { [2][5] = 4 }, c;
> 
> int
> main ()
> {
>   short b;
>   int i, d;
>   for (b = 4; b >= 0; b--)
>     for (c = 0; c <= 6; c++)
>       a[c + 1][b + 2] = a[c][b + 1];
>   for (i = 0; i < 8; i++)
>     for (d = 0; d < 10; d++)
>       if (a[i][d] != (i == 3 && d == 6) * 4)
>         __builtin_abort ();
>   return 0;
> }

So, without reversing the inner loop, the loop nest cannot legally be vectorized.
The issue is in the vectorizer's data dependence checking; I believe the mentioned
revision just exposed it.  Previously, vectorization was skipped because of an
unsupported memory operation.
Conceptually, outer-loop vectorization with a vectorization factor of 4 unrolls
the outer loop into:

  for (b = 4; b >= 0; b -= 4)
  {
    for (c = 0; c <= 6; c++)
      a[c + 1][6] = a[c][5];
    for (c = 0; c <= 6; c++)
      a[c + 1][5] = a[c][4];
    for (c = 0; c <= 6; c++)
      a[c + 1][4] = a[c][3];
    for (c = 0; c <= 6; c++)
      a[c + 1][3] = a[c][2];
  }

The four inner loops are then fused into:
  for (b = 4; b >= 0; b -= 4)
  {
    for (c = 0; c <= 6; c++)
    {
      a[c + 1][6] = a[c][5];  // S1
      a[c + 1][5] = a[c][4];  // S2
      a[c + 1][4] = a[c][3];
      a[c + 1][3] = a[c][2];
    }
  }
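
The following C sketch shows why the fused loop is wrong. It executes the
testcase's loop nest as written and the fused body on the same initial array,
and returns a[3][6] from each.

A few things in it are assumptions. The helper names (init_array, run_original,
run_fused_forward) are hypothetical. The scalar remainder loop for b = 0 is also
an assumption of this sketch. The code only illustrates the semantics; it is not
what GCC emits.

```c
#include <assert.h>
#include <string.h>

/* Reset the array to the testcase's initial state: only a[2][5] == 4.  */
static void init_array(int a[8][10])
{
  memset(a, 0, 8 * sizeof a[0]);
  a[2][5] = 4;
}

/* Original semantics: the testcase's loop nest, executed as written.  */
static int run_original(void)
{
  int a[8][10];
  init_array(a);
  for (int b = 4; b >= 0; b--)
    for (int c = 0; c <= 6; c++)
      a[c + 1][b + 2] = a[c][b + 1];
  return a[3][6];  /* the testcase expects 4 here */
}

/* Fused form (b = 4..1 in one inner loop), followed by a scalar remainder
   loop for b = 0 (an assumption of this sketch).  */
static int run_fused_forward(void)
{
  int a[8][10];
  init_array(a);
  for (int c = 0; c <= 6; c++)
    {
      a[c + 1][6] = a[c][5];  /* S1 */
      a[c + 1][5] = a[c][4];  /* S2: clobbers a[2][5] when c == 1 */
      a[c + 1][4] = a[c][3];
      a[c + 1][3] = a[c][2];
    }
  for (int c = 0; c <= 6; c++)
    a[c + 1][2] = a[c][1];
  return a[3][6];  /* S1 at c == 2 read the already clobbered a[2][5] */
}
```

run_original() returns 4, while run_fused_forward() returns 0. That 0 is the
value that makes the testcase call __builtin_abort.
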
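Loop fusion must respect data dependences.  GCC's data dependence analyzer does
not model dependences between references in sibling loops.  In practice, though,
the legality of fusion can be checked by analyzing all data references in the
fused loop and verifying that there is no backward data dependence.
Here the requirement is violated: there is a backward data dependence between
the references a[c][5] (read in S1) and a[c + 1][5] (written in S2).  In the
fused loop, S2 overwrites a[c + 1][5] in iteration c before S1 reads it in
iteration c + 1.  In the original order, by contrast, every read in the S1 loop
happened before any write in the S2 loop.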
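
The fusion condition can be checked by brute force on this small example. The
sketch below is a toy model, not GCC's dependence analyzer. It works in three
steps:

1. Enumerate every read and write of the four fused statements, numbered 0..3
   starting from S1.
2. Record each access's position in the original (sibling-loop) order and in
   the fused order.
3. Count the pairs of accesses that touch the same element, include at least
   one write, and execute in a different relative order after fusion.

The names count_order_violations, wcol and rcol are hypothetical. The
reverse_inner flag models running the fused inner loop from c = 6 down to 0.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model: statement k executes a[c + 1][wcol[k]] = a[c][rcol[k]] for
   c = 0..NC-1; columns are those of the fused loop for b = 4.  */
enum { NSTMT = 4, NC = 7, NACC = NSTMT * NC * 2 };
static const int wcol[NSTMT] = { 6, 5, 4, 3 };
static const int rcol[NSTMT] = { 5, 4, 3, 2 };

struct access
{
  int row, col;
  bool is_write;
  int orig_pos;   /* position in the original (sibling loops) order */
  int fused_pos;  /* position in the fused loop order */
};

/* Count conflicting pairs (same element, at least one write) whose relative
   execution order differs between the original and the fused loop.  */
static int count_order_violations(bool reverse_inner)
{
  static struct access acc[NACC];
  int n = 0;
  for (int k = 0; k < NSTMT; k++)
    for (int c = 0; c < NC; c++)
      for (int w = 0; w < 2; w++)  /* w == 0: read, w == 1: write */
        {
          /* Iteration slot in the fused loop; reversed runs c = NC-1..0.  */
          int fc = reverse_inner ? NC - 1 - c : c;
          acc[n].row = w ? c + 1 : c;
          acc[n].col = w ? wcol[k] : rcol[k];
          acc[n].is_write = w;
          acc[n].orig_pos = (k * NC + c) * 2 + w;
          acc[n].fused_pos = (fc * NSTMT + k) * 2 + w;
          n++;
        }
  assert(n == NACC);

  int violations = 0;
  for (int i = 0; i < n; i++)
    for (int j = i + 1; j < n; j++)
      {
        const struct access *x = &acc[i], *y = &acc[j];
        if (x->row != y->row || x->col != y->col)
          continue;
        if (!x->is_write && !y->is_write)
          continue;  /* read/read pairs never constrain the order */
        bool before_orig = x->orig_pos < y->orig_pos;
        bool before_fused = x->fused_pos < y->fused_pos;
        if (before_orig != before_fused)
          violations++;
      }
  return violations;
}
```

With reverse_inner == false, the function reports 18 violating pairs. Each pair
is a read of a[r][6 - k] by statement k - 1 and the write of the same element by
statement k, for k = 1..3 and r = 1..6. The a[c][5] / a[c + 1][5] pair between
S1 and S2 is one family of them. With reverse_inner == true, it reports none.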

Note that if we reverse the inner loop, the outer loop becomes legal for
vectorization.  With c decreasing, S1 reads a[c][5] before S2 writes the same
element in iteration c - 1.
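
This note can be checked directly. The sketch runs the fused body with the
inner loop reversed and compares the whole array with the result of the
original loop nest.

The helper names are hypothetical. The remainder loop for b = 0 is an
assumption of this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Reset the array to the testcase's initial state: only a[2][5] == 4.  */
static void init_array(int a[8][10])
{
  memset(a, 0, 8 * sizeof a[0]);
  a[2][5] = 4;
}

/* Reference result: the testcase's loop nest as written.  */
static void run_original(int a[8][10])
{
  init_array(a);
  for (int b = 4; b >= 0; b--)
    for (int c = 0; c <= 6; c++)
      a[c + 1][b + 2] = a[c][b + 1];
}

/* Fused body for b = 4..1 with the inner loop reversed (c runs 6..0), plus a
   scalar remainder loop for b = 0 (an assumption of this sketch).  */
static void run_fused_reversed(int a[8][10])
{
  init_array(a);
  for (int c = 6; c >= 0; c--)
    {
      a[c + 1][6] = a[c][5];  /* S1 reads a[c][5] before ...           */
      a[c + 1][5] = a[c][4];  /* ... S2 writes it at iteration c - 1   */
      a[c + 1][4] = a[c][3];
      a[c + 1][3] = a[c][2];
    }
  for (int c = 6; c >= 0; c--)
    a[c + 1][2] = a[c][1];
}

/* True if the reversed fused loop leaves the array exactly as the original.  */
static bool reversed_fusion_matches(void)
{
  int expected[8][10], actual[8][10];
  run_original(expected);
  run_fused_reversed(actual);
  return memcmp(expected, actual, sizeof expected) == 0;
}
```

reversed_fusion_matches() returns true, and a[3][6] ends up as 4, as the
testcase expects.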

As for the fix, we need to enforce dependence checking in the vectorizer for
outer-loop vectorization.  I am preparing a patch now.

Thanks

