Bug 115663 - outer loop vectorization with inner loop grouped access and SLP should be possible
Status: NEW
Alias: None
Product: gcc
Classification: Unclassified
Component: tree-optimization (show other bugs)
Version: 15.0
Importance: P3 enhancement
Target Milestone: ---
Assignee: Not yet assigned to anyone
URL:
Keywords: missed-optimization
Depends on:
Blocks: vectorizer
Reported: 2024-06-26 11:17 UTC by Richard Biener
Modified: 2024-06-26 20:57 UTC (History)
0 users

See Also:
Host:
Target:
Build:
Known to work:
Known to fail:
Last reconfirmed: 2024-06-26 00:00:00


Description Richard Biener 2024-06-26 11:17:29 UTC
We do not support interleaved (grouped) accesses in the inner loop when
vectorizing the outer loop, but SLP should be possible if the group is
contiguous with respect to the outer loop evolution.

void foo (double * __restrict a, double *b, int n)
{
  for (int i = 0; i < 1024; ++i)
    {
      double res = a[i];
      for (int j = 0; j < 8; ++j)
        res += b[j * 16 + 2*i];
      a[i] = res;
    }
}

or

void foo (double * __restrict a, double *b, int n)
{
  for (int i = 0; i < 1024; ++i)
    {
      double res = a[i];
      for (int j = 0; j < 8; ++j)
        res += b[j * 16 + 2*i] + b[j * 16 + 2*i + 1];
      a[i] = res;
    }
}

Both should be possible to vectorize (the former has a gap, the latter does not).

In practice this is likely relevant both for image processing (pixel data,
with and without a gap) and for complex numbers.
Comment 1 Andrew Pinski 2024-06-26 20:57:12 UTC
Confirmed.