Bug 49730 - loop not vectorized if inside another loop
Summary: loop not vectorized if inside another loop
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: tree-optimization
Version: 4.7.0
Importance: P3 enhancement
Target Milestone: 8.0
Assignee: Not yet assigned to anyone
URL:
Keywords: missed-optimization
Depends on:
Blocks: vectorizer
Reported: 2011-07-13 10:42 UTC by vincenzo Innocente
Modified: 2021-12-04 23:58 UTC

See Also:
Host:
Target:
Build:
Known to work: 8.1.0, 9.1.0
Known to fail:
Last reconfirmed: 2011-07-13 11:23:16


Description vincenzo Innocente 2011-07-13 10:42:35 UTC
I have this simple double loop (used in a benchmark).
The inner loop (sloop) is not vectorized when it is invoked inside the longer, outer loop (dloop).

 c++ -Ofast -c vectdloop.cc -ftree-vectorizer-verbose=7
vectdloop.cc:9: note:   Profitability threshold = 6

vectdloop.cc:9: note: Profitability threshold is 6 loop iterations.
vectdloop.cc:9: note: LOOP VECTORIZED.
vectdloop.cc:7: note: vectorized 1 loops in function.

vectdloop.cc:20: note: not vectorized: unexpected loop form.
vectdloop.cc:16: note: vectorized 0 loops in function.


#include<cmath>

inline float fn(float x) {
  return 2.f*x+std::sqrt(x);
}

void sloop(float * __restrict__ s, float const * __restrict__ xx) {
  const int ls=16;
  for (int j=0; j < ls; ++j) {
    s[j] = fn(xx[j]);
  } 
}

int dloop(float yyy) {
  int niter = 100000;
  float x = 0.5f; yyy=0;
  const int ls=16;
  for (int i=0; i < niter; ++i) { 
    float s[ls]; float xx[ls];
    for (int j=0; j < ls; ++j) xx[j] =x+(5*(j&1));
    sloop(s,xx);
    // for (int j=0; j < ls; ++j)  s[j] = fn(xx[j]); 
    x += 1e-6f;
    for (int j=0; j < ls; ++j) yyy+=s[j];
  }
  if (yyy == 2.32132323232f) niter--; 
  return niter;
}
Comment 1 Richard Biener 2011-07-13 11:23:16 UTC
All inner loops are simply completely unrolled which eliminates the s array.

Then we end up with a loop with two reductions which cannot be vectorized
right now.
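
To make the comment above concrete, here is a hand-written C++ sketch of roughly what the outer loop reduces to once the three inner loops are fully unrolled and the s and xx arrays disappear. It is an illustration only, not a GCC dump: the name dloop_unrolled, its signature, and the choice to return yyy are assumptions made for this sketch, and reading the "two reductions" as the loop-carried updates of x and yyy is likewise an assumption.

```cpp
#include <cassert>
#include <cmath>

inline float fn(float x) {
  return 2.f * x + std::sqrt(x);
}

// Hypothetical helper (not in the original test case): the outer loop of
// dloop() with the three 16-iteration inner loops unrolled by hand.
float dloop_unrolled(int niter) {
  assert(niter >= 0);
  float x = 0.5f;   // loop-carried value 1: updated once per outer iteration
  float yyy = 0.f;  // loop-carried value 2: accumulates all 16 results
  for (int i = 0; i < niter; ++i) {
    // xx[j] = x + 5*(j&1) takes only two values, so s[j] does too.
    const float even = fn(x);        // s[0], s[2], ..., s[14]
    const float odd = fn(x + 5.f);   // s[1], s[3], ..., s[15]
    // The 16 additions of the last inner loop, in their original order.
    yyy += even; yyy += odd;
    yyy += even; yyy += odd;
    yyy += even; yyy += odd;
    yyy += even; yyy += odd;
    yyy += even; yyy += odd;
    yyy += even; yyy += odd;
    yyy += even; yyy += odd;
    yyy += even; yyy += odd;
    x += 1e-6f;
  }
  return yyy;
}
```

No arrays remain in this form, so the only candidate left for the vectorizer is the outer loop with its two values carried from one iteration to the next.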
Comment 2 Ira Rosen 2011-07-14 06:35:39 UTC
There is no limitation on the number of reductions in vectorization.

The problem here is a non-empty latch block. There are several existing PRs for similar problems: PR 33447, PR 28643.

Ira
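
As a conceptual illustration of the term used above: the latch is the block that jumps back to the loop header, and it is non-empty when statements run after the exit test but before that jump. The sketch below shows the idea at source level with two hypothetical functions (sum_mid_exit and sum_top_exit are names invented here). It does not claim to reproduce the control-flow graph GCC built for the reported test case, which the bug does not show.

```cpp
#include <cassert>
#include <cstddef>

// Exit test in the middle of the body: the increment runs after the test,
// on the path back to the loop header (conceptually, a non-empty latch).
float sum_mid_exit(const float* a, std::size_t n) {
  assert(a != nullptr || n == 0);
  float sum = 0.f;
  if (n == 0) return sum;
  std::size_t i = 0;
  for (;;) {
    sum += a[i];
    if (i + 1 == n) break;  // exit test
    ++i;                    // work between the exit test and the back edge
  }
  return sum;
}

// Same computation with all the work done before the exit test, so nothing
// is left between the test and the jump back to the header.
float sum_top_exit(const float* a, std::size_t n) {
  assert(a != nullptr || n == 0);
  float sum = 0.f;
  if (n == 0) return sum;
  std::size_t i = 0;
  do {
    sum += a[i];
    ++i;
  } while (i != n);         // exit test is the last thing in the body
  return sum;
}
```

Both functions return the same sum; they differ only in where the exit test sits relative to the rest of the loop body.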
Comment 3 Andrew Pinski 2021-12-04 23:58:02 UTC
Fixed in GCC 8.