Bug 61304 - Missed vectorization: control flow in loop
Status: NEW
Alias: None
Product: gcc
Classification: Unclassified
Component: tree-optimization
Version: 5.0
Importance: P3 normal
Target Milestone: ---
Assignee: Not yet assigned to anyone
URL:
Keywords: missed-optimization
Depends on:
Blocks: vectorizer
Reported: 2014-05-24 14:46 UTC by Marc Glisse
Modified: 2023-10-20 21:41 UTC (History)
CC List: 2 users

See Also:
Host:
Target: x86_64-linux-gnu
Build:
Known to work:
Known to fail:
Last reconfirmed: 2014-05-26 00:00:00


Description Marc Glisse 2014-05-24 14:46:23 UTC
(Taken from a Stack Overflow question about a bug in LLVM; replace -1 with -2 if you want to test LLVM and avoid the bug.)

gcc -O3 fails to vectorize the following program because it sees control flow in the loop. If I move i++ before the second "if", whose condition then becomes i == 0, we still fail to vectorize because we get confused about the number of iterations. Finally, if I stop at i == 2048 instead, we do vectorize, but the generated code could be improved (that would be for a different PR, though).

#include <stdint.h>
#include <string.h>

int main()
{
    uint32_t i = 0;
    uint32_t count = 0;

    while (1)
    {
        float n;
        memcpy(&n, &i, sizeof(float));
        if(n >= 0.0f && n <= 1.0f)
            count++;
        if (i == -1)
            break;
        i++;
    }

    return count;
}
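
The last variant mentioned in the description (increment first, then stop at a small bound) is not spelled out. A minimal sketch of it might look like the following. The function name count_unit_floats and the parameter limit are hypothetical, introduced only so the bound is easy to change; the report itself only says that stopping at i == 2048 lets the vectorizer proceed.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not from the report): counts how many 32-bit
   patterns 0, 1, ..., limit - 1 reinterpret as a float in [0.0f, 1.0f].
   The increment comes before the exit test, as in the second and third
   variants described above. With limit == 0 the counter has to wrap
   before the test succeeds, so the body runs 2^32 times. */
uint32_t count_unit_floats(uint32_t limit)
{
    uint32_t i = 0;
    uint32_t count = 0;

    while (1)
    {
        float n;
        memcpy(&n, &i, sizeof(float));  /* reinterpret the bits of i */
        if (n >= 0.0f && n <= 1.0f)
            count++;
        i++;
        if (i == limit)                 /* exit test after the increment */
            break;
    }

    return count;
}
```

Calling count_unit_floats(2048) corresponds to the "stop at i == 2048" case. Assuming IEEE 754 single precision, those bit patterns are +0 and small positive denormals, so every one of them is counted.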
Comment 1 Richard Biener 2014-05-26 12:30:20 UTC
If-conversion doesn't happen because the vectorizer would reject the loop
anyway: its latch block is not empty, since i is incremented after the
exit test.
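
To make the loop shape concrete, here is a hand-written sketch of the original loop's control flow using labels. It only approximates what the compiler sees and is not actual GIMPLE; the function name count_original_shape, its parameter last, and the label names are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical illustration: the original testcase with its control
   flow made explicit. `last` plays the role of the -1 in the report. */
uint32_t count_original_shape(uint32_t last)
{
    uint32_t i = 0;
    uint32_t count = 0;

header:
    {
        float n;
        memcpy(&n, &i, sizeof(float));
        if (n >= 0.0f && n <= 1.0f)
            count++;
    }
    if (i == last)      /* exit test */
        goto done;

    /* Latch: the code between the exit test and the jump back to the
       header. It is not empty here because it holds the increment. */
    i++;
    goto header;

done:
    return count;
}
```

With last set to 2047 the body runs for i = 0 through 2047, so the sketch can be checked quickly without walking the whole 32-bit range.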

If you write the testcase as

int main()
{
  unsigned int i = 0;
  unsigned int count = 0;

  while (1)
    {
      float n;
      __builtin_memcpy(&n, &i, sizeof(float));
      if(n >= 0.0f && n <= 1.0f)
        count++;
      i++;
      if (i == 0)
        break;
    }

  return count;
}

then it would work if the vectorizer did not check the 'wrong' number
of iterations when deciding whether to bail out for a zero count.

Basically, it's hard to write a loop that iterates UINT_MAX times with
a uint induction variable ;)
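
The wraparound behind this can be shown with a few lines of C. This is a sketch of the arithmetic only, assuming a 32-bit unsigned int and assuming the iteration count is derived as "back-edge executions plus one" in the induction variable's own type. That derivation is an assumption made for illustration and is not confirmed by this comment, and the helper name is hypothetical.

```c
#include <limits.h>

/* Hypothetical helper: turns the number of times the loop's back edge
   is taken into a number of body executions, using unsigned int
   arithmetic. Unsigned overflow is well defined in C and wraps
   modulo UINT_MAX + 1. */
unsigned int iterations_from_latch_count(unsigned int latch_executions)
{
    return latch_executions + 1u;
}
```

In the rewritten testcase the back edge is taken UINT_MAX times, so this computation yields 0, which is indistinguishable from a loop that never runs. For a small bound such as 2047 back-edge executions it yields 2048, as expected.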

And yeah, I recently fixed a bug similar to LLVM's for 4.9 (PR59058).
That fix left the missed-optimization hole described above.