Bug 58686 - vect_get_loop_niters() fails for some loops
Summary: vect_get_loop_niters() fails for some loops
Status: NEW
Alias: None
Product: gcc
Classification: Unclassified
Component: tree-optimization (show other bugs)
Version: 4.9.0
: P3 enhancement
Target Milestone: ---
Assignee: Not yet assigned to anyone
URL:
Keywords: missed-optimization
Depends on:
Blocks: vectorizer
  Show dependency treegraph
 
Reported: 2013-10-11 01:10 UTC by Cong Hou
Modified: 2021-12-12 10:21 UTC (History)
2 users (show)

See Also:
Host:
Target:
Build:
Known to work:
Known to fail:
Last reconfirmed: 2021-12-12 00:00:00


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Cong Hou 2013-10-11 01:10:47 UTC
Look at the following loop:


  unsigned int t = ...;
  do {
    ...
    t -= 4;
  } while (t >= 5);


When I tried to get the iteration number of this loop as an expression using vect_get_loop_niters(), it gave me the result "scev_not_known". If I changed the type of t into signed int, then I can get the result as below: 


t > 4 ? ((unsigned int) t + 4294967291) / 4 : 0


But even when t is unsigned, we should still get the result as:


t != 4 ? (t + 4294967291) / 4 : 0


I spent some time on tracking the reason why it failed to do so, and then reached the function assert_loop_rolls_lt(), in which the assumptions are built to make sure we can get the iteration number from the following formula:


(iv1->base - iv0->base + step - 1) / step


In the example above, iv1->base is t-4, iv0->base is 4 (t>=5 is t>4), and step is 4. This formula works only if


-step + 1 <= (iv1->base - iv0->base) <= MAX - step + 1

(MAX is the maximum value of the unsigned variant of type of t, and in this formula we don't have to take care of overflow.)


I think when (iv1->base - iv0->base) < -step + 1, then we can assume the number of times the back edge is taken is 0, and that is how niter->may_be_zero is built in this function. And niter->assumptions is built based on (iv1->base - iv0->base) <= MAX - step + 1. Note that we can only get the iteration number of the loop if niter->assumptions is always evaluated as true.

However, I found that the build of niter->assumptions does not involve both iv1->base and iv0->base, but only one of them. I think this is possibly a potential bug.

Further, the reason why we can get the iteration number if t is of unsigned int type is that niter->assumptions built here t-4 < MAX-3 is evaluated to true, by taking advantage of the fact that the overflow on signed int is undefined (so t-4 < MAX-3 can be converted to t < MAX+1, where MAX+1 is assumed to not overflow). But this is not working for unsigned int.

One more problem is the way how niter->may_be_zero is built. For the loop above, niter->may_be_zero I got is 4 > t - 4 - (-4 + 1), but we should make sure t-4 here does not overflow. Otherwise niter->may_be_zero is invalid. I think the function assert_loop_rolls_lt() should take care more of unsigned int types.

With this issue we cannot vectorize this loop as its iteration number is unknown.


Thank you!

Cong
Comment 1 Richard Biener 2013-10-11 07:51:17 UTC
You mention several things that sound like they may lead to wrong code.  Based
on your analysis, can you construct testcases that show a miscompile?  Easiest
would be to trick niter analysis to think a loop runs just once so it is
peeled completely.

Best have different bugreports for the (possible) wrong-code issues as this
bug starts about a missed optimization.  [versioning for niter may be
a possible solution, in case we can build proper conditions in niter->assumptions]
Comment 2 Cong Hou 2013-10-12 00:21:22 UTC
I think this issue is more like a missed optimization. 

If the iteration number can be calculated as a constant value at compile time, then the function assert_loop_rolls_lt() won't be called due to an early exit (specifically in the function number_of_iterations_lt() at the call to number_of_iterations_lt_to_ne()). That is why I could not craft a testcase showing miscompile.

A better test case is shown below:


#define N 4
void foo(int* a, unsigned int i)
{
  int j = 0;
  do
  {
    a[j++] = 0;
    i -= 4;
  }
  while (i >= N);
}


Compile it with -O3 and the produced result is using __builtin_memset() as the niter can be calculated. But if the value of N is replaced by others like 3 or 5, GCC won't optimize this loop into __builtin_memset() any more.
Comment 3 rguenther@suse.de 2013-10-13 08:22:52 UTC
congh at google dot com <gcc-bugzilla@gcc.gnu.org> wrote:
>http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58686
>
>--- Comment #2 from Cong Hou <congh at google dot com> ---
>I think this issue is more like a missed optimization. 
>
>If the iteration number can be calculated as a constant value at
>compile time,
>then the function assert_loop_rolls_lt() won't be called due to an
>early exit
>(specifically in the function number_of_iterations_lt() at the call to
>number_of_iterations_lt_to_ne()). That is why I could not craft a
>testcase
>showing miscompile.
>
>A better test case is shown below:
>
>
>#define N 4
>void foo(int* a, unsigned int i)
>{
>  int j = 0;
>  do
>  {
>    a[j++] = 0;
>    i -= 4;
>  }
>  while (i >= N);
>}
>
>
>Compile it with -O3 and the produced result is using __builtin_memset()
>as the
>niter can be calculated. But if the value of N is replaced by others
>like 3 or
>5, GCC won't optimize this loop into __builtin_memset() any more.

Yeah, the issue in general is finding a condition that ensures the loop will terminate and a formula that computes the number of iterations if that holds true. In case of wrapping arithmetic this is non-trivial and likely not all cases are implemented.

Richard.
Comment 4 Andrew Pinski 2021-12-12 10:21:27 UTC
Confirmed.

#define N 5
void foo(int* a, unsigned int i)
{
  int j = 0;
  do
  {
    a[j++] = 0;
    i -= 4;
  }
  while (i >= N);
}

---- CUT ---
LLVM can convert the above loop into a memset while GCC does not.