Look at the following loop:
unsigned int t = ...;
t -= 4;
} while (t >= 5);
When I tried to get the iteration number of this loop as an expression using vect_get_loop_niters(), it gave me the result "scev_not_known". If I changed the type of t into signed int, then I can get the result as below:
t > 4 ? ((unsigned int) t + 4294967291) / 4 : 0
But even when t is unsigned, we should still get the result as:
t != 4 ? (t + 4294967291) / 4 : 0
I spent some time on tracking the reason why it failed to do so, and then reached the function assert_loop_rolls_lt(), in which the assumptions are built to make sure we can get the iteration number from the following formula:
(iv1->base - iv0->base + step - 1) / step
In the example above, iv1->base is t-4, iv0->base is 4 (t>=5 is t>4), and step is 4. This formula works only if
-step + 1 <= (iv1->base - iv0->base) <= MAX - step + 1
(MAX is the maximum value of the unsigned variant of type of t, and in this formula we don't have to take care of overflow.)
I think when (iv1->base - iv0->base) < -step + 1, then we can assume the number of times the back edge is taken is 0, and that is how niter->may_be_zero is built in this function. And niter->assumptions is built based on (iv1->base - iv0->base) <= MAX - step + 1. Note that we can only get the iteration number of the loop if niter->assumptions is always evaluated as true.
However, I found that the build of niter->assumptions does not involve both iv1->base and iv0->base, but only one of them. I think this is possibly a potential bug.
Further, the reason why we can get the iteration number if t is of unsigned int type is that niter->assumptions built here t-4 < MAX-3 is evaluated to true, by taking advantage of the fact that the overflow on signed int is undefined (so t-4 < MAX-3 can be converted to t < MAX+1, where MAX+1 is assumed to not overflow). But this is not working for unsigned int.
One more problem is the way how niter->may_be_zero is built. For the loop above, niter->may_be_zero I got is 4 > t - 4 - (-4 + 1), but we should make sure t-4 here does not overflow. Otherwise niter->may_be_zero is invalid. I think the function assert_loop_rolls_lt() should take care more of unsigned int types.
With this issue we cannot vectorize this loop as its iteration number is unknown.
You mention several things that sound like they may lead to wrong code. Based
on your analysis, can you construct testcases that show a miscompile? Easiest
would be to trick niter analysis to think a loop runs just once so it is
Best have different bugreports for the (possible) wrong-code issues as this
bug starts about a missed optimization. [versioning for niter may be
a possible solution, in case we can build proper conditions in niter->assumptions]
I think this issue is more like a missed optimization.
If the iteration number can be calculated as a constant value at compile time, then the function assert_loop_rolls_lt() won't be called due to an early exit (specifically in the function number_of_iterations_lt() at the call to number_of_iterations_lt_to_ne()). That is why I could not craft a testcase showing miscompile.
A better test case is shown below:
#define N 4
void foo(int* a, unsigned int i)
int j = 0;
a[j++] = 0;
i -= 4;
while (i >= N);
Compile it with -O3 and the produced result is using __builtin_memset() as the niter can be calculated. But if the value of N is replaced by others like 3 or 5, GCC won't optimize this loop into __builtin_memset() any more.
congh at google dot com <email@example.com> wrote:
>--- Comment #2 from Cong Hou <congh at google dot com> ---
>I think this issue is more like a missed optimization.
>If the iteration number can be calculated as a constant value at
>then the function assert_loop_rolls_lt() won't be called due to an
>(specifically in the function number_of_iterations_lt() at the call to
>number_of_iterations_lt_to_ne()). That is why I could not craft a
>A better test case is shown below:
>#define N 4
>void foo(int* a, unsigned int i)
> int j = 0;
> a[j++] = 0;
> i -= 4;
> while (i >= N);
>Compile it with -O3 and the produced result is using __builtin_memset()
>niter can be calculated. But if the value of N is replaced by others
>like 3 or
>5, GCC won't optimize this loop into __builtin_memset() any more.
Yeah, the issue in general is finding a condition that ensures the loop will terminate and a formula that computes the number of iterations if that holds true. In case of wrapping arithmetic this is non-trivial and likely not all cases are implemented.