Bug 58686 - vect_get_loop_niters() fails for some loops
vect_get_loop_niters() fails for some loops
 Status: UNCONFIRMED None gcc Unclassified tree-optimization (show other bugs) 4.9.0 P3 normal --- Not yet assigned to anyone missed-optimization vectorizer Show dependency tree / graph

 Reported: 2013-10-11 01:10 UTC by Cong Hou 2013-10-13 08:22 UTC (History) 2 users (show) rakdver rguenth

Attachments

 Note You need to log in before you can comment on or make changes to this bug.
 Cong Hou 2013-10-11 01:10:47 UTC ```Look at the following loop: unsigned int t = ...; do { ... t -= 4; } while (t >= 5); When I tried to get the iteration number of this loop as an expression using vect_get_loop_niters(), it gave me the result "scev_not_known". If I changed the type of t into signed int, then I can get the result as below: t > 4 ? ((unsigned int) t + 4294967291) / 4 : 0 But even when t is unsigned, we should still get the result as: t != 4 ? (t + 4294967291) / 4 : 0 I spent some time on tracking the reason why it failed to do so, and then reached the function assert_loop_rolls_lt(), in which the assumptions are built to make sure we can get the iteration number from the following formula: (iv1->base - iv0->base + step - 1) / step In the example above, iv1->base is t-4, iv0->base is 4 (t>=5 is t>4), and step is 4. This formula works only if -step + 1 <= (iv1->base - iv0->base) <= MAX - step + 1 (MAX is the maximum value of the unsigned variant of type of t, and in this formula we don't have to take care of overflow.) I think when (iv1->base - iv0->base) < -step + 1, then we can assume the number of times the back edge is taken is 0, and that is how niter->may_be_zero is built in this function. And niter->assumptions is built based on (iv1->base - iv0->base) <= MAX - step + 1. Note that we can only get the iteration number of the loop if niter->assumptions is always evaluated as true. However, I found that the build of niter->assumptions does not involve both iv1->base and iv0->base, but only one of them. I think this is possibly a potential bug. Further, the reason why we can get the iteration number if t is of unsigned int type is that niter->assumptions built here t-4 < MAX-3 is evaluated to true, by taking advantage of the fact that the overflow on signed int is undefined (so t-4 < MAX-3 can be converted to t < MAX+1, where MAX+1 is assumed to not overflow). But this is not working for unsigned int. One more problem is the way how niter->may_be_zero is built. For the loop above, niter->may_be_zero I got is 4 > t - 4 - (-4 + 1), but we should make sure t-4 here does not overflow. Otherwise niter->may_be_zero is invalid. I think the function assert_loop_rolls_lt() should take care more of unsigned int types. With this issue we cannot vectorize this loop as its iteration number is unknown. Thank you! Cong``` Richard Biener 2013-10-11 07:51:17 UTC ```You mention several things that sound like they may lead to wrong code. Based on your analysis, can you construct testcases that show a miscompile? Easiest would be to trick niter analysis to think a loop runs just once so it is peeled completely. Best have different bugreports for the (possible) wrong-code issues as this bug starts about a missed optimization. [versioning for niter may be a possible solution, in case we can build proper conditions in niter->assumptions]``` Cong Hou 2013-10-12 00:21:22 UTC ```I think this issue is more like a missed optimization. If the iteration number can be calculated as a constant value at compile time, then the function assert_loop_rolls_lt() won't be called due to an early exit (specifically in the function number_of_iterations_lt() at the call to number_of_iterations_lt_to_ne()). That is why I could not craft a testcase showing miscompile. A better test case is shown below: #define N 4 void foo(int* a, unsigned int i) { int j = 0; do { a[j++] = 0; i -= 4; } while (i >= N); } Compile it with -O3 and the produced result is using __builtin_memset() as the niter can be calculated. But if the value of N is replaced by others like 3 or 5, GCC won't optimize this loop into __builtin_memset() any more.``` rguenther@suse.de 2013-10-13 08:22:52 UTC ```congh at google dot com wrote: >http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58686 > >--- Comment #2 from Cong Hou --- >I think this issue is more like a missed optimization. > >If the iteration number can be calculated as a constant value at >compile time, >then the function assert_loop_rolls_lt() won't be called due to an >early exit >(specifically in the function number_of_iterations_lt() at the call to >number_of_iterations_lt_to_ne()). That is why I could not craft a >testcase >showing miscompile. > >A better test case is shown below: > > >#define N 4 >void foo(int* a, unsigned int i) >{ > int j = 0; > do > { > a[j++] = 0; > i -= 4; > } > while (i >= N); >} > > >Compile it with -O3 and the produced result is using __builtin_memset() >as the >niter can be calculated. But if the value of N is replaced by others >like 3 or >5, GCC won't optimize this loop into __builtin_memset() any more. Yeah, the issue in general is finding a condition that ensures the loop will terminate and a formula that computes the number of iterations if that holds true. In case of wrapping arithmetic this is non-trivial and likely not all cases are implemented. Richard.```