This one was originally reported here:
http://gcc.gnu.org/ml/gcc-help/2013-08/msg00124.html

The original example was:

#include <stdio.h>

template <typename T> inline T const&
max (T const& a, T const& b)
{
  return a < b ? b : a;
}

int main()
{
  long long unsigned sum = 0;

  for (int x = 1; x <= 100000000; x++)
    sum += max (x, x + 1);

  printf("%llu\n", sum);
}

It seems that GCC 4.7 was able to evaluate the loop at compile time and reduce it to a constant value, but GCC 4.8 fails to do so.

I've also briefly checked with trunk rev 201282 and the problem seems to be still there. Here is a reduced test case:

int test (void)
{
  int sum = 0;
  for (int x = 0; x < 100; x++)
    sum += x;

  return sum;
}

I've checked this with an SH cross compiler setup, but I don't think it matters.
The loops do get eliminated if the number of loop iterations is max. 17, for both the reduced example and the originally reported case.
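For reference, the constants a fully folded loop should produce can be worked out by hand. The following is only a sketch; the function names `sum_below`, `sum_of_max` and `check_small_cases` are made up for illustration and are not part of the report. It assumes `x + 1` never overflows, which holds for the bounds used above.

```c
#include <assert.h>
#include <stdio.h>

/* Closed form for the reduced test case: 0 + 1 + ... + (n - 1).  */
long long unsigned
sum_below (long long unsigned n)
{
  return n * (n - 1) / 2;
}

/* Closed form for the original example: max (x, x + 1) is always x + 1
   for x in 1..n, so the loop adds up (1 + 2 + ... + n) + n.  */
long long unsigned
sum_of_max (long long unsigned n)
{
  return n * (n + 1) / 2 + n;
}

/* Hypothetical self-check against two small sums done by hand.  */
void
check_small_cases (void)
{
  assert (sum_below (5) == 10);   /* 0 + 1 + 2 + 3 + 4 */
  assert (sum_of_max (3) == 9);   /* 2 + 3 + 4 */
}

/* Hypothetical helper: prints the value each test case should fold to.  */
void
print_expected_constants (void)
{
  printf ("%llu\n", sum_below (100));            /* reduced test case */
  printf ("%llu\n", sum_of_max (100000000ULL));  /* original example */
}
```

Under these assumptions the reduced test case should fold to 4950 and the original example to 5000000150000000, so those are the constants to look for in the generated code.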
(In reply to Oleg Endo from comment #0)
> 
> I've checked this with an SH cross compiler setup, but I don't think it
> matters.
> The loops do get eliminated if the number of loop iterations is max. 17, for
> both the reduced example and the originally reported case.

Forgot to mention: The loops get eliminated when compiling with -O3
Using this build of 4.8.1:

$ /usr/local/gcc-4.8.1/bin/gcc -v
Using built-in specs.
COLLECT_GCC=/usr/local/gcc-4.8.1/bin/gcc
COLLECT_LTO_WRAPPER=/usr/local/gcc-4.8.1/libexec/gcc/x86_64-unknown-linux-gnu/4.8.1/lto-wrapper
Target: x86_64-unknown-linux-gnu
Configured with: ../gcc-4.8.1/configure --prefix=/usr/local/gcc-4.8.1 --with-local-prefix=/usr/local/gcc-4.8.1/local --enable-languages=c,c++ --enable-threads --disable-multilib
Thread model: posix
gcc version 4.8.1 (GCC)

With this test case (thanks for the nudge, Oleg!)

#include <stdint.h>

uint64_t test()
{
  uint64_t rv = 0;
  uint32_t i;
  for ( i = 0; i < ITERS; ++i )
    rv += i;
  return rv;
}

I did a quick loop to find where it switched from precomputing to looping. Interestingly, it was at 71 (precomputed) to 72 (loop) iterations:

$ for i in {1..100} ; do /usr/local/gcc-4.8.1/bin/gcc -O3 -S test.c -DITERS=$i ; if grep -q jne test.s ; then echo "$i: loop" ; else echo "$i: precomputed" ; fi ; done
1: precomputed
2: precomputed
3: precomputed
4: precomputed
5: precomputed
...
70: precomputed
71: precomputed
72: loop
73: loop

Curious whether the 17/71 difference from Oleg's SH environment is a typo/transposition, or if it's a difference in native word size, or something else.

For whatever it's worth, this bug is not something that is affecting me; it was just a question brought up on the IRC channel. So I don't have any priority to assign to it, other than that it seems a regression in quality-of-implementation since 4.7. (Although, as I asked in my original e-mail, it might be that 4.8 is just more cautious about looping, so the correct flags could restore the desired performance.)

Thanks,
Tony
Looking at the tree dump (-fdump-tree-all -fdump-tree-all-details), it seems that this is related to loop unrolling (dump file *t.cunroll) and induction variable optimization. If the number of iterations is small enough the loop gets unrolled and then the calculations on 'sum' are folded away.

For this one:

int test2 (void)
{
  int sum = 0;
  for (int x = 0; x < 5; x++)
    sum += x;

  return sum;
}

t.cunroll shows:

;; Function int test2() (_Z5test2v, funcdef_no=0, decl_uid=1593, symbol_order=0)

int test2() ()
{
  int x;
  int sum;

  <bb 2>:
  return 10;

}

However, with this one:

int test2 (void)
{
  int sum = 0;
  for (int x = 0; x < 6; x++)
    sum += x;

  return sum;
}

The unrolled result is:

int test2() ()
{
  int x;
  int sum;
  unsigned int ivtmp_2;
  unsigned int ivtmp_14;
  unsigned int ivtmp_20;
  unsigned int ivtmp_26;
  unsigned int ivtmp_32;
  unsigned int ivtmp_38;

  <bb 2>:
  sum_12 = 0;
  x_13 = 1;
  ivtmp_14 = 5;
  sum_18 = sum_12 + x_13;
  x_19 = x_13 + 1;
  ivtmp_20 = ivtmp_14 + 4294967295;
  sum_24 = sum_18 + x_19;
  x_25 = x_19 + 1;
  ivtmp_26 = ivtmp_20 + 4294967295;
  sum_30 = sum_24 + x_25;
  x_31 = x_25 + 1;
  ivtmp_32 = ivtmp_26 + 4294967295;
  sum_36 = sum_30 + x_31;
  x_37 = x_31 + 1;
  ivtmp_38 = ivtmp_32 + 4294967295;
  sum_3 = sum_36 + x_37;
  x_4 = x_37 + 1;
  ivtmp_2 = ivtmp_38 + 4294967295;
  return sum_3;

}
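To make the six-iteration dump easier to follow, here is a hand-written C mirror of the straight-line code. It is only an illustration and not compiler output: the function names `test2_unrolled`, `test2_loop` and `check_unrolled_matches_loop` are made up, the local names copy the SSA names from the dump, and the ivtmp_* counters are left out because nothing reads their values.

```c
#include <assert.h>

/* Mirror of the unrolled dump: every operand is a known constant, so the
   whole chain could in principle be folded to a single value.  */
int
test2_unrolled (void)
{
  int sum_12 = 0;
  int x_13 = 1;
  int sum_18 = sum_12 + x_13;
  int x_19 = x_13 + 1;
  int sum_24 = sum_18 + x_19;
  int x_25 = x_19 + 1;
  int sum_30 = sum_24 + x_25;
  int x_31 = x_25 + 1;
  int sum_36 = sum_30 + x_31;
  int x_37 = x_31 + 1;
  int sum_3 = sum_36 + x_37;
  return sum_3;
}

/* The original six-iteration loop, for comparison.  */
int
test2_loop (void)
{
  int sum = 0;
  for (int x = 0; x < 6; x++)
    sum += x;
  return sum;
}

/* Hypothetical self-check: both forms compute the same value.  */
void
check_unrolled_matches_loop (void)
{
  assert (test2_unrolled () == test2_loop ());
}
```

Both functions return 15, so the six-iteration case carries no more information than the five-iteration one that the dump shows folded to "return 10". The difference appears to be only that the folding has not happened at the point where this dump is written.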
(In reply to Oleg Endo from comment #0)
> 
> I've also briefly checked with trunk rev 201282 and the problem seems to be
> still there. Here is a reduced test case:
> 
> int test (void)
> {
>   int sum = 0;
>   for (int x = 0; x < 100; x++)
>     sum += x;
> 
>   return sum;
> }
> 

As of r213381 the reduced test case seems to work OK with at least loop counts up to 40000.
This seems to be working just fine now. Not sure what kind of test case to add for this, though. Maybe just scan the final assembler code for some expected hex or dec constant?
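One possible alternative to scanning the assembler output would be to scan the optimized tree dump, which is target independent. The following is only a sketch in the usual testsuite style, not an existing testcase. The option set and the expectation that the dump contains a literal "return 4950" are assumptions that would need checking on the targets of interest.

```c
/* { dg-do compile } */
/* { dg-options "-O2 -std=gnu99 -fdump-tree-optimized" } */

int
test (void)
{
  int sum = 0;
  for (int x = 0; x < 100; x++)
    sum += x;

  return sum;
}

/* Assumption: once the loop is folded away, the optimized dump contains a
   plain "return 4950;" (0 + 1 + ... + 99).  */
/* { dg-final { scan-tree-dump "return 4950" "optimized" } } */
```

Scanning the tree dump avoids having to guess how each target prints the constant (hex vs. dec) in its assembler output.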
Dup of bug 57511. There is now a testcase for this issue, so it won't regress again.

*** This bug has been marked as a duplicate of bug 57511 ***