Gcc fully unrolls short trip counted (known) loop if the unrolled loop body size d oes not exceed a threshold. However, if the loop has early exit, this is not done -- leading to missing scalar opt later on. Example: int a[100]; int b[100]; int foo(void) { int i, j; for (i = 0; i < 5; i ++) { a[2*i] += a[i]; if (a[2*i] == 10) break; } return 0; }
Did you forget to specify -funroll-loops? gcc-4.2 -O2 -funroll-loops and gcc-4.4 as well as 5.0 with these options basically generate this optimized dump, which IIUC is what you want. ;; Function foo (foo) Analyzing Edge Insertions. foo () { int temp.34; int temp.31; int temp.29; int temp.26; int temp.25; <bb 2>: temp.25 = a[0]; temp.26 = temp.25 + temp.25; a[0] = temp.26; if (temp.26 == 10) goto <bb 7>; else goto <bb 3>; <bb 3>: temp.29 = a[2] + a[1]; a[2] = temp.29; if (temp.29 == 10) goto <bb 7>; else goto <bb 4>; <bb 4>: temp.31 = temp.29 + a[4]; a[4] = temp.31; if (temp.31 == 10) goto <bb 7>; else goto <bb 5>; <bb 5>: temp.34 = a[6] + a[3]; a[6] = temp.34; if (temp.34 == 10) goto <bb 7>; else goto <bb 6>; <bb 6>: a[8] = [plus_expr] a[8] + a[4]; <bb 7>: return 0; }
Currently we get at -O2: Estimating sizes for loop 1 BB: 3, after_exit: 0 size: 1 _1 = i_13 * 2; Induction variable computation will be folded away. size: 1 _2 = a[_1]; size: 1 _3 = a[i_13]; size: 1 _4 = _2 + _3; size: 1 a[_1] = _4; size: 2 if (_4 == 10) BB: 6, after_exit: 1 BB: 4, after_exit: 0 size: 1 i_10 = i_13 + 1; Induction variable computation will be folded away. size: 1 ivtmp_7 = ivtmp_12 - 1; Induction variable computation will be folded away. size: 2 if (ivtmp_7 != 0) Exit condition will be eliminated in peeled copies. Exit condition will be eliminated in last copy. Constant conditional. size: 11-5, last_iteration: 11-5 Loop size: 11 Estimated size after unrolling: 20 Not unrolling loop 1: size would grow. We do unroll at -O3.