Given this simple program: main() { int i; for (i = 0; i < 10; i++) f2(); } compiled with -O2 gcc33-hammer detects that the loop index is not used in the loop body and rewrites it to a downwards counting loop, elimitinating one instruction (inc/cmp -> dec) movl $9, %ebx .p2align 4,,7 .L6: xorl %eax, %eax call f2 decl %ebx jns .L6 popq %rbx but gcc 4.0 - 4.3 don't anymore (tried 4.1, 4.3 and 4.4): xorl %ebx, %ebx .p2align 4,,7 .L2: xorl %eax, %eax incl %ebx call f2 cmpl $10, %ebx jne .L2
The tree optimizers canonicalize the loop to <bb 3>: # i_5 = PHI <i_3(4), 0(2)> # ivtmp.23_1 = PHI <ivtmp.23_4(4), 10(2)> f2 (); i_3 = i_5 + 1; ivtmp.23_4 = ivtmp.23_1 - 1; if (ivtmp.23_4 != 0) goto <bb 4>; else goto <bb 5>; <bb 4>: goto <bb 3>; But then IVOPTs chooses i as the induction variable again. Maybe a DCE pass before IVOPTs magically would solve the regression - or simply do not consider candidates without uses?
GCC 4.3.4 is being released, adjusting target milestone.
(In reply to comment #1) > The tree optimizers canonicalize the loop to > > <bb 3>: > # i_5 = PHI <i_3(4), 0(2)> > # ivtmp.23_1 = PHI <ivtmp.23_4(4), 10(2)> > f2 (); > i_3 = i_5 + 1; > ivtmp.23_4 = ivtmp.23_1 - 1; > if (ivtmp.23_4 != 0) > goto <bb 4>; > else > goto <bb 5>; > > <bb 4>: > goto <bb 3>; > > But then IVOPTs chooses i as the induction variable again. This is what I would expect it to do (I am somewhat surprised that 3.3 did something else). Ivopts at the moment do not know that comparing with 0 is more efficient than comparing with any other expression. In all other aspects, i and ivtmp.23 have the same cost, so ivopts prefers to preserve the original induction variable. Altering determine_use_iv_cost_condition to take the cost of the comparison into account should fix this.
The RTL loop optimizer does this optimization. I had to fix it a couple of years ago for unsigned variables. I think the loop optimizer still does it, just the gcc 4 frontend doesn't give it input RTL with a suitable pattern anymore.
Subject: Re: [4.3/4.4/4.5 Regression] No loop counter reversal for simple loops anymore > The RTL loop optimizer does this optimization. I had to fix it a couple > of years ago for unsigned variables. > > I think the loop optimizer still does it, just the gcc 4 frontend doesn't > give it input RTL with a suitable pattern anymore. RTL loop optimizer only does this optimization on platforms that have a special loop pattern (see loop-doloop.c).
It worked on x86 at least
Re. comment #6: doloop never worked on x86 except for the AMD K6. x86 does not have a doloop pattern.
At least my example in the original bug description shows that the optimization worked on gcc 3.3. If your theory doesn't explain this then your theory is wrong.
Hi, As suggested by Zdenek, here is a patch that lowers the cost of the IV when it is compared against zero in a condition. The fragile part of this patch is that it lowers the cost by a magical constant "10". Would there be a more appropriate way to compute the effect, or a better constant? Thanks, Sebastian and Changpeng Fang diff --git a/gcc/tree-ssa-loop-ivopts.c b/gcc/tree-ssa-loop-ivopts.c index 436e6ce..5050d0c 100644 --- a/gcc/tree-ssa-loop-ivopts.c +++ b/gcc/tree-ssa-loop-ivopts.c @@ -4089,6 +4089,7 @@ determine_use_iv_cost_condition (struct ivopts_data *data, bitmap depends_on_elim = NULL, depends_on_express = NULL, depends_on; comp_cost elim_cost, express_cost, cost; bool ok; + tree *control_var, *bound_cst; /* Only consider real candidates. */ if (!cand->iv) @@ -4110,9 +4111,17 @@ determine_use_iv_cost_condition (struct ivopts_data *data, /* Try expressing the original giv. If it is compared with an invariant, note that we cannot get rid of it. */ - ok = extract_cond_operands (data, use->stmt, NULL, NULL, NULL, &cmp_iv); + ok = extract_cond_operands (data, use->stmt, &control_var, &bound_cst, + NULL, &cmp_iv); gcc_assert (ok); + /* When the condition is a comparison of the candidate IV against + zero, prefer this IV. */ + if (integer_zerop (*bound_cst) + && (operand_equal_p (*control_var, cand->var_after, 0) + || operand_equal_p (*control_var, cand->var_before, 0))) + elim_cost.cost -= 10; + express_cost = get_computation_cost (data, use, cand, false, &depends_on_express, NULL); fd_ivopts_data = data;
Note that subtracting 1 from the cost of the candidate IV works as well for this PR's testcase and we generate this asm with the patch: .file "pr40886.c" .text .p2align 4,,15 .globl main .type main, @function main: .LFB0: .cfi_startproc pushq %rbx .cfi_def_cfa_offset 16 movl $10, %ebx .cfi_offset 3, -16 .p2align 4,,10 .p2align 3 .L2: xorl %eax, %eax call f2 subl $1, %ebx jne .L2 popq %rbx .cfi_def_cfa_offset 8 ret .cfi_endproc .LFE0: .size main, .-main .ident "GCC: (GNU) 4.5.0 20100207 (experimental)" .section .note.GNU-stack,"",@progbits
Subject: Re: [4.3/4.4/4.5 Regression] No loop counter reversal for simple loops anymore Hi, > As suggested by Zdenek, here is a patch that lowers the cost of the IV when > it is compared against zero in a condition. The fragile part of this patch is > that it > lowers the cost by a magical constant "10". Would there be a more appropriate > way to compute the effect, or a better constant? 10 looks like way too much, forcing ivopts to prefer comparison with zero even if other choice of induction variables would be better. The constant should be target-dependent; but unless we already have this information somewhere, I would use 1, or even just change the complexity part of the cost (assuming that that would work), Zdenek
Hi, I just checked the back-end cost tables and there is no cost entry for compare against zero. I guess that we should just add a TODO comment around the code that we're adding, and then add the cost field in GCC 4.6.
Subject: Bug 40886 Author: spop Date: Thu Feb 11 15:45:27 2010 New Revision: 156701 URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=156701 Log: Fix PR40886. 2010-02-11 Sebastian Pop <sebastian.pop@amd.com> Changpeng Fang <changpeng.fang@amd.com> PR middle-end/40886 * tree-ssa-loop-ivopts.c (determine_use_iv_cost_condition): Decrement the cost of an IV candidate when the IV is used in a test against zero. * gcc.dg/tree-ssa/ivopts-3.c: New. Added: trunk/gcc/testsuite/gcc.dg/tree-ssa/ivopts-3.c Modified: trunk/gcc/ChangeLog trunk/gcc/tree-ssa-loop-ivopts.c
Fixed in trunk GCC 4.5.
GCC 4.3.5 is being released, adjusting target milestone.
I don't need a backport to 4.4 and I double checked it works as expected in gcc 4.5. Closing.