Bug 40886 - [4.3/4.4 Regression] No loop counter reversal for simple loops anymore
Summary: [4.3/4.4 Regression] No loop counter reversal for simple loops anymore
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: regression (show other bugs)
Version: 4.4.0
: P2 normal
Target Milestone: 4.5.0
Assignee: Not yet assigned to anyone
URL:
Keywords: missed-optimization
Depends on:
Blocks:
 
Reported: 2009-07-27 21:39 UTC by Andi Kleen
Modified: 2021-11-29 05:42 UTC (History)
3 users (show)

See Also:
Host: x86_64-linux
Target: x86_64-linux
Build:
Known to work: 3.4.6 4.5.0
Known to fail: 4.0.0 4.3.5 4.4.3
Last reconfirmed: 2009-07-28 11:09:27


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Andi Kleen 2009-07-27 21:39:58 UTC
Given this simple program:

main()
{
        int i;
        for (i = 0; i < 10; i++) 
                f2();
}

compiled with -O2
gcc33-hammer detects that the loop index is not used in the loop body
and rewrites it to a downwards counting loop, elimitinating one instruction
(inc/cmp -> dec)

  movl    $9, %ebx
        .p2align 4,,7
.L6:
        xorl    %eax, %eax
        call    f2
        decl    %ebx
        jns     .L6
        popq    %rbx

but gcc 4.0 - 4.3 don't anymore (tried 4.1, 4.3 and 4.4):

       xorl    %ebx, %ebx
        .p2align 4,,7
.L2:
        xorl    %eax, %eax
        incl    %ebx
        call    f2
        cmpl    $10, %ebx
        jne     .L2
Comment 1 Richard Biener 2009-07-28 11:09:27 UTC
The tree optimizers canonicalize the loop to

<bb 3>:
  # i_5 = PHI <i_3(4), 0(2)>
  # ivtmp.23_1 = PHI <ivtmp.23_4(4), 10(2)>
  f2 ();
  i_3 = i_5 + 1;
  ivtmp.23_4 = ivtmp.23_1 - 1;
  if (ivtmp.23_4 != 0)
    goto <bb 4>;
  else
    goto <bb 5>;

<bb 4>:
  goto <bb 3>;

But then IVOPTs chooses i as the induction variable again.

Maybe a DCE pass before IVOPTs magically would solve the regression - or
simply do not consider candidates without uses?
Comment 2 Richard Biener 2009-08-04 12:30:19 UTC
GCC 4.3.4 is being released, adjusting target milestone.
Comment 3 Zdenek Dvorak 2009-08-07 08:44:20 UTC
(In reply to comment #1)
> The tree optimizers canonicalize the loop to
> 
> <bb 3>:
>   # i_5 = PHI <i_3(4), 0(2)>
>   # ivtmp.23_1 = PHI <ivtmp.23_4(4), 10(2)>
>   f2 ();
>   i_3 = i_5 + 1;
>   ivtmp.23_4 = ivtmp.23_1 - 1;
>   if (ivtmp.23_4 != 0)
>     goto <bb 4>;
>   else
>     goto <bb 5>;
> 
> <bb 4>:
>   goto <bb 3>;
> 
> But then IVOPTs chooses i as the induction variable again.

This is what I would expect it to do (I am somewhat surprised that 3.3 did something else).  Ivopts at the moment do not know that comparing with 0 is more efficient than comparing with any other expression.  In all other aspects, i and ivtmp.23 have the same cost, so ivopts prefers to preserve the original induction variable.

Altering determine_use_iv_cost_condition to take the cost of the comparison into account should fix this.
Comment 4 Andi Kleen 2009-08-07 08:50:33 UTC
The RTL loop optimizer does this optimization. I had to fix it a couple 
of years ago for unsigned variables.

I think the loop optimizer still does it, just the gcc 4 frontend doesn't
give it input RTL with a suitable pattern anymore.
 
Comment 5 rakdver@kam.mff.cuni.cz 2009-08-07 08:54:25 UTC
Subject: Re:  [4.3/4.4/4.5 Regression] No loop counter reversal for simple loops anymore

> The RTL loop optimizer does this optimization. I had to fix it a couple 
> of years ago for unsigned variables.
> 
> I think the loop optimizer still does it, just the gcc 4 frontend doesn't
> give it input RTL with a suitable pattern anymore.

RTL loop optimizer only does this optimization on platforms that have a
special loop pattern (see loop-doloop.c).
Comment 6 Andi Kleen 2009-08-07 09:38:45 UTC
It worked on x86 at least
Comment 7 Steven Bosscher 2009-08-07 09:47:34 UTC
Re. comment #6: doloop never worked on x86 except for the AMD K6.  x86 does not have a doloop pattern.
Comment 8 Andi Kleen 2009-08-07 09:52:31 UTC
At least my example in the original bug description shows that the optimization
worked on gcc 3.3. If your theory doesn't explain this then your theory is wrong.
Comment 9 Sebastian Pop 2010-02-09 04:57:38 UTC
Hi,

As suggested by Zdenek, here is a patch that lowers the cost of the IV when
it is compared against zero in a condition.  The fragile part of this patch is that it
lowers the cost by a magical constant "10".  Would there be a more appropriate
way to compute the effect, or a better constant?

Thanks,
Sebastian and Changpeng Fang

diff --git a/gcc/tree-ssa-loop-ivopts.c b/gcc/tree-ssa-loop-ivopts.c
index 436e6ce..5050d0c 100644
--- a/gcc/tree-ssa-loop-ivopts.c
+++ b/gcc/tree-ssa-loop-ivopts.c
@@ -4089,6 +4089,7 @@ determine_use_iv_cost_condition (struct ivopts_data *data,
   bitmap depends_on_elim = NULL, depends_on_express = NULL, depends_on;
   comp_cost elim_cost, express_cost, cost;
   bool ok;
+  tree *control_var, *bound_cst;
 
   /* Only consider real candidates.  */
   if (!cand->iv)
@@ -4110,9 +4111,17 @@ determine_use_iv_cost_condition (struct ivopts_data *data,
 
   /* Try expressing the original giv.  If it is compared with an invariant,
      note that we cannot get rid of it.  */
-  ok = extract_cond_operands (data, use->stmt, NULL, NULL, NULL, &cmp_iv);
+  ok = extract_cond_operands (data, use->stmt, &control_var, &bound_cst,
+			      NULL, &cmp_iv);
   gcc_assert (ok);
 
+  /* When the condition is a comparison of the candidate IV against
+     zero, prefer this IV.  */
+  if (integer_zerop (*bound_cst)
+      && (operand_equal_p (*control_var, cand->var_after, 0)
+	  || operand_equal_p (*control_var, cand->var_before, 0)))
+    elim_cost.cost -= 10;
+
   express_cost = get_computation_cost (data, use, cand, false,
 				       &depends_on_express, NULL);
   fd_ivopts_data = data;
Comment 10 Sebastian Pop 2010-02-09 06:00:15 UTC
Note that subtracting 1 from the cost of the candidate IV works as well for this PR's testcase and we generate this asm with the patch:

	.file	"pr40886.c"
	.text
	.p2align 4,,15
.globl main
	.type	main, @function
main:
.LFB0:
	.cfi_startproc
	pushq	%rbx
	.cfi_def_cfa_offset 16
	movl	$10, %ebx
	.cfi_offset 3, -16
	.p2align 4,,10
	.p2align 3
.L2:
	xorl	%eax, %eax
	call	f2
	subl	$1, %ebx
	jne	.L2
	popq	%rbx
	.cfi_def_cfa_offset 8
	ret
	.cfi_endproc
.LFE0:
	.size	main, .-main
	.ident	"GCC: (GNU) 4.5.0 20100207 (experimental)"
	.section	.note.GNU-stack,"",@progbits
Comment 11 rakdver@kam.mff.cuni.cz 2010-02-09 08:30:13 UTC
Subject: Re:  [4.3/4.4/4.5 Regression] No loop
	counter reversal for simple loops anymore

Hi,

> As suggested by Zdenek, here is a patch that lowers the cost of the IV when
> it is compared against zero in a condition.  The fragile part of this patch is
> that it
> lowers the cost by a magical constant "10".  Would there be a more appropriate
> way to compute the effect, or a better constant?

10 looks like way too much, forcing ivopts to prefer comparison with zero even
if other choice of induction variables would be better.  The constant should be
target-dependent; but unless we already have this information somewhere, I would
use 1, or even just change the complexity part of the cost (assuming that that would
work),

Zdenek
Comment 12 Sebastian Pop 2010-02-09 17:17:23 UTC
Hi,

I just checked the back-end cost tables and there is no cost entry for
compare against zero.  I guess that we should just add a TODO
comment around the code that we're adding, and then add the cost
field in GCC 4.6.
Comment 13 Sebastian Pop 2010-02-11 15:45:56 UTC
Subject: Bug 40886

Author: spop
Date: Thu Feb 11 15:45:27 2010
New Revision: 156701

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=156701
Log:
Fix PR40886.

2010-02-11  Sebastian Pop  <sebastian.pop@amd.com>
	    Changpeng Fang  <changpeng.fang@amd.com>

	PR middle-end/40886
	* tree-ssa-loop-ivopts.c (determine_use_iv_cost_condition): Decrement
	the cost of an IV candidate when the IV is used in a test against zero.

	* gcc.dg/tree-ssa/ivopts-3.c: New.

Added:
    trunk/gcc/testsuite/gcc.dg/tree-ssa/ivopts-3.c
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/tree-ssa-loop-ivopts.c

Comment 14 Sebastian Pop 2010-02-11 15:47:29 UTC
Fixed in trunk GCC 4.5.
Comment 15 Richard Biener 2010-05-22 18:13:40 UTC
GCC 4.3.5 is being released, adjusting target milestone.
Comment 16 Andi Kleen 2010-06-09 11:21:20 UTC
I don't need a backport to 4.4 and I double checked it works as expected
in gcc 4.5. Closing.