Bug 30907 - [4.3 regression] Propagation of addresses within loops pessimizes code
Summary: [4.3 regression] Propagation of addresses within loops pessimizes code
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: middle-end
Version: 4.3.0
Importance: P3 normal
Target Milestone: ---
Assignee: Paolo Bonzini
URL: http://gcc.gnu.org/ml/gcc-patches/200...
Keywords: missed-optimization, patch
Depends on:
Blocks: 30841
Reported: 2007-02-21 10:36 UTC by Richard Biener
Modified: 2007-03-20 08:32 UTC
CC List: 3 users

See Also:
Host:
Target: x86_64-*-*
Build:
Known to work: 4.2.0
Known to fail: 4.3.0
Last reconfirmed: 2007-02-22 16:48:27


Description Richard Biener 2007-02-21 10:36:52 UTC
int bar (void);
void foo (int *);
static int s[10];

void foobar (int i1, int i2, int i3, int i4, int i5, int i6)
{
  int a[100];
  int i, i7;

  i7 = bar ();
  bar ();

  for (i = 0; i < 100; i++)
    a[i] = s[i1] + s[i2] + s[i3] + s[i4] + s[i5] + s[i6] + s[i7];

  foo (&a[0]);

  return;
}

If you compare mainline with the dataflow branch at -O2, you can see:

--- t.i.trunk   2007-02-21 11:31:09.663252586 +0100
+++ t.i.df      2007-02-21 11:31:10.548064364 +0100
@@ -37,7 +37,6 @@
        movl    s(,%rbx,4), %edx
        addl    s(,%rcx,4), %edx
        movslq  %r12d,%r12
-       leaq    16(%rsp), %rdi
        addl    s(,%r13,4), %edx
        addl    s(,%r14,4), %edx
        addl    s(,%r15,4), %edx
@@ -46,10 +45,11 @@
        addl    s(,%r12,4), %edx
        .p2align 4,,7
 .L2:
-       movl    %edx, (%rdi,%rax,4)
+       movl    %edx, 16(%rsp,%rax,4)
        addq    $1, %rax
        cmpq    $100, %rax
        jne     .L2
+       leaq    16(%rsp), %rdi
        call    foo
        addq    $424, %rsp
        popq    %rbx

That is, on the dataflow branch we choose a more expensive addressing mode inside
the loop, not noticing that 16(%rsp) is loop-invariant and can be (G)CSEd into a
register.  This makes the loop above run 33% slower on x86_64.
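To make the trade-off concrete, here is a small C model (an illustration, not GCC code) of the two store addresses in the x86_64 diff. The names `effective_address`, `store_addr_hoisted`, `store_addr_folded` and `forms_agree` are hypothetical, and registers are modelled as plain 64-bit integers. It only checks that both forms address the same element on every iteration, which is why keeping `16(%rsp)` in `%rdi` is legal; the slowdown itself is the reporter's measurement and is not modelled.

```c
#include <assert.h>
#include <stdint.h>

/* Effective address of disp(base,index,scale), as x86 computes it.
   x86 can only encode scale factors of 1, 2, 4 and 8.  */
static uint64_t
effective_address (uint64_t base, uint64_t index, unsigned scale, int64_t disp)
{
  assert (scale == 1 || scale == 2 || scale == 4 || scale == 8);
  return base + index * scale + (uint64_t) disp;
}

/* Trunk form: leaq 16(%rsp), %rdi before the loop, then (%rdi,%rax,4).  */
static uint64_t
store_addr_hoisted (uint64_t rsp, uint64_t rax)
{
  /* Loop invariant: in the real code this is computed once,
     before the loop.  */
  uint64_t rdi = rsp + 16;
  return effective_address (rdi, rax, 4, 0);
}

/* Dataflow-branch form: the invariant offset is folded into every
   store, 16(%rsp,%rax,4).  */
static uint64_t
store_addr_folded (uint64_t rsp, uint64_t rax)
{
  return effective_address (rsp, rax, 4, 16);
}

/* Returns 1 if both forms agree on every iteration of the loop.  */
static int
forms_agree (uint64_t rsp, uint64_t iterations)
{
  for (uint64_t rax = 0; rax < iterations; rax++)
    if (store_addr_hoisted (rsp, rax) != store_addr_folded (rsp, rax))
      return 0;
  return 1;
}
```

For example, `store_addr_folded (1000, 3)` yields 1028, the address of `a[3]` when `%rsp` is 1000. Since the two forms always agree, the choice between them is purely a cost decision; note that the folded form also needs an 8-bit displacement in the encoding of every store.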
Comment 1 Steven Bosscher 2007-02-21 15:53:40 UTC
On i686, this happens too, due to fwprop1:

In insn 47, replacing
 (mem/s:SI (plus:SI (plus:SI (mult:SI (reg:SI 75 [ ivtmp.37 ])
                    (const_int 4 [0x4]))
                (reg/f:SI 91))
            (const_int -4 [0xfffffffffffffffc])) [3 a S4 A8])
 with (mem/s:SI (plus:SI (plus:SI (mult:SI (reg:SI 75 [ ivtmp.37 ])
                    (const_int 4 [0x4]))
                (reg/f:SI 20 frame))
            (const_int -404 [0xfffffffffffffe6c])) [3 a S4 A8])
defering rescan insn with uid = 47.

This results in different code between trunk and the df-branch:

-- trunk
++ df-branch
@@ -12,7 +12,6 @@
        movl    %eax, %ebx
        call    bar
        movl    12(%ebp), %eax
-       leal    -404(%ebp), %ecx
        movl    s(,%eax,4), %edx
        movl    8(%ebp), %eax
        addl    s(,%eax,4), %edx
@@ -28,11 +27,12 @@
        addl    s(,%ebx,4), %edx
        .p2align 4,,7
 .L2:
-       movl    %edx, -4(%ecx,%eax,4)
+       movl    %edx, -408(%ebp,%eax,4)
        addl    $1, %eax
        cmpl    $101, %eax
        jne     .L2
-       movl    %ecx, (%esp)
+       leal    -404(%ebp), %eax
+       movl    %eax, (%esp)
        call    foo
        addl    $404, %esp
        popl    %ebx
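The fwprop1 replacement in insn 47 can be read as substituting the definition of reg 91 into the address and folding the constants. The dump does not show that definition; reading the old and new addresses together, it must be equivalent to frame - 400, since -4 + (-400) = -404. The sketch below models only that step, using a hypothetical `struct address` (base + index*scale + disp) and `struct def` (dest = src + offset); it is not how fwprop.c represents RTL.

```c
#include <assert.h>

/* Register numbers taken from the fwprop1 dump.  */
enum { REG_FRAME = 20, REG_IVTMP = 75, REG_91 = 91 };

/* A base + index*scale + disp address, the same shape as
   (plus (plus (mult index scale) base) disp) in the dump.  */
struct address
{
  int base;     /* base register number */
  int index;    /* index register number */
  int scale;    /* multiplier applied to the index */
  long disp;    /* constant displacement */
};

/* A definition of the form  dest = src + offset.  */
struct def
{
  int dest;
  int src;
  long offset;
};

/* Forward-propagate DEF into ADDR: if ADDR's base is the register DEF
   sets, use DEF's source register instead and fold DEF's offset into
   the displacement.  Returns 1 if ADDR was changed, 0 otherwise.  */
static int
propagate_into_address (struct address *addr, const struct def *def)
{
  assert (addr && def);
  if (addr->base != def->dest)
    return 0;
  addr->base = def->src;
  addr->disp += def->offset;
  return 1;
}
```

Applied to the dump's values, `{ base = 91, index = 75, scale = 4, disp = -4 }` becomes `{ base = 20 (frame), index = 75, scale = 4, disp = -404 }`, matching the replacement. Each such step looks like a simplification on its own, but inside the loop it trades a register holding the invariant address for a larger address on every store.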

Comment 2 Paolo Bonzini 2007-02-21 16:01:53 UTC
fwprop has some tricks to avoid propagating within loops before unrolling.  The interesting question is why they trigger differently in mainline and in the dataflow branch.
Comment 3 Paolo Bonzini 2007-02-22 10:20:50 UTC
This will also appear in mainline when the patch for PR30841 is applied.
Comment 4 Paolo Bonzini 2007-02-22 16:48:27 UTC
Though we don't have a testcase for mainline, the bug is there too.
Comment 5 Paolo Bonzini 2007-03-20 08:31:34 UTC
Subject: Bug 30907

Author: bonzini
Date: Tue Mar 20 08:31:13 2007
New Revision: 123084

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=123084
Log:
2007-03-19  Paolo Bonzini  <bonzini@gnu.org>

	PR rtl-optimization/30907
	* fwprop.c (forward_propagate_into): Never propagate inside a loop.
	(fwprop_init): Always call loop_optimizer_initialize.
	(fwprop_done): Always call loop_optimizer_finalize.
	(fwprop): We always have loop info now.
	(gate_fwprop_addr): Remove.
	(pass_fwprop_addr): Use gate_fwprop as gate.

	PR rtl-optimization/30841
	* df-problems.c (df_ru_local_compute, df_rd_local_compute,
	df_chain_alloc): Call df_reorganize_refs unconditionally.
	* df-scan.c (df_rescan_blocks, df_reorganize_refs): Change
	refs_organized to refs_organized_size.
	(df_ref_create_structure): Use refs_organized_size instead of
	bitmap_size if refs had been organized, and keep refs_organized_size
	up-to-date.
	* df.h (struct df_ref_info): Change refs_organized to
	refs_organized_size.
	(DF_DEFS_SIZE, DF_USES_SIZE): Use refs_organized_size instead of
	bitmap_size.


Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/df-problems.c
    trunk/gcc/df-scan.c
    trunk/gcc/df.h
    trunk/gcc/fwprop.c
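The ChangeLog states the new policy as "Never propagate inside a loop", and fwprop now always initializes loop information so that it can apply it. One simple way to express such a check is to allow propagation only when the definition and the use have the same innermost enclosing loop. The sketch below models that with a toy `struct block` whose `loop_father` field borrows the name GCC uses for a block's innermost loop; the struct, `may_propagate`, and the exact form of the check are assumptions for illustration, not a copy of fwprop.c.

```c
#include <assert.h>

/* Toy CFG information: each basic block records the id of the
   innermost loop containing it (0 = not inside any loop).  */
struct block
{
  int index;         /* basic block number */
  int loop_father;   /* id of the innermost enclosing loop */
};

/* Allow forward propagation from DEF_BB into USE_BB only when both
   blocks sit in the same innermost loop, so an invariant address
   computed before a loop is not substituted into the loop body.  */
static int
may_propagate (const struct block *def_bb, const struct block *use_bb)
{
  assert (def_bb && use_bb);
  return def_bb->loop_father == use_bb->loop_father;
}
```

In `foobar`, the address of `a` is computed before the loop (loop id 0). Under this rule it may still be propagated into a use after the loop, such as the `foo (&a[0])` call, but not into the store in the loop body, which keeps that store in the cheaper register-based form seen in the trunk output from the original report.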

Comment 6 Paolo Bonzini 2007-03-20 08:32:13 UTC
Patch committed.