This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.
Re: [PATCH] Unrolling addressing optimization
Hello,
> > CSE then optimizes this into the same code you describe below:
>
> > while (...)
> > {
> > i1 = i0 + 1
> > load a[i0 + 1]
> > ....
> > i2 = i0 + 2
> > load a[i0 + 2]
> > ....
> > i3 = i0 + 3
> > load a[i0 + 3]
> > ....
> > i0 = i0 + k
> > load a[i0 + k]
> >}
>
>
> It is not what our optimization does. In some architectures
> (like PowerPC), load a[i0 + k] requires two assembly instructions,
> so the above transformation is not useful.
The real-world result is this: the following code
struct blabol
{
  char a; unsigned x; char b; char c; char d; char e; char f; char g; char h; char i; char j;
  char k; char l; char m; char n; char o; char p; char q; char r; char s; char y; char z;
} x[100];

void foo (unsigned);

void xxx (void)
{
  unsigned i;

  for (i = 0; i < 100; i++)
    foo (x[i].x);
}
is optimized (by current mainline GCC with the autoinc handling in
CSE removed by the attached patch, using flags -O2 -funroll-loops -fweb) into
...
.L5:
        lwz 3,0(31)
        bl foo
        lwz 3,28(31)
        bl foo
        lwz 3,56(31)
        bl foo
        lwz 3,84(31)
        bl foo
        lwz 3,112(31)
        bl foo
        lwz 3,140(31)
        bl foo
        lwz 3,168(31)
        bl foo
        lwz 3,196(31)
        bl foo
        lwz 3,224(31)
        bl foo
        lwz 3,252(31)
        addi 31,31,280
        bl foo
        addic. 30,30,-10
        bge+ 0,.L5
...
which seems to be exactly what you would like.
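
At the source level, the shape of that unrolled loop can be sketched roughly as below: one base pointer, constant displacements from it for each copy of the body, and a single pointer update per unrolled iteration. This is only an illustration, not what GCC does internally. The names struct rec, visit_unrolled, record_value, seen_values and seen_count are made up for this sketch, and the unroll factor is 2 rather than 10 to keep it short.

```c
#include <stddef.h>

/* Hypothetical stand-in for struct blabol: one unsigned field
   surrounded by padding bytes.  */
struct rec
{
  char a;
  unsigned x;
  char rest[20];
};

/* Hypothetical recorder so the sketch can be checked without a real foo.  */
unsigned seen_values[16];
size_t seen_count;

void record_value (unsigned v)
{
  if (seen_count < 16)
    seen_values[seen_count++] = v;
}

/* Call visit on the x field of each of the n records, unrolled by two.
   Each copy of the body reads at a constant displacement from p, and p
   is advanced once per unrolled iteration.  */
void visit_unrolled (const struct rec *r, size_t n, void (*visit) (unsigned))
{
  const struct rec *p = r;
  size_t pairs = n / 2;

  while (pairs-- > 0)
    {
      visit (p[0].x);   /* displacement 0 from p */
      visit (p[1].x);   /* displacement sizeof (struct rec) from p */
      p += 2;           /* the only address update in the loop body */
    }

  if (n % 2)
    visit (p[0].x);     /* leftover element when n is odd */
}
```

With the displacements known at compile time, each access can be a single load with an immediate offset, which is what the lwz instructions above do.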
If we remove the extra fields from struct blabol, things do not
work as well:
.L5:
        slwi 0,31,2
        lwzx 3,30,0
        bl foo
        addi 0,31,1
        slwi 0,0,2
        lwzx 3,30,0
        bl foo
        addi 0,31,2
        slwi 0,0,2
        lwzx 3,30,0
        bl foo
        addi 0,31,3
        slwi 0,0,2
        lwzx 3,30,0
...
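which is caused by the fact that strength reduction is not done for the
address of the memory load, so each access rescales the index (slwi) and
uses an indexed load (lwzx) instead of a constant displacement.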
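
For readers less familiar with the term, here is a minimal C sketch of what strength reduction of an address computation means. It is written by hand to illustrate the idea and is not taken from GCC. The function names sum_indexed and sum_reduced are hypothetical.

```c
#include <stddef.h>

/* Before: the address of a[i] is conceptually a + i * sizeof *a, so the
   index has to be scaled for every access.  */
unsigned sum_indexed (const unsigned *a, size_t n)
{
  unsigned sum = 0;
  size_t i;

  for (i = 0; i < n; i++)
    sum += a[i];

  return sum;
}

/* After: the multiplication is replaced by a pointer that is advanced by
   a constant step, so each access is a plain load through p.  */
unsigned sum_reduced (const unsigned *a, size_t n)
{
  unsigned sum = 0;
  const unsigned *p;
  const unsigned *end = a + n;

  for (p = a; p != end; p++)
    sum += *p;

  return sum;
}
```

Both functions compute the same result. The second form is the one that lets an unrolled loop address its copies of the body as constant offsets from p.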
Zdenek
Index: cse.c
===================================================================
RCS file: /cvs/gcc/gcc/gcc/cse.c,v
retrieving revision 1.298
diff -c -3 -p -r1.298 cse.c
*** cse.c 4 Apr 2004 21:44:41 -0000 1.298
--- cse.c 11 Apr 2004 11:00:17 -0000
*************** fold_rtx (rtx x, rtx insn)
*** 4073,4078 ****
--- 4073,4079 ----
|| XEXP (y, 0) == folded_arg0)
break;
+ #if 0
/* Don't associate these operations if they are a PLUS with the
same constant and it is a power of two. These might be doable
with a pre- or post-increment. Similarly for two subtracts of
*************** fold_rtx (rtx x, rtx insn)
*** 4088,4093 ****
--- 4089,4095 ----
|| (HAVE_POST_DECREMENT
&& exact_log2 (- INTVAL (const_arg1)) >= 0)))
break;
+ #endif
/* Compute the code used to compose the constants. For example,
A-C1-C2 is A-(C1 + C2), so if CODE == MINUS, we want PLUS. */