This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.



Re: [PATCH] Unrolling addressing optimization


Hello,

> > CSE then optimizes this into the same code you describe below:
> 
> > while (...)
> > {
> >   i1 = i0 + 1
> >   load a[i0 + 1]
> >   ....
> >   i2 = i0 + 2
> >   load a[i0 + 2]
> >   ....
> >   i3 = i0 + 3
> >   load a[i0 + 3]
> >   ....
> >   i0 = i0 + k
> >   load a[i0 + k]
> > }
> 
> 
> It is not what our optimization does. In some architectures 
> (like PowerPC), load a[i0 + k] requires two assembly instructions, 
> so the above transformation is not useful. 

The real-world result is this. The following code

struct blabol
{
  char a; unsigned x; char b; char c; char d; char e; char f; char g; char h; char i; char j;
  char k; char l; char m; char n; char o; char p; char q; char r; char s; char y; char z;
} x[100];

void foo (unsigned);

void xxx(void)
{
  unsigned i;

  for (i = 0; i < 100; i++)
    foo (x[i].x);
}

is optimized (by the current mainline gcc with the autoinc handling in
cse removed by the attached patch, flags -O2 -funroll-loops -fweb) into

...
.L5:
	lwz 3,0(31)
	bl foo
	lwz 3,28(31)
	bl foo
	lwz 3,56(31)
	bl foo
	lwz 3,84(31)
	bl foo
	lwz 3,112(31)
	bl foo
	lwz 3,140(31)
	bl foo
	lwz 3,168(31)
	bl foo
	lwz 3,196(31)
	bl foo
	lwz 3,224(31)
	bl foo
	lwz 3,252(31)
	addi 31,31,280
	bl foo
	addic. 30,30,-10
	bge+ 0,.L5
...

which seems to be exactly what you would like.
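
For readers who prefer C to PowerPC assembly, here is a rough source-level
picture of the loop above. It is only a hand-written sketch, not what the
compiler does internally: one pointer stays fixed for ten loads with constant
displacements and is bumped once per iteration, and a counter is decremented
by 10. The names foo_sum and xxx_unrolled are made up for illustration, and
foo is given a stand-in body so that the two loops can be compared. On an ABI
with a 4-byte, 4-byte-aligned unsigned, sizeof (struct blabol) is 28, which
matches the 28-byte steps in the displacements.

```c
#include <assert.h>

struct blabol
{
  char a; unsigned x; char b; char c; char d; char e; char f; char g; char h; char i; char j;
  char k; char l; char m; char n; char o; char p; char q; char r; char s; char y; char z;
} x[100];

/* Hypothetical stand-in for the external foo: it only accumulates its
   arguments so that the two loops below can be compared.  */
unsigned long foo_sum;

void foo (unsigned v)
{
  foo_sum += v;
}

/* The loop as written in the test case.  */
void xxx (void)
{
  unsigned i;

  for (i = 0; i < 100; i++)
    foo (x[i].x);
}

/* Approximation of the unrolled loop: every access is at a constant
   offset from p, and p is advanced only once per ten elements.  */
void xxx_unrolled (void)
{
  const struct blabol *p = x;
  int n;

  /* This sketch has no remainder loop, so the count must divide evenly.  */
  assert (sizeof x / sizeof x[0] % 10 == 0);

  for (n = 100 - 10; n >= 0; n -= 10)
    {
      foo (p[0].x);
      foo (p[1].x);
      foo (p[2].x);
      foo (p[3].x);
      foo (p[4].x);
      foo (p[5].x);
      foo (p[6].x);
      foo (p[7].x);
      foo (p[8].x);
      foo (p[9].x);
      p += 10;
    }
}
```

Both functions call foo with the same values in the same order; only the
address arithmetic differs.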

If we remove the extra fields from the struct blabol, things do not
work as well:

.L5:
        slwi 0,31,2
        lwzx 3,30,0
        bl foo
        addi 0,31,1
        slwi 0,0,2
        lwzx 3,30,0
        bl foo
        addi 0,31,2
        slwi 0,0,2
        lwzx 3,30,0
        bl foo
        addi 0,31,3
        slwi 0,0,2
        lwzx 3,30,0
...

which is caused by the fact that strength reduction is not performed for
the memory load, so the address is recomputed from the index before each
access.
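
To make the difference concrete, here is a small C sketch of the two
addressing schemes. It assumes the reduced struct keeps only the field x (the
reduced definition is not shown above), and the names foo_sum, run,
xxx_indexed, xxx_reduced and check are made up for illustration. The first
loop rebuilds the address from the index on every access, roughly as the
slwi/lwzx sequence does; the second is what strength reduction would give,
with a pointer advanced by a constant step.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: the reduced struct keeps only the field x.  */
struct blabol
{
  unsigned x;
} x[100];

/* Hypothetical stand-in for the external foo, used only to compare loops.  */
unsigned long foo_sum;

void foo (unsigned v)
{
  foo_sum += v;
}

/* Address recomputed from the index each time: base + i * size.  */
void xxx_indexed (void)
{
  const char *base = (const char *) x;
  unsigned i;

  for (i = 0; i < 100; i++)
    {
      size_t offset = (size_t) i * sizeof (struct blabol);
      foo (((const struct blabol *) (base + offset))->x);
    }
}

/* Strength-reduced form: the multiplication is replaced by stepping
   a pointer by one element per iteration.  */
void xxx_reduced (void)
{
  const struct blabol *p = x;
  const struct blabol *end = x + 100;

  for (; p != end; p++)
    foo (p->x);
}

/* Runs LOOP and returns the sum of the values it passed to foo.  */
unsigned long run (void (*loop) (void))
{
  foo_sum = 0;
  loop ();
  return foo_sum;
}

/* Both forms must pass the same values to foo.  */
void check (void)
{
  unsigned long a = run (xxx_indexed);
  unsigned long b = run (xxx_reduced);
  assert (a == b);
}
```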

Zdenek

Index: cse.c
===================================================================
RCS file: /cvs/gcc/gcc/gcc/cse.c,v
retrieving revision 1.298
diff -c -3 -p -r1.298 cse.c
*** cse.c	4 Apr 2004 21:44:41 -0000	1.298
--- cse.c	11 Apr 2004 11:00:17 -0000
*************** fold_rtx (rtx x, rtx insn)
*** 4073,4078 ****
--- 4073,4079 ----
  		  || XEXP (y, 0) == folded_arg0)
  		break;
  
+ #if 0
  	      /* Don't associate these operations if they are a PLUS with the
  		 same constant and it is a power of two.  These might be doable
  		 with a pre- or post-increment.  Similarly for two subtracts of
*************** fold_rtx (rtx x, rtx insn)
*** 4088,4093 ****
--- 4089,4095 ----
  		      || (HAVE_POST_DECREMENT
  			  && exact_log2 (- INTVAL (const_arg1)) >= 0)))
  		break;
+ #endif
  
  	      /* Compute the code used to compose the constants.  For example,
  		 A-C1-C2 is A-(C1 + C2), so if CODE == MINUS, we want PLUS.  */

