rtlopt loop unroller question

Zdenek Dvorak rakdver@atrey.karlin.mff.cuni.cz
Wed Oct 22 07:49:00 GMT 2003


Hello,

> The following (sent on behalf of Yossi Markovich) simple loop:
> 
> int * foo ()
> {
>   int A[N];
>   int B[N];
>   int i;
>   for (i=0; i<N; i++)
>     A[i] = B[i];
>   return A;
> }
> 
> results in much better code when compiled using "gcc3.4 -O3
> -fold-unroll-loops" than when compiled using the rtlopt branch with "-O3
> -funroll-loops" (on powerpc-apple-darwin6.4). We are aware that
> the new loop optimizer in mainline is known to have caused regressions; we
> were wondering whether something can be done to get the better addressing
> calculation using the rtlopt branch (possibly using a different set of
> flags?).
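The quoted test case does not compile on its own (N is never defined), and dereferencing the pointer returned by foo is undefined behaviour, because A is a local array whose lifetime ends when foo returns. For experimenting with the two compilers, a self-contained variant might look like the sketch below. The value of N is a placeholder, not taken from the report, and moving the arrays to file scope changes where they live, which may itself change the addressing the compiler generates; only the loop is kept exactly as quoted.

```c
/* Placeholder value: the original report does not say what N is. */
#define N 1024

/* File-scope arrays (static storage duration) instead of locals,
   so the pointer returned by foo is still valid after the call. */
static int A[N];
static int B[N];

int *foo (void)
{
  int i;

  /* Same loop as in the quoted test case. */
  for (i = 0; i < N; i++)
    A[i] = B[i];
  return A;
}
```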

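As I suspected, my favourite piece of CSE strikes again: the check in
fold_rtx that refuses to associate a PLUS with an identical power-of-two
constant, in the hope that it can later become a pre- or post-increment.
With the patch below, which disables that check, the generated code is
much better: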

Zdenek

.text
	.align 2
	.globl _foo
_foo:
	lis r3,0xffff
	li r0,797
	ori r2,r3,14400
	stmw r25,-28(r1)
	mtctr r0
	stwux r1,r1,r2
	addi r12,r1,24
L5:
	addi r4,r12,4
	addi r2,r12,8
	addi r29,r12,12
	addi r28,r12,16
	addi r27,r12,20
	addi r26,r12,24
	addi r25,r12,28
	lwz r0,25544(r12)
	lwz r9,25544(r4)
	lwz r11,25544(r2)
	lwz r10,25544(r29)
	lwz r8,25544(r28)
	lwz r7,25544(r27)
	lwz r6,25544(r26)
	lwz r5,25544(r25)
	stw r0,8(r12)
	addi r12,r12,32
	stw r9,8(r4)
	stw r11,8(r2)
	stw r10,8(r29)
	stw r8,8(r28)
	stw r7,8(r27)
	stw r6,8(r26)
	stw r5,8(r25)
	bdnz L5
	lwz r1,0(r1)
	lmw r25,-28(r1)
	blr
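In the code above, the loop body at L5 runs 797 times (the count is loaded into CTR by mtctr and consumed by bdnz) and copies eight words per trip. Every load address is r12 + 4*k + 25544 (the B array) and every store address is r12 + 4*k + 8 (the A array), for k = 0..7, and r12 then advances by 32 bytes. Roughly the same shape in C is sketched below; copy_unrolled8 is a hypothetical name, and the sketch assumes the element count is a multiple of 8, since the assembly shown has no remainder loop.

```c
#include <stddef.h>

/* Hypothetical helper with the same shape as the loop at L5:
   eight loads and stores per trip, then one pointer bump.
   Assumes n is a multiple of 8 (no remainder handling). */
void copy_unrolled8 (int *dst, const int *src, size_t n)
{
  size_t trips = n / 8;   /* corresponds to the count placed in CTR */

  while (trips-- > 0)
    {
      /* Each access is the current base plus a constant offset.  */
      dst[0] = src[0];
      dst[1] = src[1];
      dst[2] = src[2];
      dst[3] = src[3];
      dst[4] = src[4];
      dst[5] = src[5];
      dst[6] = src[6];
      dst[7] = src[7];

      /* In the assembly a single register, r12, serves both arrays
         (they sit at fixed offsets in the same stack frame), and
         "addi r12,r12,32" advances it by 8 ints.  */
      dst += 8;
      src += 8;
    }
}
```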

Index: cse.c
===================================================================
RCS file: /cvs/gcc/gcc/gcc/cse.c,v
retrieving revision 1.231.2.11
diff -c -3 -p -r1.231.2.11 cse.c
*** cse.c	20 Jul 2003 22:04:20 -0000	1.231.2.11
--- cse.c	22 Oct 2003 00:46:36 -0000
*************** fold_rtx (x, insn)
*** 4222,4227 ****
--- 4222,4228 ----
  		  || XEXP (y, 0) == folded_arg0)
  		break;
  
+ #if 0
  	      /* Don't associate these operations if they are a PLUS with the
  		 same constant and it is a power of two.  These might be doable
  		 with a pre- or post-increment.  Similarly for two subtracts of
*************** fold_rtx (x, insn)
*** 4237,4242 ****
--- 4238,4244 ----
  		      || (HAVE_POST_DECREMENT
  			  && exact_log2 (- INTVAL (const_arg1)) >= 0)))
  		break;
+ #endif
  
  	      /* Compute the code used to compose the constants.  For example,
  		 A-C1-C2 is A-(C1 + C2), so if CODE == MINUS, we want PLUS.  */
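The check that the patch wraps in #if 0 can be summarised as a small predicate. The sketch below is a simplified model written from the comment in the hunk above, not the actual fold_rtx code: the struct fields are hypothetical stand-ins for the HAVE_PRE_INCREMENT, HAVE_POST_INCREMENT, HAVE_PRE_DECREMENT and HAVE_POST_DECREMENT target macros, and is_pow2 plays the role of exact_log2 (x) >= 0. PowerPC has update-form loads and stores (lwzu, stwu), which is presumably why the check matters on this target.

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical stand-ins for HAVE_PRE_INCREMENT, HAVE_POST_INCREMENT,
   HAVE_PRE_DECREMENT and HAVE_POST_DECREMENT. */
struct autoinc_modes
{
  bool pre_inc, post_inc, pre_dec, post_dec;
};

/* Same meaning as exact_log2 (x) >= 0: x is a positive power of two. */
static bool is_pow2 (long x)
{
  return x > 0 && (x & (x - 1)) == 0;
}

/* Simplified model: may (PLUS (PLUS x inner) outer) be rewritten as
   (PLUS x (inner + outer))?  Returns false when both constants are
   equal and could serve as an auto-increment or auto-decrement step,
   in which case the expression is left alone.  */
bool may_associate_plus (long inner, long outer,
                         const struct autoinc_modes *t)
{
  bool inc_step, dec_step;

  if (inner != outer)
    return true;

  inc_step = (t->pre_inc || t->post_inc) && is_pow2 (outer);
  dec_step = (t->pre_dec || t->post_dec)
             && outer != LONG_MIN && is_pow2 (-outer);

  /* Keep x + c + c as is, hoping for a pre/post-increment later.  */
  return !(inc_step || dec_step);
}
```

With all four flags false the predicate always allows association, which is effectively what the patch achieves by compiling the check out.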