This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.
new unroller vs ppc
- From: Dale Johannesen <dalej at apple dot com>
- To: Zdenek Dvorak <rakdver at atrey dot karlin dot mff dot cuni dot cz>, gcc at gcc dot gnu dot org
- Cc: Dale Johannesen <dalej at apple dot com>
- Date: Thu, 20 Mar 2003 17:41:05 -0800
- Subject: new unroller vs ppc
fyi, the new loop unroller doesn't seem to work very well on ppc:
int a[100];
int foo() {
  int i;
  for ( i=0; i<100; i++ )
    a[i] = 6;
}
(3.3 compiler)
(each of the store-with-update insns (stwu) depends on the previous one
through r9, so you can only issue one per cycle. This is not so good.)
L29:
stw r0,0(r9)
stwu r0,4(r9)
stwu r0,4(r9)
stwu r0,4(r9)
stwu r0,4(r9)
stwu r0,4(r9)
stwu r0,4(r9)
stwu r0,4(r9)
stwu r0,4(r9)
stwu r0,4(r9)
addi r9,r9,4
bdnz L29
(3.4 -fold-unroll-loops)
(This is much better. Unrolling 25 times seems like a bit much,
but otherwise this is optimal. I know, I can tell it to unroll less.)
L58:
stw r0,0(r9)
stw r0,4(r9)
stw r0,8(r9)
stw r0,12(r9)
stw r0,16(r9)
stw r0,20(r9)
stw r0,24(r9)
stw r0,28(r9)
stw r0,32(r9)
stw r0,36(r9)
stw r0,40(r9)
stw r0,44(r9)
stw r0,48(r9)
stw r0,52(r9)
stw r0,56(r9)
stw r0,60(r9)
stw r0,64(r9)
stw r0,68(r9)
stw r0,72(r9)
stw r0,76(r9)
stw r0,80(r9)
stw r0,84(r9)
stw r0,88(r9)
stw r0,92(r9)
stw r0,96(r9)
addi r9,r9,100
bdnz L58
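For comparison, the two store patterns shown so far can be sketched in C. This is only an illustration of the dependency structure, not what either unroller does internally. The function names and the unroll factor of 4 are made up for the example, and an optimizing compiler may well rewrite one form into the other.

```c
#include <assert.h>
#include <stddef.h>

/* Update form, like the stw/stwu sequence: every store after the first
   bumps the pointer, so each address depends on the previous update. */
void fill_update_form(int *a, size_t n, int v)
{
    assert(n % 4 == 0);            /* sketch only: no remainder handling */
    int *p = a;
    int *end = a + n;
    while (p != end) {
        *p = v;                    /* stw  r0,0(r9) */
        *++p = v;                  /* stwu r0,4(r9) */
        *++p = v;                  /* stwu r0,4(r9) */
        *++p = v;                  /* stwu r0,4(r9) */
        ++p;                       /* addi r9,r9,4  */
    }
}

/* Offset form, like the stw-only sequence: all stores use the same base
   with constant offsets, and the base is bumped once per trip. */
void fill_offset_form(int *a, size_t n, int v)
{
    assert(n % 4 == 0);            /* sketch only: no remainder handling */
    int *end = a + n;
    for (int *p = a; p != end; p += 4) {
        p[0] = v;                  /* stw r0,0(r9)  */
        p[1] = v;                  /* stw r0,4(r9)  */
        p[2] = v;                  /* stw r0,8(r9)  */
        p[3] = v;                  /* stw r0,12(r9) */
    }
}
```

In the -fold-unroll-loops listing the factor is 25 rather than 4, so each trip stores 25 * 4 = 100 bytes (hence addi r9,r9,100) and the 100-element loop takes 4 trips.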
(3.4 -funroll-all-loops)
(the new unroller will not unroll this loop at all with -funroll-loops)
b L8
L19:
addi r9,r9,1
stwx r11,r10,r0
slwi r0,r9,2
bdz L17
addi r9,r9,1
stwx r11,r10,r0
slwi r0,r9,2
bdz L17
addi r9,r9,1
stwx r11,r10,r0
slwi r0,r9,2
bdz L17
addi r9,r9,1
stwx r11,r10,r0
slwi r0,r9,2
bdz L17
addi r9,r9,1
stwx r11,r10,r0
slwi r0,r9,2
bdz L17
addi r9,r9,1
stwx r11,r10,r0
slwi r0,r9,2
addi r9,r9,1
bdz L17
stwx r11,r10,r0
bdz L17
L8:
slwi r0,r9,2
addi r9,r9,1
stwx r11,r10,r0
slwi r0,r9,2
bdnz L19
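The shape of this listing, as far as it can be read from the asm, is that every unrolled copy keeps its own exit test (the bdz after each store) and recomputes the byte offset from the index (the slwi) instead of using constant offsets. A rough C sketch of that shape follows. The function name, the factor of 4, and the returned test counter are inventions for illustration only.

```c
#include <assert.h>

/* Unrolled by 4, but with an exit test after every copy.
   Returns how many exit tests were executed. */
int fill_test_every_copy(int *a, int n, int v)
{
    int i = 0;
    int tests = 0;
    assert(n > 0);                 /* sketch assumes at least one iteration */
    for (;;) {
        a[i] = v; i++; tests++; if (i == n) break;   /* stwx ... bdz  */
        a[i] = v; i++; tests++; if (i == n) break;   /* stwx ... bdz  */
        a[i] = v; i++; tests++; if (i == n) break;   /* stwx ... bdz  */
        a[i] = v; i++; tests++; if (i == n) break;   /* stwx ... bdnz */
    }
    return tests;
}
```

In this sketch the number of exit tests equals the number of stores, so unrolling removes only the backward branches, not the per-element test.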
(3.4 -fno-branch-count-reg)
L5:
slwi r0,r9,2
addi r9,r9,1
stwx r11,r10,r0
slwi r0,r9,2
addi r9,r9,1
stwx r11,r10,r0
slwi r0,r9,2
addi r9,r9,1
stwx r11,r10,r0
slwi r0,r9,2
addi r9,r9,1
stwx r11,r10,r0
slwi r0,r9,2
addi r9,r9,1
stwx r11,r10,r0
slwi r0,r9,2
addi r9,r9,1
stwx r11,r10,r0
slwi r0,r9,2
addi r9,r9,1
stwx r11,r10,r0
slwi r0,r9,2
addi r9,r9,1
cmpwi cr7,r9,99
stwx r11,r10,r0
ble+ cr7,L5
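Here the loop body has eight stores and a single compare-and-branch at the bottom, but each store still rescales the index with slwi. Since 100 is not a multiple of 8, the leftover iterations must be handled outside the part shown. The sketch below assumes they are peeled off before the loop, which is a guess, as the listing does not include that code. The function name, the factor of 4, and the returned test counter are likewise made up for the example.

```c
#include <assert.h>

/* Unrolled by 4 with one exit test per unrolled body.
   Returns how many exit tests were executed in the main loop. */
int fill_test_per_body(int *a, int n, int v)
{
    int i = 0;
    int tests = 0;
    assert(n >= 0);
    /* Assumed prologue: peel n % 4 iterations so the rest divides evenly. */
    for (int r = n % 4; r > 0; r--) {
        a[i] = v;
        i++;
    }
    if (i < n) {
        do {
            a[i] = v; i++;         /* slwi / addi / stwx */
            a[i] = v; i++;
            a[i] = v; i++;
            a[i] = v; i++;
            tests++;
        } while (i <= n - 1);      /* cmpwi cr7,r9,99 ; ble+ cr7,L5 */
    }
    return tests;
}
```

With n = 10 this peels two iterations and then runs the body twice, so the exit test runs twice rather than once per element.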