This is the mail archive of the
gcc@gcc.gnu.org
mailing list for the GCC project.
gcc-3.4.1 vs gcc-4.2.2 performance regression in memory initialization loop
- From: "Tan, Jeffri" <jeffri dot tan at verisilicon dot com>
- To: <gcc at gnu dot org>
- Date: Sat, 5 Apr 2008 06:24:45 +0800
- Subject: gcc-3.4.1 vs gcc-4.2.2 performance regression in memory initialization loop
Apologies if this has been discussed before. I built the ARM compiler
for gcc-3.4.1 and gcc-4.2.2, and there seems to be a performance
regression: for a tight loop, gcc-3.4.1 generates better code than
gcc-4.2.2 does.
In gcc-4.2.2, the store to the memory location of variable 'p' happens
inside the loop, on every iteration. In gcc-3.4.1, however, 'p' is kept
in a register during the loop, and the register is stored into the
memory location of 'p' only after the loop has finished.
Is gcc-4.2.2 being more conservative, allowing for the possibility that
'p' might point to itself inside the loop?
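For comparison, the behaviour seen in the gcc-3.4.1 output can be written
out explicitly in the source. The following is only a sketch, not a
diagnosis of the regression: the function name clear_array and the local
'q' are hypothetical and are not part of test.c. Because 'q' is a local
whose address is never taken, no store through it can modify it, so a
compiler is free to keep it in a register; the global 'p' is then written
once, after the loop. Whether gcc-4.2.2 actually emits the same loop as
gcc-3.4.1 for this variant has not been verified here.

```c
int *p;
int array[400];

/* Hypothetical helper, not part of the original test.c.
   It reaches the same final state as the loop in test.c, but walks the
   array through a local pointer instead of the global 'p'. */
void clear_array(void)
{
    int *q = array;   /* local copy; its address is never taken */
    int i;

    for (i = 0; i < 400; i++) {
        *q++ = 0;     /* this store cannot modify 'q' itself */
    }
    p = q;            /* single store to the global 'p', after the loop */
}
```

Calling clear_array() leaves every element of 'array' zeroed and 'p'
pointing one past the last element, which is the same final state the
loop in test.c produces.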
The command I used to build was:
cc1 -O2 test.c
------------------------------------------------------------------
test.c source:
int *p;
int array[400];

main() {
    int i;

    p=array;
    for (i=0; i<400; i++) {
        *p++=0;
    }
}
------------------------------------------------------------------
Gcc-4.2.2 version
ldr r3, .L8
mov r2, #0
str r2, [r3], #4
ldr r0, .L8+4
str r3, [r0, #0]
@ lr needed for prologue
mov r1, #1
.L2:
ldr r2, [r0, #0]
mov r3, #0
str r3, [r2], #4
add r1, r1, #1
cmp r1, #400
str r2, [r0, #0] @ <==== store to 'p' inside loop
bne .L2
bx lr
.L9:
.align 2
.L8:
.word array
.word p
------------------------------------------------------------------
Gcc-3.4.1 version
ldr r3, .L10
ldr ip, .L10+4
str r3, [ip, #0]
@ lr needed for prologue
mov r0, #0
mov r1, #400
.L5:
str r0, [r3], #4
subs r1, r1, #1
mov r2, r3
bne .L5
str r2, [ip, #0] @ <==== store to 'p' outside of loop
mov pc, lr
.L11:
.align 2
.L10:
.word array
.word p
Thanks for any input you can provide.
Jeffri Tan