This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.


gcc-3.4.1 vs gcc-4.2.2 performance regression in memory initialization loop


Apologies if this has been discussed before. I built GCC for ARM from
both gcc-3.4.1 and gcc-4.2.2, and there seems to be a performance
regression: for a tight loop, gcc-3.4.1 generates better code than
gcc-4.2.2.

In gcc-4.2.2, 'p' is reloaded from memory and stored back to its memory
location on every iteration of the loop. In gcc-3.4.1, however, 'p' is
kept in a register until after the loop, when the register is stored
into the memory location of 'p'.

Is gcc-4.2.2 being more conservative here, allowing for the possibility
that the store through 'p' in the loop might modify 'p' itself?
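
For comparison, here is a minimal sketch of a source-level variant that
walks the array through a local pointer and writes the global 'p' once
after the loop. The names init_with_local and q are hypothetical and not
part of test.c. The assumption being probed is that the in-loop store
comes from the compiler treating the global 'p' as possibly modified by
the store through it; this is a guess, and no generated code for this
variant is shown here.

```c
int *p;
int array[400];

/* Hypothetical variant of the loop in test.c: the walking pointer is a
   local whose address is never taken, so a store through it cannot
   modify the pointer itself. */
void init_with_local(void)
{
  int *q = array;   /* local copy instead of the global 'p' */
  int i;

  for (i = 0; i < 400; i++) {
    *q++ = 0;
  }

  p = q;            /* single store to the global, after the loop */
}
```

After the call, 'p' points one past the end of the array, the same final
state the original loop leaves behind. If gcc-4.2.2 keeps q in a register
for this version, that would suggest the issue is specific to how the
global 'p' is handled in the loop.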

The command I used to build was:
cc1 -O2 test.c

------------------------------------------------------------------
test.c source:

int *p;
int array[400];

main() {
  int i;
  p=array;

  for (i=0; i<400; i++) {
    *p++=0;
  }
}

------------------------------------------------------------------
Gcc-4.2.2 version
	ldr	r3, .L8
	mov	r2, #0
	str	r2, [r3], #4
	ldr	r0, .L8+4
	str	r3, [r0, #0]
	@ lr needed for prologue
	mov	r1, #1
.L2:
	ldr	r2, [r0, #0]
	mov	r3, #0
	str	r3, [r2], #4
	add	r1, r1, #1
	cmp	r1, #400
	str	r2, [r0, #0]	@ <==== store to 'p' inside loop
	bne	.L2
	bx	lr
.L9:
	.align	2
.L8:
	.word	array
	.word	p

------------------------------------------------------------------
Gcc-3.4.1 version
	ldr	r3, .L10
	ldr	ip, .L10+4
	str	r3, [ip, #0]
	@ lr needed for prologue
	mov	r0, #0
	mov	r1, #400
.L5:
	str	r0, [r3], #4
	subs	r1, r1, #1
	mov	r2, r3
	bne	.L5
	str	r2, [ip, #0]	@ <==== store to 'p' outside of loop
	mov	pc, lr
.L11:
	.align	2
.L10:
	.word	array
	.word	p


Thanks for any input you can provide.

Jeffri Tan

