This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.
Re: gcc-3.4.1 vs gcc-4.2.2 performance regression in memory initialization loop
- From: "Richard Guenther" <richard dot guenther at gmail dot com>
- To: "Tan, Jeffri" <jeffri dot tan at verisilicon dot com>
- Cc: gcc at gnu dot org
- Date: Sat, 5 Apr 2008 14:02:54 +0200
- Subject: Re: gcc-3.4.1 vs gcc-4.2.2 performance regression in memory initialization loop
- References: <02B55F582DE21E4AA450174D1ED7E9EE0AABF2@shasxm02.verisilicon.com>
On Sat, Apr 5, 2008 at 12:24 AM, Tan, Jeffri <jeffri.tan@verisilicon.com> wrote:
>
> Apologies if this has been discussed before. I built the ARM compiler
> for gcc-3.4.1 and gcc-4.2.2, and there seems to be a performance
> regression. For a tight loop, gcc-3.4.1 generates better code than
> gcc-4.2.2.
>
> In gcc-4.2.2, the store to the memory location of variable 'p' happens
> inside the loop. In gcc-3.4.1, however, 'p' is kept in a register until
> after the loop, when the register is stored into the memory location
> of 'p'.
>
> Is gcc-4.2.2 being more conservative because 'p' might point to
> itself in the loop?
Yes, it apparently thinks that the store to *p can clobber p. This is
fixed in GCC 4.3.
Richard.
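Whether the store through `p` can clobber `p` comes down to C's type-based aliasing rules, which GCC applies at -O2 via `-fstrict-aliasing`. The sketch below only illustrates those language rules; it is not a description of GCC's internal analysis, and the `char` variant and the names `clear_ints`, `clear_chars`, `cp` and `buf` are hypothetical additions. An `int` store cannot legally modify an object of type `int *`, whereas a character-typed store may modify any object, so in the second function type information alone does not rule out the aliasing.

```c
#define N 400

int *p;          /* as in the quoted test case */
int array[N];

char *cp;        /* hypothetical char-typed counterpart */
char buf[N];

/* The store '*p++ = 0' writes an int. Under C's effective-type rules an
   int store cannot legally modify 'p', whose type is int *, so with
   strict aliasing the compiler may keep 'p' in a register across the loop. */
void clear_ints(void)
{
    p = array;
    for (int i = 0; i < N; i++)
        *p++ = 0;
}

/* Here the store writes a char. Character-typed lvalues may access any
   object, including 'cp' itself, so type information alone does not let
   the compiler assume 'cp' is unchanged by the store. */
void clear_chars(void)
{
    cp = buf;
    for (int i = 0; i < N; i++)
        *cp++ = 0;
}
```

Both functions leave the pointer one past the end of its buffer; only the reasoning available to the optimizer differs.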
> The command I used to build was:
> cc1 -O2 test.c
>
> ------------------------------------------------------------------
> test.c source:
>
> int *p;
> int array[400];
>
> main() {
>     int i;
>     p = array;
>
>     for (i = 0; i < 400; i++) {
>         *p++ = 0;
>     }
> }
>
> ------------------------------------------------------------------
> gcc-4.2.2 output
> ldr r3, .L8
> mov r2, #0
> str r2, [r3], #4
> ldr r0, .L8+4
> str r3, [r0, #0]
> @ lr needed for prologue
> mov r1, #1
> .L2:
> ldr r2, [r0, #0]
> mov r3, #0
> str r3, [r2], #4
> add r1, r1, #1
> cmp r1, #400
> str r2, [r0, #0] <==== store to 'p' inside loop
> bne .L2
> bx lr
> .L9:
> .align 2
> .L8:
> .word array
> .word p
>
> ------------------------------------------------------------------
> gcc-3.4.1 output
> ldr r3, .L10
> ldr ip, .L10+4
> str r3, [ip, #0]
> @ lr needed for prologue
> mov r0, #0
> mov r1, #400
> .L5:
> str r0, [r3], #4
> subs r1, r1, #1
> mov r2, r3
> bne .L5
> str r2, [ip, #0] <==== store to 'p' outside of loop
> mov pc, lr
> .L11:
> .align 2
> .L10:
> .word array
> .word p
>
>
> Thanks for any input you can provide.
>
> Jeffri Tan
>
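Until an upgrade to GCC 4.3 is possible, one common source-level workaround is to step a local pointer, whose address is never taken, and write the global `p` back once after the loop; this mirrors what the gcc-3.4.1 listing above does in registers. A minimal sketch, assuming the same globals as test.c; `clear_array` is a hypothetical name, and the code a particular compiler version actually generates should still be checked with `-S`.

```c
#define N 400

int *p;          /* global pointer, as in test.c */
int array[N];

/* Hypothetical variant of the test loop: the pointer arithmetic is done on
   the local 'q'. Because its address is never taken, no store through 'q'
   can modify 'q', so the compiler is free to keep it in a register. */
void clear_array(void)
{
    int *q = array;

    for (int i = 0; i < N; i++) {
        *q++ = 0;
    }

    p = q;       /* one store to the global 'p', after the loop */
}
```

After `clear_array()` returns, `p` equals `array + 400`, the same final state the original loop produces.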