This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.
Re: GCC performance regression - its memset!
- From: Michel LESPINASSE <walken at zoy dot org>
- To: Roger Sayle <roger at eyesopen dot com>
- Cc: gcc at gcc dot gnu dot org, Richard Henderson <rth at redhat dot com>,Jan Hubicka <jh at suse dot cz>
- Date: Mon, 22 Apr 2002 23:07:09 -0700
- Subject: Re: GCC performance regression - its memset!
- References: <Pine.LNX.4.33.0204222307450.2893-100000@www.eyesopen.com>
On Mon, Apr 22, 2002 at 11:13:09PM -0600, Roger Sayle wrote:
>
> I think it's one of Jan's changes. I can reproduce the problem, and
> fix it using "-minline-all-stringops" which forces 3.1 to inline the
> memset on i686. I was concerned that it was a middle-end bug with
> builtins, but it now appears to be an ia32 back-end issue.
>
> Michel, does "-minline-all-stringops" fix the problem for you?
This option actually generates wrong code for me. Here is a test case:
------------------- cut here -----------------
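#include <string.h>
short table[64];
int main (void)
{
    int i;
    /* fill all 64 entries with a nonzero value */
    for (i = 0; i < 64; i++)
        table[i] = 1234;
    /* clear entries 0..62 (126 bytes); entry 63 keeps 1234 */
    memset (table, 0, 63 * sizeof(short));
    /* table[62] is the last entry memset must clear */
    return (table[62] != 0);
}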
------------------- cut here -----------------
This code should return 0; however, it returns 1 when compiled with
-O3 -minline-all-stringops.
Here is an extract from the generated asm (the memset part of it):
        movl    $table, %edi
        testl   $1, %edi        <- test 2-byte alignment (hmm, isn't table
                                   already two-byte aligned, being an array of short?)
        movl    $126, %eax      <- we want to clear 126 bytes
        je      .L7
        movb    $0, table
        movl    $table+1, %edi  <- now edi is guaranteed two-byte-aligned
        movl    $125, %eax
.L7:
        testl   $2, %edi        <- test 4-byte alignment
        je      .L8
        movw    $0, (%edi)
        subl    $2, %eax
        addl    $2, %edi        <- now edi is guaranteed four-byte-aligned
.L8:
        cld
        movl    %eax, %ecx
        xorl    %eax, %eax
        shrl    $2, %ecx        <- number of 4-byte words remaining
        rep
        stosl
        testl   $2, %edi        <- oops, this is really meant to test the
                                   remaining count, not the address!!! edi is
                                   four-byte aligned here, so the bit is always
                                   clear and the trailing word is never stored.
        je      .L9
        movw    $0, (%edi)
        addl    $2, %edi
.L9:
        testl   $1, %edi        <- same bug: the trailing byte is never stored.
        je      .L10
        movb    $0, (%edi)
.L10:
GCC 2.95 generated simpler code:
        movl $table,%edi
        xorl %eax,%eax
        cld
        movl $31,%ecx
        rep
        stosl
        stosw
This did not handle alignment at all, but it was simpler and actually
faster on my Athlon.
Hope this helps,
--
Michel "Walken" LESPINASSE
Is this the best that god can do ? Then I'm not impressed.