This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: GCC performance regression - its memset!


On Mon, Apr 22, 2002 at 11:13:09PM -0600, Roger Sayle wrote:
> 
> I think its one of Jan's changes.  I can reproduce the problem, and
> fix it using "-minline-all-stringops" which forces 3.1 to inline the
> memset on i686.  I was concerned that it was a middle-end bug with
> builtins, but it now appears to be an ia32 back-end issue.
> 
> Michel, does "-minline-all-stringops" fix the problem for you?

This option actually generates invalid code for me. Here is a test case:

------------------- cut here -----------------
#include <string.h>

short table[64];

int main (void)
{
    int i;

    for (i = 0; i < 64; i++)
        table[i] = 1234;

    memset (table, 0, 63 * sizeof(short));

    return (table[63] != 0);
}
------------------- cut here -----------------

This code should return 0, however it returns 1 (compiled with -O3
-minline-all-stringops)

Here is an extract from the generated asm (the memset part of it):
        movl    $table, %edi
        testl   $1, %edi       <- test 1-byte alignment (hmmm, isnt table
                                  already two-byte aligned, being a short ?)
        movl    $126, %eax     <- we want to clear 126 bytes
        je      .L7
        movb    $0, table
        movl    $table+1, %edi <- now edi is guaranteed two-byte-aligned
        movl    $125, %eax
.L7:
        testl   $2, %edi       <- test 4-byte alignment
        je      .L8
        movw    $0, (%edi)
        subl    $2, %eax       <- now edi is guaranteed four-byte-aligned
        addl    $2, %edi
.L8:
        cld
        movl    %eax, %ecx
        xorl    %eax, %eax
        shrl    $2, %ecx       <- number of 4-byte words remaining
        rep
        stosl
        testl   $2, %edi       <- ooops, its really meant to test the remainder
                                  not the address !!! so test will always fail.
        je      .L9
        movw    $0, (%edi)
        addl    $2, %edi
.L9:
        testl   $1, %edi       <- that one too.
        je      .L10
        movb    $0, (%edi)
.L10:


2.95 was generating simpler code:
        movl $table,%edi
        xorl %eax,%eax
        cld
        movl $31,%ecx
        rep
        stosl
        stosw

This did not take care about alignment issues, but was simpler and
actually faster on my athlon.

Hope this helps,

-- 
Michel "Walken" LESPINASSE
Is this the best that god can do ? Then I'm not impressed.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]