This is the mail archive of the gcc-bugs@gcc.gnu.org mailing list for the GCC project.


[Bug target/38824] [4.4 Regression] performance regression of sse code from 4.2/4.3

From: "xuepeng dot guo at intel dot com" <gcc-bugzilla at gcc dot gnu dot org>
To: gcc-bugs at gcc dot gnu dot org
Date: 9 Feb 2009 09:16:11 -0000
Subject: [Bug target/38824] [4.4 Regression] performance regression of sse code from 4.2/4.3
References: <bug-38824-12873@http.gcc.gnu.org/bugzilla/>
Reply-to: gcc-bugzilla at gcc dot gnu dot org


------- Comment #17 from xuepeng dot guo at intel dot com  2009-02-09 09:16 -------
Below is the loop from the test case in its original form (compiled by GCC 4.4):

_Z7bench_1PfS_fj:
.LFB2309:
        shrl    $2, %edx
        shufps  $0, %xmm0, %xmm0
        subl    $1, %edx
        xorl    %eax, %eax
        addq    $1, %rdx
        salq    $4, %rdx
        .p2align 4,,10
        .p2align 3
.L11:
        movaps  %xmm0, %xmm1       
        addps   (%rsi,%rax), %xmm1
        movaps  %xmm1, (%rdi,%rax)
        addq    $16, %rax
        cmpq    %rdx, %rax
        jne     .L11
        rep
        ret
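
For readers who prefer source to assembly, here is a rough sketch of C++ that would plausibly compile to a loop of this shape. It is a reconstruction from the listing and the mangled name (_Z7bench_1PfS_fj demangles to bench_1(float*, float*, float, unsigned int)), not the actual test case attached to the bug. The parameter names, the alignment asserts, and the use of intrinsics rather than some wrapper class are all assumptions.

```cpp
#include <xmmintrin.h>  // SSE intrinsics: __m128, _mm_set1_ps, _mm_load_ps, ...
#include <cassert>
#include <cstdint>

// Hypothetical reconstruction: adds the scalar c to n floats from src and
// writes the results to dst, four floats per iteration. Parameter names are
// assumed; only the types come from the mangled symbol.
void bench_1(float* dst, float* src, float c, unsigned int n)
{
    // movaps and addps with a memory operand require 16-byte alignment.
    assert(reinterpret_cast<std::uintptr_t>(dst) % 16 == 0);
    assert(reinterpret_cast<std::uintptr_t>(src) % 16 == 0);

    const __m128 vc = _mm_set1_ps(c);              // shufps $0, %xmm0, %xmm0
    for (unsigned int i = 0; i < n / 4; ++i) {     // shrl $2, %edx
        const __m128 v = _mm_load_ps(src + 4 * i); // load folded into addps
        _mm_store_ps(dst + 4 * i, _mm_add_ps(vc, v)); // movaps %xmm1, (%rdi,%rax)
    }
}
```

Note that the loop body is only three SSE instructions plus the counter update and branch, so small changes in how the loop is laid out can plausibly show up in the timings.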

Timing for the unmodified assembly (three runs):

[xguo2@shgcc-10 38824]$ g++ 44.s -o orig.out
[xguo2@shgcc-10 38824]$ time ./orig.out

real    0m1.878s
user    0m1.877s
sys     0m0.000s
[xguo2@shgcc-10 38824]$ time ./orig.out

real    0m1.879s
user    0m1.879s
sys     0m0.001s
[xguo2@shgcc-10 38824]$ time ./orig.out

real    0m1.873s
user    0m1.872s
sys     0m0.001s

After adding two nop instructions at the top of the loop:

.L11:
        movaps  %xmm0, %xmm1
        nop
        nop
        addps   (%rsi,%rax), %xmm1
        movaps  %xmm1, (%rdi,%rax)
        addq    $16, %rax
        cmpq    %rdx, %rax
        jne     .L11
        rep
        ret

Timing with the two nops (three runs):
[xguo2@shgcc-10 38824]$ g++ 44.s -o 2nop.out
[xguo2@shgcc-10 38824]$ time ./2nop.out

real    0m1.762s
user    0m1.762s
sys     0m0.000s
[xguo2@shgcc-10 38824]$ time ./2nop.out

real    0m1.762s
user    0m1.762s
sys     0m0.000s
[xguo2@shgcc-10 38824]$ time ./2nop.out

real    0m1.762s
user    0m1.761s
sys     0m0.000s

The version with two nops is consistently about 6% faster (roughly 1.76s
versus 1.88s). I suspect that the code layout may be hurting performance.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38824
