This is the mail archive of the gcc-bugs@gcc.gnu.org mailing list for the GCC project.
[Bug target/38824] [4.4 Regression] performance regression of sse code from 4.2/4.3
- From: "xuepeng dot guo at intel dot com" <gcc-bugzilla at gcc dot gnu dot org>
- To: gcc-bugs at gcc dot gnu dot org
- Date: 9 Feb 2009 09:16:11 -0000
- Subject: [Bug target/38824] [4.4 Regression] performance regression of sse code from 4.2/4.3
- References: <bug-38824-12873@http.gcc.gnu.org/bugzilla/>
- Reply-to: gcc-bugzilla at gcc dot gnu dot org
------- Comment #17 from xuepeng dot guo at intel dot com 2009-02-09 09:16 -------
Below is the loop from the test case in its original form (compiled by GCC 4.4):
_Z7bench_1PfS_fj:
.LFB2309:
shrl $2, %edx
shufps $0, %xmm0, %xmm0
subl $1, %edx
xorl %eax, %eax
addq $1, %rdx
salq $4, %rdx
.p2align 4,,10
.p2align 3
.L11:
movaps %xmm0, %xmm1
addps (%rsi,%rax), %xmm1
movaps %xmm1, (%rdi,%rax)
addq $16, %rax
cmpq %rdx, %rax
jne .L11
rep
ret
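For readers who do not want to decode the listing, here is a rough C++ sketch of what the loop appears to compute. It is reconstructed from the assembly only: the mangled name _Z7bench_1PfS_fj demangles to bench_1(float*, float*, float, unsigned int), but the parameter names, the loop shape, and the use of intrinsics are assumptions, and the actual source attached to the bug may differ.

```cpp
#include <cassert>
#include <cstdint>
#include <xmmintrin.h>  // SSE: _mm_set1_ps, _mm_load_ps, _mm_add_ps, _mm_store_ps

// Hypothetical reconstruction; parameter names are assumed, not from the source.
// Adds `value` to every element of `src` and writes the result to `dst`,
// four floats at a time. Elements past the last full group of four are skipped.
void bench_1(float* dst, float* src, float value, unsigned int n)
{
    // movaps and addps with a memory operand require 16-byte alignment.
    assert(reinterpret_cast<std::uintptr_t>(dst) % 16 == 0);
    assert(reinterpret_cast<std::uintptr_t>(src) % 16 == 0);

    const __m128 v = _mm_set1_ps(value);       // shufps $0, %xmm0, %xmm0
    for (unsigned int i = 0; i < n / 4; ++i)   // shrl $2, %edx
    {
        // movaps %xmm0, %xmm1 ; addps (%rsi,%rax), %xmm1
        const __m128 sum = _mm_add_ps(v, _mm_load_ps(src + 4 * i));
        // movaps %xmm1, (%rdi,%rax)
        _mm_store_ps(dst + 4 * i, sum);
    }
}
```

The copy of %xmm0 into %xmm1 at the top of .L11 exists because addps overwrites its destination register, so the broadcast value has to be preserved across iterations.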
The time is:
[xguo2@shgcc-10 38824]$ g++ 44.s -o orig.out
[xguo2@shgcc-10 38824]$ time ./orig.out
real 0m1.878s
user 0m1.877s
sys 0m0.000s
[xguo2@shgcc-10 38824]$ time ./orig.out
real 0m1.879s
user 0m1.879s
sys 0m0.001s
[xguo2@shgcc-10 38824]$ time ./orig.out
real 0m1.873s
user 0m1.872s
sys 0m0.001s
After adding two nop instructions to the loop body:
.L11:
movaps %xmm0, %xmm1
nop
nop
addps (%rsi,%rax), %xmm1
movaps %xmm1, (%rdi,%rax)
addq $16, %rax
cmpq %rdx, %rax
jne .L11
rep
ret
The time is:
[xguo2@shgcc-10 38824]$ g++ 44.s -o 2nop.out
[xguo2@shgcc-10 38824]$ time ./2nop.out
real 0m1.762s
user 0m1.762s
sys 0m0.000s
[xguo2@shgcc-10 38824]$ time ./2nop.out
real 0m1.762s
user 0m1.762s
sys 0m0.000s
[xguo2@shgcc-10 38824]$ time ./2nop.out
real 0m1.762s
user 0m1.761s
sys 0m0.000s
I suspect that the code layout may be hurting the performance.
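To put a number on the difference, the "real" times from the two sets of runs above can be averaged and compared. This is only a small arithmetic sketch: the helper names (mean, relative_reduction, measured_reduction) are made up for illustration, and three runs per variant is a very small sample.

```cpp
#include <cstddef>

// Arithmetic mean of a fixed-size array of timings, in seconds.
template <std::size_t N>
double mean(const double (&xs)[N])
{
    double sum = 0.0;
    for (std::size_t i = 0; i < N; ++i) sum += xs[i];
    return sum / static_cast<double>(N);
}

// Fraction of the original run time saved by the modified variant.
double relative_reduction(double before, double after)
{
    return (before - after) / before;
}

// "real" times copied from the runs quoted above.
const double orig_runs[] = {1.878, 1.879, 1.873};
const double nop_runs[]  = {1.762, 1.762, 1.762};

double measured_reduction()
{
    return relative_reduction(mean(orig_runs), mean(nop_runs));
}
```

With these numbers the two-nop variant runs roughly 6% faster than the original, even though it executes two more instructions per iteration.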
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38824