This is the mail archive of the gcc-bugs@gcc.gnu.org mailing list for the GCC project.
[Bug target/38824] [4.4 Regression] performance regression of sse code from 4.2/4.3
- From: "xuepeng dot guo at intel dot com" <gcc-bugzilla at gcc dot gnu dot org>
- To: gcc-bugs at gcc dot gnu dot org
- Date: 11 Feb 2009 07:37:02 -0000
- References: <bug-38824-12873@http.gcc.gnu.org/bugzilla/>
- Reply-to: gcc-bugzilla at gcc dot gnu dot org
------- Comment #22 from xuepeng dot guo at intel dot com 2009-02-11 07:37 -------
(In reply to comment #18)
> Xuepeng, can you test with the loop as produced by my posted patch, that is:
> .L11:
> movaps (%rsi,%rax), %xmm0
> addps %xmm1, %xmm0
> movaps %xmm0, (%rdi,%rax)
> addq $16, %rax
> cmpq %rdx, %rax
> jne .L11
> I don't have access to new enough chips.
Your patch improved the performance. My machine is an "Intel(R) Core(TM)2 Quad CPU
Q6700 @ 2.66GHz". Three runs of each binary (gcc-44p.out is the build with your
patch) gave:
[xguo2@shgcc-9 38824]$ time ./gcc-42.out
real 0m1.991s
user 0m1.990s
sys 0m0.000s
[xguo2@shgcc-9 38824]$ time ./gcc-42.out
real 0m1.991s
user 0m1.991s
sys 0m0.001s
[xguo2@shgcc-9 38824]$ time ./gcc-42.out
real 0m1.991s
user 0m1.989s
sys 0m0.002s
[xguo2@shgcc-9 38824]$ time ./gcc-44.out
real 0m1.880s
user 0m1.879s
sys 0m0.001s
[xguo2@shgcc-9 38824]$ time ./gcc-44.out
real 0m1.878s
user 0m1.878s
sys 0m0.000s
[xguo2@shgcc-9 38824]$ time ./gcc-44.out
real 0m1.870s
user 0m1.869s
sys 0m0.002s
[xguo2@shgcc-9 38824]$ time ./gcc-44p.out
real 0m1.690s
user 0m1.690s
sys 0m0.000s
[xguo2@shgcc-9 38824]$ time ./gcc-44p.out
real 0m1.690s
user 0m1.689s
sys 0m0.002s
[xguo2@shgcc-9 38824]$ time ./gcc-44p.out
real 0m1.690s
user 0m1.690s
sys 0m0.000s
The only difference between the two assembly listings is:
--- 44.s 2009-02-11 15:34:57.000000000 +0800
+++ 44p.s 2009-02-11 15:34:49.000000000 +0800
@@ -102,8 +102,8 @@ _Z7bench_1PfS_fj:
.p2align 4,,10
.p2align 3
.L11:
- movaps %xmm0, %xmm1
- addps (%rsi,%rax), %xmm1
+ movaps (%rsi,%rax), %xmm1
+ addps %xmm0, %xmm1
movaps %xmm1, (%rdi,%rax)
addq $16, %rax
cmpq %rdx, %rax
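
In other words, the patched code loads the memory operand into %xmm1 and then adds the register %xmm0, where the unpatched code copies %xmm0 into %xmm1 and then adds from memory. At source level both forms correspond to the same loop. A minimal sketch with SSE intrinsics follows, assuming an x86 target: only the signature is taken from the mangled symbol _Z7bench_1PfS_fj (bench_1(float*, float*, float, unsigned int)), while the parameter names, the body, and the meaning of the count are guesses inferred from the loop, not the actual test case.

```cpp
#include <xmmintrin.h>  // SSE intrinsics: _mm_set1_ps, _mm_load_ps, _mm_add_ps, _mm_store_ps

// Hypothetical reconstruction of the benchmarked function.
// Assumptions: dst and src are 16-byte aligned (movaps requires it),
// and n is an element count that is a multiple of 4.
void bench_1(float* dst, float* src, float value, unsigned int n) {
    const __m128 v = _mm_set1_ps(value);   // broadcast the scalar to 4 lanes (%xmm0 in the diff)
    for (unsigned int i = 0; i < n; i += 4) {
        __m128 x = _mm_load_ps(src + i);   // movaps (%rsi,%rax), %xmm1
        x = _mm_add_ps(x, v);              // addps  %xmm0, %xmm1
        _mm_store_ps(dst + i, x);          // movaps %xmm1, (%rdi,%rax)
    }
}
```

The comments map each intrinsic to the patched instruction sequence; which of the two sequences the compiler actually emits for this source is decided by the compiler, not by the intrinsics.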
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38824