This is the mail archive of the
gcc-bugs@gcc.gnu.org
mailing list for the GCC project.
[Bug target/34163] [4.3/4.4/4.5 Regression] 10% performance regression since Nov 1 on Polyhedron's "NF" on AMD64
- From: "ubizjak at gmail dot com" <gcc-bugzilla at gcc dot gnu dot org>
- To: gcc-bugs at gcc dot gnu dot org
- Date: 25 Jun 2009 08:25:15 -0000
- Subject: [Bug target/34163] [4.3/4.4/4.5 Regression] 10% performance regression since Nov 1 on Polyhedron's "NF" on AMD64
- References: <bug-34163-13404@http.gcc.gnu.org/bugzilla/>
- Reply-to: gcc-bugzilla at gcc dot gnu dot org
------- Comment #14 from ubizjak at gmail dot com 2009-06-25 08:25 -------
(In reply to comment #13)
> Predictive commoning does exactly what you want.
It is not effective for the testcase in Comment #9. The dumps for innermost
loop are the same for -O2 -funroll-loops [-fpredictive-commoning]:
.L6:
movss (%rsi), %xmm9
addl $4, %r8d
mulss (%rcx), %xmm9
movss (%rdx), %xmm8
movss 4(%rdx), %xmm6
movss 8(%rdx), %xmm4
movss 12(%rdx), %xmm2
subss %xmm9, %xmm8
mulss 0(%rbp), %xmm8
movss %xmm8, (%rdx)
movss 4(%rsi), %xmm7
mulss 4(%rcx), %xmm7
subss %xmm7, %xmm6
mulss 4(%rbp), %xmm6
movss %xmm6, 4(%rdx)
movss 8(%rsi), %xmm5
mulss 8(%rcx), %xmm5
subss %xmm5, %xmm4
mulss 8(%rbp), %xmm4
movss %xmm4, 8(%rdx)
movss 12(%rsi), %xmm3
addq $16, %rsi
mulss 12(%rcx), %xmm3
addq $16, %rcx
subss %xmm3, %xmm2
mulss 12(%rbp), %xmm2
addq $16, %rbp
movss %xmm2, 12(%rdx)
addq $16, %rdx
cmpl %r9d, %r8d
jne .L6
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=34163