[Bug tree-optimization/54000] [4.8/4.9 Regression] Performance breakdown for gcc-4.{6,7} vs. gcc-4.5 using std::vector in matrix vector multiplication (IVopts / inliner)
rguenth at gcc dot gnu.org
gcc-bugzilla@gcc.gnu.org
Mon Feb 9 08:46:00 GMT 2015
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54000
Richard Biener <rguenth at gcc dot gnu.org> changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
             Target|                            |x86_64-*-*, i?86-*-*
      Known to work|                            |5.0
            Summary|[4.8/4.9/5 Regression]      |[4.8/4.9 Regression]
                   |Performance breakdown for   |Performance breakdown for
                   |gcc-4.{6,7} vs. gcc-4.5     |gcc-4.{6,7} vs. gcc-4.5
                   |using std::vector in matrix |using std::vector in matrix
                   |vector multiplication       |vector multiplication
                   |(IVopts / inliner)          |(IVopts / inliner)
--- Comment #14 from Richard Biener <rguenth at gcc dot gnu.org> ---
Btw, with trunk (GCC 5) I see
.L13:
        movsd   (%rdx), %xmm1
        xorl    %eax, %eax
.L12:
        movsd   -8(%rcx,%rax), %xmm0
        mulsd   (%rsi,%rax), %xmm0
        addq    $8, %rax
        cmpq    $24, %rax
        addsd   %xmm0, %xmm1
        movsd   %xmm1, (%rdx)
        jne     .L12
        addq    $8, %rdx
        addq    $8, %rcx
        addq    $24, %rsi
        cmpq    %rdi, %rdx
        jne     .L13
thus maybe even better than 4.5.
GCC 4.9 produces
.L17:
        leaq    (%r8,%rdx), %rcx
        movsd   8(%rdi,%rdx), %xmm1
        xorl    %eax, %eax
        addq    %r9, %rcx
.L14:
        movsd   -8(%rcx,%rax), %xmm0
        mulsd   (%rsi,%rax), %xmm0
        addq    $8, %rax
        cmpq    $24, %rax
        addsd   %xmm0, %xmm1
        movsd   %xmm1, 8(%rdi,%rdx)
        jne     .L14
        addq    $8, %rdx
        addq    $24, %rsi
        cmpq    $1016, %rdx
        jne     .L17
It might again be inliner changes that trigger the better behavior, of course.
So: fixed in GCC 5. I am not sure how to produce a testcase that reliably
tracks the good behavior here. IVOPTs dumping should be improved somewhat.
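
For readers without the PR's testcase at hand, here is a rough C++ sketch of the loop shape that both listings appear to implement. It is inferred only from the assembly: three multiply-adds per row, a 24-byte row stride on the coefficient pointer, and an input window that slides by one element per row. The function name banded_matvec, the parameter names, and the index base into x are assumptions for illustration, not the code attached to the bug.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical reconstruction of the kernel shape, not the PR's testcase.
// a: n rows of 3 coefficients, row-major (matches the $24 stride on %rsi).
// x: input with at least n + 2 elements; row i reads x[i], x[i+1], x[i+2]
//    (the exact index base is an assumption).
// y: n-element result, accumulated in place.
void banded_matvec(const std::vector<double>& a,
                   const std::vector<double>& x,
                   std::vector<double>& y)
{
    const std::size_t n = y.size();
    for (std::size_t i = 0; i < n; ++i) {
        // Accumulating directly into y[i] mirrors the store that both
        // listings keep inside the inner loop.
        for (std::size_t k = 0; k < 3; ++k)
            y[i] += a[3 * i + k] * x[i + k];
    }
}
```

With this shape, the GCC 5 listing advances one pointer per array in the outer loop, while the GCC 4.9 listing recomputes the x address with leaq/addq on every outer iteration and addresses y as 8(%rdi,%rdx). The store to y presumably stays inside the inner loop because the compiler cannot rule out that y aliases a or x.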