This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.
gcc 25% slower than clang 3.5 for adding complex numbers
- From: Conrad S <conradsand dot arma at gmail dot com>
- To: "gcc at gcc dot gnu dot org" <gcc at gcc dot gnu dot org>
- Date: Fri, 26 Dec 2014 15:32:26 +1000
- Subject: gcc 25% slower than clang 3.5 for adding complex numbers
gcc 4.9.2 generates slower code than clang 3.5 for arithmetic on
complex numbers.
See bug 64410:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64410
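When adding two arrays of complex numbers, clang's vectoriser is
better able to exploit the layout of complex numbers: it adds the real
and imaginary parts of each element with one packed addpd, whereas gcc
uses two scalar addsd instructions per element.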
Inner loop produced by gcc:
.L52:
    movsd (%r15,%rax), %xmm1
    movsd 8(%r15,%rax), %xmm0
    addsd 0(%rbp,%rax), %xmm1
    addsd 8(%rbp,%rax), %xmm0
    movsd %xmm1, (%rbx,%rax)
    movsd %xmm0, 8(%rbx,%rax)
    addq $16, %rax
    cmpq %rsi, %rax
    jne .L52
Inner loop produced by clang:
.LBB0_145:
    movupd -16(%rbx), %xmm0
    movupd -16(%rax), %xmm1
    addpd %xmm0, %xmm1
    movupd %xmm1, -16(%rdi)
    movupd (%rbx), %xmm0
    movupd (%rax), %xmm1
    addpd %xmm0, %xmm1
    movupd %xmm1, (%rdi)
    addq $2, %rbp
    addq $32, %rbx
    addq $32, %rax
    addq $32, %rdi
    addl $-2, %ecx
    jne .LBB0_145
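
For anyone who wants to experiment with the pattern, here is a minimal
C++ sketch of the kind of loop involved. It is not the code from bug
64410; the function names (add_scalar, add_packed, kernels_agree) and
the test data are made up for illustration. add_scalar is the plain
element-wise loop a compiler would be asked to vectorise. add_packed
spells out by hand, with SSE2 intrinsics, roughly what the clang
listing does: one unaligned 16-byte load per complex number and a
single packed add covering both the real and imaginary parts. It
assumes an x86 target with SSE2 and does not reproduce clang's 2x
unrolling.

```cpp
#include <cassert>      // assert, for callers that check kernels_agree()
#include <complex>
#include <cstddef>
#include <vector>
#include <emmintrin.h>  // SSE2 intrinsics (x86 only)

// Plain element-wise addition; the loop the compiler is asked to vectorise.
void add_scalar(std::complex<double>* out,
                const std::complex<double>* a,
                const std::complex<double>* b,
                std::size_t n)
{
    for (std::size_t i = 0; i < n; ++i)
        out[i] = a[i] + b[i];
}

// Hand-written packed version (hypothetical, for comparison only).
// std::complex<double> is laid out as {real, imag}, so one 16-byte
// load picks up a whole element and addpd adds both parts at once.
void add_packed(std::complex<double>* out,
                const std::complex<double>* a,
                const std::complex<double>* b,
                std::size_t n)
{
    const double* pa = reinterpret_cast<const double*>(a);
    const double* pb = reinterpret_cast<const double*>(b);
    double*       po = reinterpret_cast<double*>(out);

    for (std::size_t i = 0; i < n; ++i) {
        __m128d va = _mm_loadu_pd(pa + 2 * i);           // movupd
        __m128d vb = _mm_loadu_pd(pb + 2 * i);           // movupd
        _mm_storeu_pd(po + 2 * i, _mm_add_pd(va, vb));   // addpd + movupd
    }
}

// Runs both kernels on made-up data and reports whether the results match.
bool kernels_agree(std::size_t n)
{
    std::vector<std::complex<double>> a(n), b(n), r1(n), r2(n);
    for (std::size_t i = 0; i < n; ++i) {
        const double x = static_cast<double>(i);
        a[i] = std::complex<double>(x, -x);
        b[i] = std::complex<double>(0.5 * x, 2.0);
    }
    add_scalar(r1.data(), a.data(), b.data(), n);
    add_packed(r2.data(), a.data(), b.data(), n);
    return r1 == r2;
}
```

Compiling add_scalar with each compiler at the same optimisation level
and inspecting the generated assembly should show whether the inner
loop uses scalar addsd or packed addpd on a given version.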