This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.



gcc 25% slower than clang 3.5 for adding complex numbers


gcc 4.9.2 generates slower code than clang 3.5 for element-wise
addition of complex numbers.

See bug 64410:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64410

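When adding two arrays of complex numbers, clang's vectoriser makes
better use of the layout of complex numbers: it adds the real and
imaginary parts of each element with a single packed addpd, whereas
gcc emits two scalar addsd instructions per element.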

Inner loop produced by gcc:
.L52:
  movsd (%r15,%rax), %xmm1
  movsd 8(%r15,%rax), %xmm0
  addsd 0(%rbp,%rax), %xmm1
  addsd 8(%rbp,%rax), %xmm0
  movsd %xmm1, (%rbx,%rax)
  movsd %xmm0, 8(%rbx,%rax)
  addq $16, %rax
  cmpq %rsi, %rax
  jne .L52

Inner loop produced by clang:
.LBB0_145:
  movupd -16(%rbx), %xmm0
  movupd -16(%rax), %xmm1
  addpd %xmm0, %xmm1
  movupd %xmm1, -16(%rdi)
  movupd (%rbx), %xmm0
  movupd (%rax), %xmm1
  addpd %xmm0, %xmm1
  movupd %xmm1, (%rdi)
  addq $2, %rbp
  addq $32, %rbx
  addq $32, %rax
  addq $32, %rdi
  addl $-2, %ecx
  jne .LBB0_145
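
For reference, here is a minimal sketch of the kind of loop under
discussion. It is not the test case from the bug report; it assumes the
arrays hold std::complex<double>, and the function names add_complex and
add_complex_flat are made up for illustration. The second function
spells out the layout property a vectoriser can exploit: an array of n
complex doubles is stored as 2*n contiguous doubles (re, im, re, im,
...), so the complex sum is the same as a flat sum over doubles, which
maps naturally onto packed adds.

```cpp
#include <complex>
#include <cstddef>
#include <vector>

// Hypothetical helper: element-wise complex addition written the obvious way.
void add_complex(const std::complex<double>* a,
                 const std::complex<double>* b,
                 std::complex<double>* out,
                 std::size_t n)
{
    for (std::size_t i = 0; i < n; ++i)
        out[i] = a[i] + b[i];
}

// Hypothetical helper: the same computation over the underlying doubles.
// The C++ standard guarantees that an array of std::complex<double> can be
// accessed as an array of doubles with real parts at even indices and
// imaginary parts at odd indices.
void add_complex_flat(const std::complex<double>* a,
                      const std::complex<double>* b,
                      std::complex<double>* out,
                      std::size_t n)
{
    const double* pa = reinterpret_cast<const double*>(a);
    const double* pb = reinterpret_cast<const double*>(b);
    double* po = reinterpret_cast<double*>(out);

    // 2*n independent double additions; each adjacent (re, im) pair
    // fits in one 128-bit SSE2 register.
    for (std::size_t i = 0; i < 2 * n; ++i)
        po[i] = pa[i] + pb[i];
}
```

Both functions should give identical results for the same inputs.
Whether a given compiler vectorises either form depends on the compiler
version and flags, so compare the generated assembly rather than
assuming.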

