This is the mail archive of the
gcc-bugs@gcc.gnu.org
mailing list for the GCC project.
[Bug target/47949] Missed optimization for -Os using xchg instead of mov.
- From: "svfuerst at gmail dot com" <gcc-bugzilla at gcc dot gnu dot org>
- To: gcc-bugs at gcc dot gnu dot org
- Date: Wed, 2 Mar 2011 21:51:17 +0000
- Subject: [Bug target/47949] Missed optimization for -Os using xchg instead of mov.
- Auto-submitted: auto-generated
- References: <bug-47949-4@http.gcc.gnu.org/bugzilla/>
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=47949
--- Comment #3 from Steven Fuerst <svfuerst at gmail dot com> 2011-03-02 21:51:12 UTC ---
Having had a quick look at the generated code, it appears that this pattern
doesn't come up all that often. However, there is one case where it does: the
epilogue of a function. That is, gcc tends to generate code looking like:
    movl %ebp, %eax
    movq 8(%rsp), %rbx
    movq 16(%rsp), %rbp
    movq 24(%rsp), %r12
    movq 32(%rsp), %r13
    addq $40, %rsp
    ret
Replacing the move to %eax with an exchange with %ebp is a win in this
particular case: %rbp is reloaded from the stack two instructions later, so
clobbering it is harmless. The extra cycle or two of latency that xchg takes
doesn't matter, as the other moves and the ret instruction overlap in execution
with it. Benchmarking on an Opteron in 64-bit mode confirms this hypothesis even
in the degenerate case where no other moves exist:
foo1:
    mov %edi, %eax
    retq
foo2:
    xchg %eax, %edi
    retq
foo1 and foo2 take the same time to execute.
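
For reference, the size side of the trade-off can be sketched in a few lines of
Python. This is only an illustration, not part of the measurements above: the
helper names encode_mov and encode_xchg_eax are made up for this sketch, and it
covers only the eight legacy 32-bit registers (no REX prefixes). It relies on
the standard encodings, where "mov r32, r/m32" is opcode 0x89 plus a ModRM byte
and "xchg %eax, r32" is the single byte 0x90 + register number.

```python
# Hypothetical helpers for illustration only; they hand-encode two x86
# instruction forms for the eight legacy 32-bit registers.
REG32 = {"eax": 0, "ecx": 1, "edx": 2, "ebx": 3,
         "esp": 4, "ebp": 5, "esi": 6, "edi": 7}


def encode_mov(src, dst):
    """Encode AT&T `mov %src, %dst` as opcode 0x89 followed by ModRM."""
    # ModRM: mod=11 (register direct), reg=source, rm=destination.
    modrm = 0xC0 | (REG32[src] << 3) | REG32[dst]
    return bytes([0x89, modrm])


def encode_xchg_eax(reg):
    """Encode `xchg %eax, %reg` using the one-byte accumulator form."""
    if reg == "eax":
        # 0x90 itself is nop, so the short form cannot express this case.
        raise ValueError("xchg %eax, %eax has no one-byte encoding")
    return bytes([0x90 + REG32[reg]])


if __name__ == "__main__":
    # The two cases discussed above: the epilogue (%ebp) and foo1/foo2 (%edi).
    for reg in ("ebp", "edi"):
        mov = encode_mov(reg, "eax")
        xchg = encode_xchg_eax(reg)
        print(f"mov  %{reg}, %eax : {mov.hex(' ')} ({len(mov)} bytes)")
        print(f"xchg %eax, %{reg} : {xchg.hex(' ')} ({len(xchg)} byte)")
```

In both cases the xchg form is one byte shorter than the mov it replaces, which
is the saving that matters for -Os as long as the clobbered register is dead.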