This is the mail archive of the gcc-bugs@gcc.gnu.org mailing list for the GCC project.

[Bug target/47949] Missed optimization for -Os using xchg instead of mov.

From: "svfuerst at gmail dot com" <gcc-bugzilla at gcc dot gnu dot org>
To: gcc-bugs at gcc dot gnu dot org
Date: Wed, 2 Mar 2011 21:51:17 +0000
Subject: [Bug target/47949] Missed optimization for -Os using xchg instead of mov.
Auto-submitted: auto-generated
References: <bug-47949-4@http.gcc.gnu.org/bugzilla/>

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=47949

--- Comment #3 from Steven Fuerst <svfuerst at gmail dot com> 2011-03-02 21:51:12 UTC ---
A quick look at generated code suggests that this pattern doesn't come up all
that often. However, there is one case where it does: the epilogue of a
function. For example, gcc tends to generate code like:

    movl    %ebp, %eax
    movq    8(%rsp), %rbx
    movq    16(%rsp), %rbp
    movq    24(%rsp), %r12
    movq    32(%rsp), %r13
    addq    $40, %rsp
    ret

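Replacing the move to %eax with an exchange with %ebp is a win in this
particular case. An xchg with %eax has a one-byte encoding, while the
register-to-register mov needs two. Clobbering %ebp is harmless here because
"movq 16(%rsp), %rbp" reloads it before it is read again. The extra cycle or
two of latency that xchg takes doesn't matter, since the other moves and the
ret instruction overlap in execution with it. Benchmarking on an Opteron in
64-bit mode confirms this hypothesis, even in the degenerate case where no
other moves exist: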

foo1:
    mov %edi, %eax
    retq

foo2:
    xchg %eax, %edi
    retq

foo1 and foo2 take the same time to execute.

References:
- [Bug target/47949] New: Missed optimization for -Os using xchg instead of mov.
  - From: svfuerst at gmail dot com
