Re: g++ 4.2.x x86: code generation for __sync_lock_test_and_set() - builtin

Daniel Lohmann wrote:
g++ 4.2.3


I have the following code, which uses the new __sync_lock_test_and_set() builtin:

class Mutex {
  int locked;
  Mutex() {
    locked = 0;
  }
  void lock();
  void unlock();
};

void Mutex::lock() {
  while( __sync_lock_test_and_set( &locked, 1) == 0 )
    ;
}

void Mutex::unlock() {
  __sync_lock_release( &locked );
}

After compiling with -O3 -fomit-frame-pointer, the resulting code for the Mutex::lock() method looks as follows:

00000010 <Mutex::lock()>:
  10:    8b 54 24 04              mov    0x4(%esp),%edx
  14:    b8 01 00 00 00           mov    $0x1,%eax
  19:    87 02                    xchg   %eax,(%edx)
  1b:    85 c0                    test   %eax,%eax
  1d:    74 f5                    je     14 <Mutex::lock()+0x4>
  1f:    f3 c3                    repz ret

I am wondering about the repz prefix before the ret. A "repeat RET while the Z flag is set" obviously does not make sense from a functional point of view. So I assume that it is actually a side effect of the repz prefix that is exploited here to guarantee "something" with respect to instruction reordering, fetching, caching, or ...?

So what exactly is this "something"?
And what exactly could happen under which circumstances if we don't use it?

Google does not reveal much. If one googles for "repz ret" one gets a *load* of hits -- but just because of the fact that "ret" appears immediately after "repz" in the alphabetically sorted list of x86 instructions :-)

If you grep the gcc source, you'll find that it is a performance workaround rather than a functional requirement:

;; Used by x86_machine_dependent_reorg to avoid penalty on single byte RET
;; instruction Athlon and K8 have.

(define_insn "return_internal_long"
  [(return)
   (unspec [(const_int 0)] UNSPEC_REP)]
  "reload_completed"
  "rep {;} ret"
  [(set_attr "length" "1")
   (set_attr "length_immediate" "0")
   (set_attr "prefix_rep" "1")
   (set_attr "modrm" "0")])
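
As an aside on the quoted Mutex::lock(): __sync_lock_test_and_set() returns the previous contents of the variable, which is why the loop compiles to "je 14" (keep spinning while the old value was 0). A spin lock would normally spin while the old value is non-zero instead. Below is a minimal sketch of that variant, assuming GCC's __sync builtins. The class name SpinMutex and the try_lock() helper are made up for illustration and are not from the original post.

```cpp
// Illustrative sketch only; SpinMutex and try_lock() are hypothetical names.
class SpinMutex {
public:
  SpinMutex() : locked(0) {}

  // Returns true if the lock was free (old value 0) and is now held.
  bool try_lock() {
    return __sync_lock_test_and_set(&locked, 1) == 0;
  }

  // Spin while the previous value was non-zero, i.e. while someone
  // else already holds the lock.
  void lock() {
    while (__sync_lock_test_and_set(&locked, 1) != 0)
      ;
  }

  // Writes 0 back with release semantics.
  void unlock() {
    __sync_lock_release(&locked);
  }

private:
  int locked;
};
```

In this variant the ret in lock() still directly follows a conditional branch, so a build tuned for Athlon/K8 may well end with the same "repz ret".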

