[PATCH 2/2] Support __ATOMIC_HLE_RELEASE for __atomic_clear/store_n

Andi Kleen andi@firstfloor.org
Mon Jan 14 19:02:00 GMT 2013

On Mon, Jan 14, 2013 at 07:40:56PM +0100, Uros Bizjak wrote:
> On Mon, Jan 14, 2013 at 7:06 PM, Andi Kleen <andi@firstfloor.org> wrote:
> >> This cannot happen, we reject code that sets both __HLE* flags.
> >
> > BTW I found more HLE bugs, it looks like some of the fetch_op_*
> > patterns do not match always and fall back to cmpxchg, which
> > does not generate HLE code correctly. Not fully sure what's
> > wrong, can you spot any obvious problems? You changed the
> >
> > (define_insn "atomic_<logic><mode>"
> >
> > pattern last.
> I don't think this is a target problem, these insns work as expected
> and are covered by extensive testsuite in gcc.target/i386/hle-*.c.

Well, the C++ test cases I wrote didn't work. It may be related to
how complex the program is. Simple calls as in the original
test suite seem to work.

E.g. instead of xacquire lock and ... it ended up with a cmpxchg loop
(which I think is a fallback path). The cmpxchg loop didn't include
an HLE prefix (and simply adding one is not enough; it would need more
changes for successful elision).

Before HLE the cmpxchg code was correct, just somewhat inefficient.
Even with HLE it is technically correct; it just will never elide.
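
For illustration, the cmpxchg fallback described above has roughly the
shape of the loop below. This is only a sketch: fetch_and_via_cas is a
made-up helper name, not something GCC or libstdc++ provides, and the
real expansion happens inside the compiler rather than in library code.

```cpp
#include <atomic>

// Hypothetical helper, for illustration only: emulates fetch_and with a
// compare-exchange loop, which is roughly the shape of the fallback path.
unsigned long fetch_and_via_cas(std::atomic<unsigned long> &a,
                                unsigned long mask,
                                std::memory_order order)
{
  // Plain load first, then retry the compare-exchange until it succeeds.
  unsigned long old = a.load(std::memory_order_relaxed);
  // On failure, compare_exchange_weak refreshes 'old' with the current value.
  while (!a.compare_exchange_weak(old, old & mask, order,
                                  std::memory_order_relaxed))
    ;
  return old;  // value before the AND, same as fetch_and
}
```

The result is the same as a single fetch_and, but it takes a load plus
a locked cmpxchg in a loop rather than one locked and instruction.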

I think I would like to fix and, or, xor and disallow HLE for nand.

Here's a test case. Needs the libstdc++ HLE patch posted.

#include <atomic>

#define ACQ memory_order_acquire | __memory_order_hle_acquire
#define REL memory_order_release | __memory_order_hle_release

int main()
{
  using namespace std;
  atomic_ulong au = ATOMIC_VAR_INIT(0);

  if (!au.fetch_and(1, ACQ))
    au.fetch_and(-1, REL);

  unsigned lock = 0;
  __atomic_fetch_and(&lock, 1, __ATOMIC_HLE_ACQUIRE|__ATOMIC_ACQUIRE);

  return 0;
}

The first fetch_and generates (wrong):

        movq    %rax, %rcx
        movq    %rax, %rdx
        andl    $1, %ecx
        lock; cmpxchgq  %rcx, -24(%rsp)
        jne     .L2

The second __atomic_fetch_and generates (correct):

        .byte   0xf2
        andl    $1, -28(%rsp)


ak@linux.intel.com -- Speaking for myself only.
