[Bug target/71153] New: aarch64 __atomic_fetch_and() generates probably incorrect double inversion

Mon May 16 21:26:00 GMT 2016

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71153

            Bug ID: 71153
           Summary: aarch64 __atomic_fetch_and() generates probably
                    incorrect double inversion
           Product: gcc
           Version: 6.1.1
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: dhowells at redhat dot com
  Target Milestone: ---

Compiling this code:

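/* __always_inline is the Linux kernel macro; define it when building standalone. */
#ifndef __always_inline
#define __always_inline inline __attribute__((__always_inline__))
#endif

static __always_inline
void clear_bit_unlock(long bit, volatile unsigned long *addr)
{
        unsigned long mask = 1UL << (bit & (64 - 1));
        addr += bit >> 6;
        __atomic_fetch_and(addr, ~mask, __ATOMIC_RELEASE);
}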

void bar_clear_bit_unlock(unsigned long *p)
{
        clear_bit_unlock(22, p);
}

for aarch64-linux-gnu with "-march=armv8-a+lse -Os" generates a double negation
of the mask value in the assembly:

000000000000007c <bar_clear_bit_unlock>:
  7c:   92a00801        mov     x1, #0xffffffffffbfffff         // #-4194305
  80:   aa2103e1        mvn     x1, x1
  84:   f8611001        ldclrl  x1, x1, [x0]
  88:   d65f03c0        ret

The instruction at 7c loads an already-inverted value into x1 (according to the
opcode table that I can find, the encoding is actually a MOVN, shown here under
its MOV alias); the MVN at 80 then inverts the value in x1 *again*.

Now, I can't find a description of how the LDCLRL instruction works, so I can't
rule out that it inverts its operand a third time (i.e. performs an A AND-NOT B
operation), but it looks suspicious.  If nothing else, the MOVN and MVN could
be condensed into a single MOV.
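
To make the arithmetic concrete, here is a minimal C model of the three
instructions in bar_clear_bit_unlock().  It is only a sketch: the model_*
helper names are made up for illustration, nothing here is atomic, and the
LDCLR behaviour is an assumption (memory AND-NOT source register), which is
exactly the point in question above.  Under that assumption the net effect is
the intended "*addr &= ~mask", and the MOVN/MVN pair is merely redundant.

```c
#include <stdint.h>

/* MOVN Xd, #imm16, LSL #shift: writes the inverse of the shifted immediate. */
static uint64_t model_movn(uint16_t imm16, unsigned shift)
{
        return ~((uint64_t)imm16 << shift);
}

/* MVN Xd, Xm: bitwise NOT of the source register. */
static uint64_t model_mvn(uint64_t xm)
{
        return ~xm;
}

/*
 * ASSUMPTION: LDCLR clears in memory the bits that are set in the source
 * register (old AND NOT source) and returns the old memory value.
 * This model is not atomic and ignores the release ordering of LDCLRL.
 */
static uint64_t model_ldclr(uint64_t *mem, uint64_t xs)
{
        uint64_t old = *mem;

        *mem = old & ~xs;
        return old;
}

/* Replays the instructions at 7c, 80 and 84 on *mem; returns the old value. */
static uint64_t model_bar_sequence(uint64_t *mem)
{
        uint64_t x1 = model_movn(0x40, 16);  /* 0xffffffffffbfffff, i.e. ~mask */

        x1 = model_mvn(x1);                  /* 0x0000000000400000, i.e.  mask */
        return model_ldclr(mem, x1);         /* clears bit 22 if LDCLR is AND-NOT */
}
```

If LDCLR were instead a plain AND, the same sequence would leave only bit 22
set in memory, which is the failure mode this report is worried about.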

If a parameter is used instead of a constant:

void foo_clear_bit_unlock(long bit, unsigned long *p)
{
        clear_bit_unlock(bit, p);
}

then two MVN instructions are generated:

0000000000000048 <foo_clear_bit_unlock>:
  48:   12001403        and     w3, w0, #0x3f
  4c:   9346fc02        asr     x2, x0, #6
  50:   d2800020        mov     x0, #0x1                        // #1
  54:   8b020c21        add     x1, x1, x2, lsl #3
  58:   9ac32000        lsl     x0, x0, x3
  5c:   aa2003e0        mvn     x0, x0
  60:   aa2003e2        mvn     x2, x0
  64:   f8621022        ldclrl  x2, x2, [x1]
  68:   d65f03c0        ret

The C code appears to be correct, because on x86_64 it generates:

000000000000004c <bar_clear_bit_unlock>:
  4c:   f0 48 81 27 ff ff bf    lock andq $0xffffffffffbfffff,(%rdi)
  53:   ff 
  54:   c3                      retq
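
To check the result at run time rather than by reading disassembly, a small
self-test along these lines could be built for both x86_64 and aarch64.  It is
only a sketch: check_clear_bit() is a hypothetical helper, the __always_inline
fallback stands in for the kernel's definition, and a 64-bit unsigned long is
assumed, as in the code above.

```c
#include <stdint.h>

/* Fallback for building outside the kernel (assumed equivalent definition). */
#ifndef __always_inline
#define __always_inline inline __attribute__((__always_inline__))
#endif

static __always_inline
void clear_bit_unlock(long bit, volatile unsigned long *addr)
{
        unsigned long mask = 1UL << (bit & (64 - 1));
        addr += bit >> 6;
        __atomic_fetch_and(addr, ~mask, __ATOMIC_RELEASE);
}

/*
 * Hypothetical helper: starts from two all-ones words, clears `bit`
 * (0..127) and returns 1 if exactly that bit, and no other, was cleared.
 */
static int check_clear_bit(long bit)
{
        unsigned long words[2]  = { ~0UL, ~0UL };
        unsigned long expect[2] = { ~0UL, ~0UL };

        /* Reference result computed with plain, non-atomic arithmetic. */
        expect[bit >> 6] &= ~(1UL << (bit & 63));

        clear_bit_unlock(bit, words);
        return words[0] == expect[0] && words[1] == expect[1];
}
```

check_clear_bit(22) should return 1 on a correct toolchain; a 0 from an
aarch64 build with "-march=armv8-a+lse" would mean the double inversion really
does produce a wrong result rather than just a redundant instruction.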