[Bug target/71153] New: aarch64 __atomic_fetch_and() generates probably incorrect double inversion
dhowells at redhat dot com
gcc-bugzilla@gcc.gnu.org
Mon May 16 21:26:00 GMT 2016
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71153
Bug ID: 71153
Summary: aarch64 __atomic_fetch_and() generates probably
incorrect double inversion
Product: gcc
Version: 6.1.1
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: dhowells at redhat dot com
Target Milestone: ---
Compiling this code:
static __always_inline
void clear_bit_unlock(long bit, volatile unsigned long *addr)
{
	unsigned long mask = 1UL << (bit & (64 - 1));

	addr += bit >> 6;
	__atomic_fetch_and(addr, ~mask, __ATOMIC_RELEASE);
}

void bar_clear_bit_unlock(unsigned long *p)
{
	clear_bit_unlock(22, p);
}
for aarch64-linux-gnu with "-march=armv8-a+lse -Os" generates a double negation
of the mask value in the assembly:
000000000000007c <bar_clear_bit_unlock>:
7c: 92a00801 mov x1, #0xffffffffffbfffff // #-4194305
80: aa2103e1 mvn x1, x1
84: f8611001 ldclrl x1, x1, [x0]
88: d65f03c0 ret
The instruction at 7c is loading an inverted value into x1 (it's actually a
MOVN instruction according to the opcode table that I can find); the value in
x1 is then inverted *again* by the MVN instruction.
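The MOVN reading of the word at 7c can be checked by hand. The sketch below decodes the 64-bit MOVN form, assuming the usual A64 move-wide field layout (sf, opc, hw, imm16, Rd); decode_movn64 and check_reported_opcode are names made up for this illustration, not part of any tool.

```c
#include <assert.h>
#include <stdint.h>

/* Decode a 64-bit A64 MOVN word (assumed layout: sf[31] opc[30:29]
 * 100101[28:23] hw[22:21] imm16[20:5] Rd[4:0]).
 * Returns 1 and fills *value and *rd on a match, 0 otherwise. */
static int decode_movn64(uint32_t insn, uint64_t *value, unsigned *rd)
{
    /* sf=1, opc=00 (MOVN), fixed bits 100101 */
    if ((insn & 0xff800000u) != 0x92800000u)
        return 0;

    unsigned hw = (insn >> 21) & 0x3;        /* shift amount in units of 16 */
    uint64_t imm16 = (insn >> 5) & 0xffff;   /* immediate before inversion */

    *value = ~(imm16 << (16 * hw));          /* MOVN writes the inverted value */
    *rd = insn & 0x1f;
    return 1;
}

/* Hypothetical self-check against the opcodes quoted in the report. */
static void check_reported_opcode(void)
{
    uint64_t value;
    unsigned rd;

    /* 7c: 92a00801 should be MOVN x1, #0x40, LSL #16 */
    assert(decode_movn64(0x92a00801u, &value, &rd));
    assert(rd == 1);
    assert(value == UINT64_C(0xffffffffffbfffff));

    /* 80: aa2103e1 is the MVN, so it must not decode as MOVN */
    assert(!decode_movn64(0xaa2103e1u, &value, &rd));
}
```

Under that layout the word decodes to imm16 = 0x40 with hw = 1, so x1 holds ~(0x40 << 16) after 7c and 0x400000, the plain mask for bit 22, after the MVN at 80.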
Now, I can't find a description of how the LDCLRL instruction works, so I can't
say that it doesn't invert the parameter a third time (i.e. apply an A AND-NOT B
operation), but it looks suspicious. If nothing else, the MOVN and MVN could
be condensed into just a MOV.
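For what it's worth, the Arm documentation for the LSE atomics describes LDCLR as an atomic bit clear, that is, an AND of memory with the complement of the register operand. Taking that as the assumption, the following non-atomic C model (model_ldclr and model_emitted_sequence are illustrative names only) replays the emitted sequence for bit 22 and compares it with the AND that the source asks for.

```c
#include <assert.h>
#include <stdint.h>

/* Non-atomic model of LDCLR's data operation, assuming it is a bit clear:
 * memory becomes (old AND NOT operand) and the old value is returned. */
static uint64_t model_ldclr(uint64_t *mem, uint64_t operand)
{
    uint64_t old = *mem;

    *mem = old & ~operand;
    return old;
}

/* Replays the three instructions from bar_clear_bit_unlock. */
static uint64_t model_emitted_sequence(uint64_t *mem)
{
    uint64_t x1 = ~(UINT64_C(0x40) << 16);   /* movn: 0xffffffffffbfffff */

    x1 = ~x1;                                /* mvn:  0x0000000000400000 */
    return model_ldclr(mem, x1);             /* ldclrl x1, x1, [x0] */
}

/* Hypothetical self-check: the model must match mem & ~mask. */
static void check_model_matches_fetch_and(void)
{
    uint64_t mask = UINT64_C(1) << 22;
    uint64_t start = UINT64_C(0x123456789abcdef0) | mask;
    uint64_t mem = start;
    uint64_t old = model_emitted_sequence(&mem);

    assert(old == start);
    assert(mem == (start & ~mask));
}
```

If the assumption holds, the result is the same as __atomic_fetch_and(addr, ~mask, ...), so what remains is the redundant inversion pair rather than a wrong value.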
If a parameter is used instead of a constant:
void foo_clear_bit_unlock(long bit, unsigned long *p)
{
clear_bit_unlock(bit, p);
}
then two MVN instructions are generated:
0000000000000048 <foo_clear_bit_unlock>:
48: 12001403 and w3, w0, #0x3f
4c: 9346fc02 asr x2, x0, #6
50: d2800020 mov x0, #0x1 // #1
54: 8b020c21 add x1, x1, x2, lsl #3
58: 9ac32000 lsl x0, x0, x3
5c: aa2003e0 mvn x0, x0
60: aa2003e2 mvn x2, x0
64: f8621022 ldclrl x2, x2, [x1]
68: d65f03c0 ret
The C code appears to be correct, because on x86_64 it generates:
000000000000004c <bar_clear_bit_unlock>:
4c: f0 48 81 27 ff ff bf lock andq $0xffffffffffbfffff,(%rdi)
53: ff
54: c3 retq
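A small run-time check can confirm the result on either target, independent of how the disassembly is read. This is only a sketch: it assumes a 64-bit unsigned long, spells out the always_inline attribute so that it builds outside the kernel tree, and clear_bit_changes_only and check_clear_bit_unlock are helper names invented for the test.

```c
#include <assert.h>

static inline __attribute__((always_inline))
void clear_bit_unlock(long bit, volatile unsigned long *addr)
{
    unsigned long mask = 1UL << (bit & (64 - 1));

    addr += bit >> 6;
    __atomic_fetch_and(addr, ~mask, __ATOMIC_RELEASE);
}

/* Hypothetical helper: start from two all-ones words, clear one bit in
 * the range 0..127 and report whether exactly that bit changed.
 * Assumes unsigned long is 64 bits wide. */
static int clear_bit_changes_only(long bit)
{
    volatile unsigned long words[2] = { ~0UL, ~0UL };
    unsigned long expect0 = ~0UL;
    unsigned long expect1 = ~0UL;

    clear_bit_unlock(bit, words);

    if (bit < 64)
        expect0 &= ~(1UL << bit);
    else
        expect1 &= ~(1UL << (bit - 64));

    return words[0] == expect0 && words[1] == expect1;
}

/* Hypothetical self-check covering the constant from the report. */
static void check_clear_bit_unlock(void)
{
    assert(sizeof(unsigned long) == 8);
    assert(clear_bit_changes_only(22));
}
```

Building this with "-march=armv8-a+lse -Os" and running it on hardware or an emulator with LSE support would show whether the LDCLRL sequence clears only the requested bit.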