[Bug rtl-optimization/48986] New: Missed optimization in atomic decrement on x86/x64
piotr.wyderski at gmail dot com
gcc-bugzilla@gcc.gnu.org
Fri May 13 10:32:00 GMT 2011
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=48986
Summary: Missed optimization in atomic decrement on x86/x64
Product: gcc
Version: 4.6.0
Status: UNCONFIRMED
Severity: minor
Priority: P3
Component: rtl-optimization
AssignedTo: unassigned@gcc.gnu.org
ReportedBy: piotr.wyderski@gmail.com
Many uses of __sync_fetch_and_add() boil down to
decrementing a counter and checking whether the result is
zero in order to delete the pointee. The most natural
way to write this is:
bool xxx_decrement(int* p) {
    return __sync_fetch_and_add(p, -1) == 1;
}

void yyy(int* p) {
    if (xxx_decrement(p)) {
        delete p;
    }
}
Unfortunately, GCC compiles it literally:
<__Z3yyyPi>:
40edd0: 83 ec 0c sub $0xc,%esp
40edd3: ba ff ff ff ff mov $0xffffffff,%edx
40edd8: 8b 44 24 10 mov 0x10(%esp),%eax
40eddc: f0 0f c1 10 lock xadd %edx,(%eax)
40ede0: 83 fa 01 cmp $0x1,%edx
40ede3: 74 0b je 40edf0 <__Z3yyyPi+0x20>
40ede5: 83 c4 0c add $0xc,%esp
40ede8: c3 ret
40ede9: 8d b4 26 00 00 00 00 lea 0x0(%esi,%eiz,1),%esi
40edf0: 89 44 24 10 mov %eax,0x10(%esp)
40edf4: 83 c4 0c add $0xc,%esp
40edf7: e9 24 03 00 00 jmp 40f120 <___wrap__ZdlPv>
40edfc: 8d 74 26 00 lea 0x0(%esi,%eiz,1),%esi
with the gist being:
40edd3: ba ff ff ff ff mov $0xffffffff,%edx
40eddc: f0 0f c1 10 lock xadd %edx,(%eax)
40ede0: 83 fa 01 cmp $0x1,%edx
40ede3: 74 0b je 40edf0 <__Z3yyyPi+0x20>
This special case should be handled by the optimizer and produce:
lock subl $0x1,(%eax)
je ...
or:
lock decl (%eax)
je ...
on platforms which do not suffer from carry-chain dependency penalties,
e.g. some AMD chips.
Please note that this generalizes to any N:
return __sync_fetch_and_add(p, -N) == N;
with the caveat that for N != 1 the dec replacement cannot be used.
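To illustrate why the transformation is valid, here is a minimal sketch of the two equivalent source-level forms. The first compares the old value with N, as in the report. The second compares the new value with zero, which is the condition the zero flag reflects after a lock sub. The helper names release_if_old_equals and release_if_new_is_zero are hypothetical and used only for illustration. Nothing is claimed here about the code any particular GCC version emits for either form.

```cpp
// Hypothetical helpers, not part of the original report.

// Form from the report: fetch the OLD value and compare it with N.
template <int N>
bool release_if_old_equals(int* p) {
    return __sync_fetch_and_add(p, -N) == N;
}

// Equivalent form: fetch the NEW value and compare it with zero.
// old == N  <=>  old - N == 0, so both functions return the same result
// and leave the same value in *p.
template <int N>
bool release_if_new_is_zero(int* p) {
    return __sync_sub_and_fetch(p, N) == 0;
}
```

With N = 1 this is the xxx_decrement() case above. For other N only the sub form applies, since dec always subtracts one.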