[Bug target/98737] New: Atomic operation on x86 not optimized to use flags

Mon Jan 18 19:59:19 GMT 2021

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98737

            Bug ID: 98737
           Summary: Atomic operation on x86 not optimized to use flags
           Product: gcc
           Version: 11.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: drepper.fsp+rhbz at gmail dot com
  Target Milestone: ---

Consider the following code:

long a;

_Bool f(long b)
{
  return __atomic_sub_fetch(&a, b, __ATOMIC_RELEASE) == 0;
}

_Bool g(long b)
{
  return (a -= b) == 0;
}

When compiled for x86-64 with the current HEAD as of 2021-01-18, the resulting
code is:

0000000000000000 <f>:
   0:   48 f7 df                neg    %rdi
   3:   48 89 f8                mov    %rdi,%rax
   6:   f0 48 0f c1 05 00 00    lock xadd %rax,0x0(%rip)        # f <f+0xf>
   d:   00 00 
   f:   48 01 f8                add    %rdi,%rax
  12:   0f 94 c0                sete   %al
  15:   c3                      retq   
  16:   66 2e 0f 1f 84 00 00    nopw   %cs:0x0(%rax,%rax,1)
  1d:   00 00 00 

0000000000000020 <g>:
  20:   48 29 3d 00 00 00 00    sub    %rdi,0x0(%rip)        # 27 <g+0x7>
  27:   0f 94 c0                sete   %al
  2a:   c3                      retq   

The code for f is far too complicated.  The only difference from the code for
g should be a lock prefix on the sub instruction; sete can then use the flags
that the locked sub sets, just as it does in g.

Probably all __atomic_* builtins have the same problem and do not use the
flags set by the locked instruction where that would be possible.
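
To check that guess, a few more test functions in the same style could be
compiled and their output inspected.  This is only a sketch of candidate
cases: the function names are made up for illustration, and the code generated
for them has not been checked here.

```c
#include <stdbool.h>

long a;

/* Hypothetical test cases: each compares the result of an atomic
   read-modify-write builtin in a way that the flags set by a locked
   x86 instruction could answer directly. */
bool add_is_zero(long b)
{
  return __atomic_add_fetch(&a, b, __ATOMIC_RELEASE) == 0;
}

bool and_is_zero(long b)
{
  return __atomic_and_fetch(&a, b, __ATOMIC_RELEASE) == 0;
}

bool or_is_zero(long b)
{
  return __atomic_or_fetch(&a, b, __ATOMIC_RELEASE) == 0;
}

bool xor_is_zero(long b)
{
  return __atomic_xor_fetch(&a, b, __ATOMIC_RELEASE) == 0;
}

bool sub_is_negative(long b)
{
  return __atomic_sub_fetch(&a, b, __ATOMIC_RELEASE) < 0;
}
```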

This is not an esoteric problem.  I was specifically looking at optimizing the
std::latch implementation for C++20, and this is what would be needed.  Without
a fix, either a special version would be needed or the current, much worse code
would be used.