Bug 114865 - [13/14/15 Regression] std::atomic<X>::compare_exchange_strong seems to hang under GCC 13 for C++11
Summary: [13/14/15 Regression] std::atomic<X>::compare_exchange_strong seems to hang u...
Status: NEW
Alias: None
Product: gcc
Classification: Unclassified
Component: libstdc++ (show other bugs)
Version: 13.2.0
: P2 normal
Target Milestone: 13.4
Assignee: Not yet assigned to anyone
URL:
Keywords: wrong-code
Depends on:
Blocks:
 
Reported: 2024-04-26 15:50 UTC by Peter Dimov
Modified: 2024-10-14 07:01 UTC (History)
2 users (show)

See Also:
Host:
Target:
Build:
Known to work:
Known to fail:
Last reconfirmed: 2024-08-23 00:00:00


Attachments
testcase (538 bytes, text/plain)
2024-04-27 00:16 UTC, Andrew Pinski
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Peter Dimov 2024-04-26 15:50:28 UTC
I'm getting weird hangs on Github Actions when using `std::atomic<state_type>::compare_exchange_strong` under GCC 13 on Ubuntu 23.04 (only; GCC 12 and earlier on Ubuntu 22.04 and earlier work). `state_type` is defined as

```
struct state_type
{
    std::uint64_t timestamp;
    std::uint16_t clock_seq;
};
```

and the code doing the CAS is

```
auto oldst = ps_->load( std::memory_order_relaxed );

for( ;; )
{
    auto newst = get_new_state( oldst );

    if( ps_->compare_exchange_strong( oldst, newst, std::memory_order_relaxed, std::memory_order_relaxed ) )
    {
        state_ = newst;
        break;
    }
}
```

where `ps` is of type `std::atomic<state_type>*`.

At a glance, I see nothing immediately wrong with the generated code (https://godbolt.org/z/8Ee3hrTz8).

However, when I change `state_type` to

```
struct state_type
{
    std::uint64_t timestamp;
    std::uint16_t clock_seq;
    std::uint16_t padding[ 3 ];
};
```
the hangs disappear. This leads me to think that the problem is caused by the original struct having padding, which isn't being handled correctly for some reason.

As we know, `std::atomic<T>::compare_exchange_strong` is carefully specified to take and return `expected` by reference, such that it can both compare the entire object as if via `memcmp` (including the padding), and return it as if by `memcpy`, again including the padding. Even though the padding bits of the initial value returned by the atomic load are unspecified, at most one iteration of the loop would be required for the padding bits to converge and for the CAS to succeed.

However, going by the symptoms alone, this doesn't seem to be the case here.

The problem may well be inside libatomic, of course; I have no way to tell.

One GHA run showing the issue is https://github.com/boostorg/uuid/actions/runs/8821753835, where only the GCC 13 job times out.
Comment 1 Peter Dimov 2024-04-26 17:47:10 UTC
> The problem may well be inside libatomic, of course; I have no way to tell.

Narrator: but he did, in fact, have a way to tell.

This is a GHA run with GCC 9 to 13 tested on both Ubuntu 23.04 and Ubuntu 23.10, and only GCC 13 hangs.

https://github.com/boostorg/uuid/actions/runs/8852038822/job/24309866206

And indeed, the codegen from GCC 12 (https://godbolt.org/z/xc7oT76fn) is radically different from (and much simpler than) the GCC 13 one (https://godbolt.org/z/eP48xv3Mr).

I think that the problem has been introduced with this commit:

https://github.com/gcc-mirror/gcc/commit/157236dbd621644b3cec50b6cf38811959f3e78c

which, ironically enough, was supposed to improve the handling of types with padding bits, but broke them entirely.

(I told the committee there was nothing wrong with compare_exchange as specified, but did they listen to me? To ask the question is to answer it.)
Comment 2 Peter Dimov 2024-04-26 19:40:03 UTC
The issue is also present for GCC 14 on Ubuntu 24.04:

https://github.com/boostorg/uuid/actions/runs/8853249656/job/24313667955
Comment 3 Andrew Pinski 2024-04-26 23:16:43 UTC
Can you provide a full runable testcase which shows the issue rather than just code snippets? 

And not just link to godbolt. You can attach it. In this case we don't need the preprocesed source since it is about the library that is having issue. Please make sure it only depends on the libstdc++/libc headers too.
Comment 4 Peter Dimov 2024-04-27 00:13:16 UTC
This

https://raw.githubusercontent.com/boostorg/uuid/feature/gcc_pr_114865/test/test_gcc_pr114865.cpp

exhibits the problem for me on GCC 13/14. I'm only seeing the hang with -std=c++11 -m32 in the CI run because this combination is tested first, but I believe it's independent of standard level and address model.
Comment 5 Andrew Pinski 2024-04-27 00:16:18 UTC
Created attachment 58053 [details]
testcase
Comment 6 Andrew Pinski 2024-04-27 00:22:26 UTC
(In reply to Peter Dimov from comment #4)
>  but I believe it's independent of standard level and address model.

No it is dependent on the standard level. C++11 fails but C++14, C++17 and C++20 all pass.
Comment 7 Andrew Pinski 2024-04-27 00:25:53 UTC
The problem is only with auto atomic variables. Looks like the padding there is not being zeroed for -std=c++11 ...
Comment 8 Peter Dimov 2024-04-27 00:29:45 UTC
(In reply to Andrew Pinski from comment #6)
> No it is dependent on the standard level. C++11 fails but C++14, C++17 and
> C++20 all pass.

That's interesting because I see basically no difference in the generated code on CE between 11 and 14:

--- "C++-x86-64 gcc 13.2-1 (1).asm"     2024-04-27 03:25:11.149385400 +0300
+++ "C++-x86-64 gcc 13.2-1.asm" 2024-04-27 03:24:59.207244400 +0300
@@ -76,10 +76,11 @@
         jmp     .L8
 main:
         push    rbx
+        mov     eax, 8738
         mov     ebx, 1024
         sub     rsp, 16
         mov     QWORD PTR [rsp], 0
-        mov     QWORD PTR [rsp+8], 8738
+        mov     WORD PTR [rsp+8], ax
 .L14:
         mov     rdi, rsp
         call    generate(std::atomic<state_type>*)

(https://godbolt.org/z/nexn5W4Ph)
Comment 9 Peter Dimov 2024-04-27 00:31:23 UTC
Oh, my mistake. C++14 does mov QWORD, and C++11 does mov WORD.
Comment 10 Andrew Pinski 2024-04-27 00:35:57 UTC
#if __cplusplus >= 201402L && __has_builtin(__builtin_clear_padding)
        if _GLIBCXX17_CONSTEXPR (__atomic_impl::__maybe_has_padding<_Tp>())
          __builtin_clear_padding(std::__addressof(_M_i));
#endif

So yes it is definitely dependent on C++ level ...
That is for C++14+ it is working correctly.
Comment 11 Peter Dimov 2024-04-27 00:38:35 UTC
So, basically, C++14 and above initialize the padding of

```
    std::atomic<state_type> state{{ 0, 0x2222 }};
```

in `main` to zero, which masks the problem in `generate`. (The problem in `generate` still exists because the assembly is identical - it just doesn't trigger because the padding is zero. If we manually poke something nonzero into the padding, it would (ought to) still break.)

Static variables work for the same reason - the padding is guaranteed zero.
Comment 12 Andrew Pinski 2024-04-27 00:42:35 UTC
(In reply to Peter Dimov from comment #11)
> So, basically, C++14 and above initialize the padding of
> 
> ```
>     std::atomic<state_type> state{{ 0, 0x2222 }};
> ```
> 
> in `main` to zero, which masks the problem in `generate`. (The problem in
> `generate` still exists because the assembly is identical - it just doesn't
> trigger because the padding is zero. If we manually poke something nonzero
> into the padding, it would (ought to) still break.)

No there is no problem in generate for C++14+ because there is no way to get the padding non-zero for that variable.

The problem is only with C++11's std::atomic ...
See PR 111077 and such.
Comment 13 Peter Dimov 2024-04-27 00:42:51 UTC
(In reply to Andrew Pinski from comment #10)
> #if __cplusplus >= 201402L && __has_builtin(__builtin_clear_padding)
>         if _GLIBCXX17_CONSTEXPR (__atomic_impl::__maybe_has_padding<_Tp>())
>           __builtin_clear_padding(std::__addressof(_M_i));
> #endif
> 
> So yes it is definitely dependent on C++ level ...
> That is for C++14+ it is working correctly.

Oh, that's the constructor of `atomic`. I thought it was the compiler initializing the padding in C++14 and above.

I wonder why `__cplusplus >= 201402L` is here.
Comment 14 Jonathan Wakely 2024-05-02 11:40:10 UTC
Because a constexpr function can't have an if statement in C++11.

But maybe we should just clear padding unconditionally for C++11.
Comment 15 Jonathan Wakely 2024-05-02 11:41:50 UTC
--- a/libstdc++-v3/include/std/atomic
+++ b/libstdc++-v3/include/std/atomic
@@ -238,8 +238,10 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 
       constexpr atomic(_Tp __i) noexcept : _M_i(__i)
       {
-#if __cplusplus >= 201402L && __has_builtin(__builtin_clear_padding)
+#if __has_builtin(__builtin_clear_padding)
+#if __cplusplus >= 201402L
        if _GLIBCXX17_CONSTEXPR (__atomic_impl::__maybe_has_padding<_Tp>())
+#endif
          __builtin_clear_padding(std::__addressof(_M_i));
 #endif
       }
Comment 16 Jonathan Wakely 2024-05-02 11:44:47 UTC
Alternatively, we could skip all the padding cope in compare_exchange for C++11, since that was a C++20 change and not a DR.
Comment 17 Jonathan Wakely 2024-05-02 11:47:14 UTC
(In reply to Jonathan Wakely from comment #15)
> --- a/libstdc++-v3/include/std/atomic
> +++ b/libstdc++-v3/include/std/atomic
> @@ -238,8 +238,10 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>  
>        constexpr atomic(_Tp __i) noexcept : _M_i(__i)
>        {
> -#if __cplusplus >= 201402L && __has_builtin(__builtin_clear_padding)
> +#if __has_builtin(__builtin_clear_padding)
> +#if __cplusplus >= 201402L
>         if _GLIBCXX17_CONSTEXPR (__atomic_impl::__maybe_has_padding<_Tp>())
> +#endif
>           __builtin_clear_padding(std::__addressof(_M_i));
>  #endif
>        }

Oh right, that still doesn't work in C++11:

/home/jwakely/gcc/15/include/c++/15.0.0/atomic: In constructor 'std::atomic<_Tp>::atomic(_Tp)':
/home/jwakely/gcc/15/include/c++/15.0.0/atomic:247:7: error: 'constexpr' constructor does not have empty body

I remember hitting this when I implemented the padding stuff.
Comment 18 Jonathan Wakely 2024-05-02 11:55:26 UTC
So maybe:

diff --git a/libstdc++-v3/include/bits/atomic_base.h b/libstdc++-v3/include/bits/atomic_base.h
index aa7374bb252..662cff39df5 100644
--- a/libstdc++-v3/include/bits/atomic_base.h
+++ b/libstdc++-v3/include/bits/atomic_base.h
@@ -956,7 +956,9 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
       constexpr bool
       __maybe_has_padding()
       {
-#if ! __has_builtin(__builtin_clear_padding)
+       // We cannot clear padding in the constructor for C++11,
+       // so return false here to disable all code for zeroing padding.
+#if __cplusplus < 201402L || ! __has_builtin(__builtin_clear_padding)
        return false;
 #elif __has_builtin(__has_unique_object_representations)
        return !__has_unique_object_representations(_Tp)
diff --git a/libstdc++-v3/include/std/atomic b/libstdc++-v3/include/std/atomic
index 36ff89a146c..336f27832fc 100644
--- a/libstdc++-v3/include/std/atomic
+++ b/libstdc++-v3/include/std/atomic
@@ -238,6 +238,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 
       constexpr atomic(_Tp __i) noexcept : _M_i(__i)
       {
+       // A constexpr constructor must be empty in C++11,
+       // so we can only clear padding for C++14 and later.
 #if __cplusplus >= 201402L && __has_builtin(__builtin_clear_padding)
        if _GLIBCXX17_CONSTEXPR (__atomic_impl::__maybe_has_padding<_Tp>())
          __builtin_clear_padding(std::__addressof(_M_i));
Comment 19 Peter Dimov 2024-05-02 15:13:49 UTC
This should work.

I still don't understand why JF so insisted on all these padding shenanigans.
Comment 20 Jakub Jelinek 2024-05-21 09:20:20 UTC
GCC 13.3 is being released, retargeting bugs to GCC 13.4.