Bug 71660 - [6/7/8/9 regression] alignment of std::atomic<8 byte primitive type> (long long, double) is wrong on x86
Summary: [6/7/8/9 regression] alignment of std::atomic<8 byte primitive type> (long lo...
Status: RESOLVED INVALID
Alias: None
Product: gcc
Classification: Unclassified
Component: libstdc++ (show other bugs)
Version: 6.1.1
: P2 normal
Target Milestone: 6.5
Assignee: Not yet assigned to anyone
URL:
Keywords: ABI
Depends on: 65146
Blocks:
  Show dependency treegraph
 
Reported: 2016-06-25 18:20 UTC by Thiago Macieira
Modified: 2018-05-02 10:51 UTC (History)
8 users (show)

See Also:
Host:
Target: i?86-*-*
Build:
Known to work:
Known to fail:
Last reconfirmed: 2016-06-27 00:00:00


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Thiago Macieira 2016-06-25 18:20:53 UTC
The resolution of Bug 65147 made this change to __atomic_base:

-      __int_type       _M_i;
+      static constexpr int _S_alignment =
+       sizeof(_ITp) > alignof(_ITp) ? sizeof(_ITp) : alignof(_ITp);
+
+      alignas(_S_alignment) __int_type _M_i;

That breaks the alignment for 8-byte primitive types on 32-bit x86, which have a historical under-alignment to keep ABI compatibility with existing code.

That is:
  alignof(long long) == 8
but:
  struct InStruct { long long x; }
  alignof(InStruct) == 4

The above is done so that existing structures containing those types retain their layout.

Prior to Commit 221945, this struct had alignment of 4 on x86:

  struct AtomicStruct { int; std::atomic<long long> x; };

After it, it's now 8. 

More importantly, the sizeof that struct changed from 12 to 16 and the layout also changed.

Please update this change to exclude:
 long long
 unsigned long long
 double
 long double
Comment 1 Jakub Jelinek 2016-06-27 08:11:09 UTC
Foir double-word compare and exchange you need double-word alignment, so I think the current alignment is correct.
Comment 2 Jonathan Wakely 2016-06-27 13:06:44 UTC
However alignof(std::atomic<long long>) != alignof(_Atomic long long) for x86-32, which is wrong, since that was the point of PR65147.

#ifdef __cplusplus
#include <atomic>
using std::atomic_llong;
#else
#include <stdalign.h>
#include <stdatomic.h>
#endif

#include <stdio.h>

struct InStruct { long long x; };
struct AtomicStruct { int i; atomic_llong x; };

int main()
{
  printf("%zu %zu %zu %zu\n",
      alignof(struct InStruct),
      alignof(atomic_llong),
      sizeof(struct AtomicStruct),
      alignof(struct AtomicStruct));
}

With GCC this prints 4 4 12 4 but with G++ it prints 4 8 16 8

We might need a front-end attribute so std::atomic can emulate what the C front-end does with _Atomic.
Comment 3 Thiago Macieira 2016-06-27 14:40:52 UTC
(In reply to Jakub Jelinek from comment #1)
> Foir double-word compare and exchange you need double-word alignment, so I
> think the current alignment is correct.

The instruction manual says that CMPXCHG16B requires 128-bit alignment, but doesn't say the same for CMPXCHG8B. It says that the AC(0) alignment check fault could happen if it's not aligned, but doesn't say what the required alignment is.
Comment 4 Jonathan Wakely 2017-04-04 10:05:33 UTC
(In reply to Jonathan Wakely from comment #2)
> We might need a front-end attribute so std::atomic can emulate what the C
> front-end does with _Atomic.

I still think this is the only reliable way to solve this. Trying to emulate target hooks in C++ template metaprogrammes is doomed to fail.
Comment 5 Peter Cordes 2017-05-20 19:32:28 UTC
(In reply to Thiago Macieira from comment #3)
> (In reply to Jakub Jelinek from comment #1)
> > Foir double-word compare and exchange you need double-word alignment, so I
> > think the current alignment is correct.
> 
> The instruction manual says that CMPXCHG16B requires 128-bit alignment, but
> doesn't say the same for CMPXCHG8B. It says that the AC(0) alignment check
> fault could happen if it's not aligned, but doesn't say what the required
> alignment is.

The more important point is that simple loads and stores are not atomic on cache-line splits, so requiring natural alignment for atomic objects would avoid that.  LOCKed read-modify-write ops are also *much* slower on cache-line splits.


#AC isn't really relevant, but I'd assume it requires 8B alignment since it's really a single 8B atomic RMW.

#AC faults only happen if the kernel sets the AC bit in EFLAGS, which will cause *any* unaligned access to fault.  Code all over the place assumes that unaligned accesses are safe.  e.g. glibc memcpy commonly uses unaligned loads for small non-power-of-2 sizes or unaligned inputs.  So you can't really enable the AC flag with normal code.

I assume this is why Intel was lazy about documenting the exact details of #AC behaviour for this instruction, or figured it was obvious.
Comment 6 Christian Gagneraud 2017-09-02 06:56:41 UTC
Hi there,

What is the status of this bug report? Will this be fixed one day? Isn't this a serious issue?

Qt cannot currently be fully built because of this. See
http://lists.qt-project.org/pipermail/interest/2017-September/027953.html
http://lists.qt-project.org/pipermail/interest/2017-September/027955.html

Honestly this all look scary to me, either gcc behaviour or Qt beahaviour or both.

Full thread on Qt ML:
http://lists.qt-project.org/pipermail/interest/2017-September/027948.html
Comment 7 Peter Cordes 2017-09-05 13:29:41 UTC
C++11 std::atomic<> is correct, and the change was necessary.

8B alignment is required for 8B objects to be efficiently lock-free (using SSE load / store for .load() and .store(), see https://stackoverflow.com/questions/36624881/why-is-integer-assignment-on-a-naturally-aligned-variable-atomic), and to avoid a factor of ~100 slowdown if lock cmpxchg8b is split across a cache-line boundary.

What needs to change is the C11 stdatomic default alignment for 64-bit objects in and out of structs.  (This includes _Atomic struct {int a,b;};)

Currently, atomic_llong **is not atomic** in gcc, only in g++.  I attached a testcase showing tearing to the still-unfixed C11 bug (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65146#c4).  (It was filed at the same time as the C++11 bug that led to the change in std::atomic.)


re: long double: it can't be lock-free in -m32.  10-byte x87 load / store instructions are not guaranteed to be atomic, and in fact even on real Intel CPUs are done as two separate load or store uops.

alignof(long double) in 32-bit is different from alignof(long double) in 64-bit.  std::atomic<long double> or _Atomic long double should always have the same alignment as long double.
Comment 8 Thiago Macieira 2017-09-05 14:24:14 UTC
(In reply to Peter Cordes from comment #7)
> 8B alignment is required for 8B objects to be efficiently lock-free (using
> SSE load / store for .load() and .store(), see
> https://stackoverflow.com/questions/36624881/why-is-integer-assignment-on-a-
> naturally-aligned-variable-atomic), and to avoid a factor of ~100 slowdown
> if lock cmpxchg8b is split across a cache-line boundary.

Unfortunately, the issue is not efficiency, but compatibility. The change broke ABI for roughly 50% of structs containing atomic<64bit>. I understand being fast, but not at the expense of silently breaking code at runtime.

> alignof(long double) in 32-bit is different from alignof(long double) in
> 64-bit.  std::atomic<long double> or _Atomic long double should always have
> the same alignment as long double.

In and out of structs? That's the whole problem: inside structs, the alignment is 4 for historical reasons.
Comment 9 H.J. Lu 2017-09-05 15:43:14 UTC
This is related to PR 65146.
Comment 10 Thiago Macieira 2017-09-05 17:06:32 UTC
Actually, PR 65146 points out that the problem is not efficiency but correctness. An under-aligned type could cross a cacheline boundary and thus fail to be atomic in the first place.

Therefore, it is correct to increase the alignment, even if that causes an ABI change for existing structures. Those structures were disasters waiting to happen.

I withdraw my bug report. Close it as INVALID or NOTABUG or whatever is appropriate.
Comment 11 Peter Cordes 2017-09-05 21:32:03 UTC
(In reply to Thiago Macieira from comment #10)
> Actually, PR 65146 points out that the problem is not efficiency but
> correctness. An under-aligned type could cross a cacheline boundary and thus
> fail to be atomic in the first place.

As I pointed out there, you technically could solve the correctness problem by checking alignment and falling back to locking for objects where a plain 8B load or 8B store wouldn't be atomic.  That's what I meant by "efficiently lock-free".  And we're talking about *huge* inefficiencies here compared to always being able to inline and SSE load/store.

That would let you keep struct layouts the same, but it would still be an ABI change, since everything has to agree about which objects are lock-free and which aren't.  Now that I think about it, all of my suggested fixes on PR 65146 are effectively ABI changes.

> Those structures were disasters waiting to happen.

Yes, exactly.

Basically any existing binaries compiled with a gcc that allows under-aligned atomic objects are unsafe, so keeping compatibility with them is not important.
Comment 12 Thiago Macieira 2017-09-06 02:14:22 UTC
Another problem is that we've now had a couple of years with this issue, so it's probably worse to make a change again.
Comment 13 Peter Cordes 2017-09-09 03:25:08 UTC
(In reply to Thiago Macieira from comment #12)
> Another problem is that we've now had a couple of years with this issue, so
> it's probably worse to make a change again.

A change to C++11 std::atomic?  Yeah, we definitely should not change it.  It's using by far the most efficient solution for code that does any pure-store or pure-load operations, and any change would break that.

It's also compatible with clang (at least as far as struct layout and how whether or not 64-bit atomic objects are lock-free).

Also, it's interoperable with 64-bit std::atomic for shared-memory segments, if you avoid struct layout differences for non-atomic 64-bit struct members.

If you want a struct with non-atomic members to match the layout of a struct with atomic members, do something like

struct foo {
    char c;
    alignas(atomic<long long>) long long t;
};

And usually sort your struct members largest-first to reduce padding in the first place.

Maybe there's some other case that makes it convenient for minimum alignment (C11 _Alignof) atomic types to match _Alignof non-atomic types?  Note that gcc's implementation of C++11/14 alignof doesn't match it's C11 _Alignof behaviour.  g++ alignof gives you the "recommended" alignment, like __alignof__, rather than the minimum where you will ever find a value of this type.  See https://gcc.gnu.org/ml/gcc-patches/2013-12/msg00435.html and bug 52023 for the reasons behind changing C11 _Alignof but not C++14 alignof.  (On reading the standard, I'm not sure I agree;  I think C++14 should be returning 4 for alignof(long long) with -m32 in the SysV ABI.)

IDK what Qt's assert is guarding against.  If you're specifically worried about atomicity, checking that alignof(InStruct) == sizeof(long long) makes more sense, because that's required on almost any architecture as a guaranteed way to avoid cache-line splits.  (C/C++ don't have a simple way to express "unaligned is fine except at cache line boundaries" like you get on Intel specifically (not AMD)).
Comment 14 Thiago Macieira 2017-09-09 13:47:39 UTC
(In reply to Peter Cordes from comment #13)
> If you want a struct with non-atomic members to match the layout of a struct
> with atomic members, do something like
> 
> struct foo {
>     char c;
>     alignas(atomic<long long>) long long t;
> };
> 
[cut]
> IDK what Qt's assert is guarding against.  If you're specifically worried
> about atomicity, checking that alignof(InStruct) == sizeof(long long) makes
> more sense, because that's required on almost any architecture as a
> guaranteed way to avoid cache-line splits.  (C/C++ don't have a simple way
> to express "unaligned is fine except at cache line boundaries" like you get
> on Intel specifically (not AMD)).

It was trying to guard against exactly what you said above: that the alignment of a QAtomicInteger<T> was exactly the same as the alignment of a plain T inside a struct, so one could replace a previous plain member with an atomic and keep binary compatibility. 

But it's clear now that atomic types may need extra alignment than the plain types. In hindsight, the check is unnecessary and should be removed; people should not expect to replace T with std::atomic<T> or QAtomicInteger<T> and keep ABI.
Comment 15 Jakub Jelinek 2017-10-10 13:27:19 UTC
GCC 5 branch is being closed
Comment 16 Jonathan Wakely 2018-03-12 18:23:54 UTC
(In reply to Thiago Macieira from comment #14)
> It was trying to guard against exactly what you said above: that the
> alignment of a QAtomicInteger<T> was exactly the same as the alignment of a
> plain T inside a struct, so one could replace a previous plain member with
> an atomic and keep binary compatibility. 

That is not expected to work.

> But it's clear now that atomic types may need extra alignment than the plain
> types. In hindsight, the check is unnecessary and should be removed; people
> should not expect to replace T with std::atomic<T> or QAtomicInteger<T> and
> keep ABI.

Agreed.

But what we do care about is comment 2, i.e. _Atomic(T) and std::atomic<T> should have the same alignment (both in an out of structs). Maybe that needs the C front-end to change how _Atomic works, or maybe it needs the C++ library to change how std::atomic works, but I want to keep this bug open while comment 2 gives different answers for C and C++.
Comment 17 Peter Cordes 2018-03-13 10:58:57 UTC
(In reply to Jonathan Wakely from comment #16)
> But what we do care about is comment 2, i.e. _Atomic(T) and std::atomic<T>
> should have the same alignment (both in an out of structs). Maybe that needs
> the C front-end to change how _Atomic works, or maybe it needs the C++
> library to change how std::atomic works, but I want to keep this bug open
> while comment 2 gives different answers for C and C++.

Right, gcc's C _Atomic ABI is still broken for long long on 32-bit x86.  It only aligned _Atomic long long to 32 bits (inside structs), but then assumes that 8-byte loads / stores (with x87 or SSE1/2) are atomic.

It also leads to abysmal performance for  LOCK CMPXCHG  or other RMW operations if the atomic object is split across a cache line.

That's bug 65146, so we can close this one.  (I never got around to posting in the google group for the ABI.  By far the best good solution is giving _Atomic long long (and other 8-byte objects) a boost to their _Alignof, up to 8 byte alignment even inside structs.)
Comment 18 Jason Merrill 2018-04-24 19:58:39 UTC
(In reply to Peter Cordes from comment #13)
> Note that
> gcc's implementation of C++11/14 alignof doesn't match it's C11 _Alignof
> behaviour.  g++ alignof gives you the "recommended" alignment, like
> __alignof__, rather than the minimum where you will ever find a value of
> this type.  See https://gcc.gnu.org/ml/gcc-patches/2013-12/msg00435.html and
> bug 52023 for the reasons behind changing C11 _Alignof but not C++14
> alignof.  (On reading the standard, I'm not sure I agree;  I think C++14
> should be returning 4 for alignof(long long) with -m32 in the SysV ABI.)

This has now been fixed, BTW (bug 69560).
Comment 19 Thiago Macieira 2018-04-24 20:41:40 UTC
And Qt has stopped complaining about this. https://codereview.qt-project.org/227296
Comment 20 Jonathan Wakely 2018-05-02 10:51:30 UTC
As discussed above, I'm closing this as not-a-bug. The change to std::atomic alignment was intentional and is necessary. It would be nice to be consistent with _Atomic but we seem to be in agreement that the C front end should change, not std::atomic.