This is the mail archive of the
gcc-bugs@gcc.gnu.org
mailing list for the GCC project.
[Bug libstdc++/71660] [5/6/7/8 regression] alignment of std::atomic<8 byte primitive type> (long long, double) is wrong on x86
- From: "peter at cordes dot ca" <gcc-bugzilla at gcc dot gnu dot org>
- To: gcc-bugs at gcc dot gnu dot org
- Date: Sat, 20 May 2017 19:32:28 +0000
- Auto-submitted: auto-generated
- References: <bug-71660-4@http.gcc.gnu.org/bugzilla/>
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71660
Peter Cordes <peter at cordes dot ca> changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |peter at cordes dot ca
--- Comment #5 from Peter Cordes <peter at cordes dot ca> ---
(In reply to Thiago Macieira from comment #3)
> (In reply to Jakub Jelinek from comment #1)
> > For double-word compare and exchange you need double-word alignment, so I
> > think the current alignment is correct.
>
> The instruction manual says that CMPXCHG16B requires 128-bit alignment, but
> doesn't say the same for CMPXCHG8B. It says that the AC(0) alignment check
> fault could happen if it's not aligned, but doesn't say what the required
> alignment is.
The more important point is that simple loads and stores are not atomic on
cache-line splits, so requiring natural alignment for atomic objects would
avoid that. LOCKed read-modify-write ops are also *much* slower on cache-line
splits.
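To make the cache-line-split point concrete, here is a minimal sketch (not
taken from the bug report). It assumes 64-byte cache lines, which is typical
for current x86 CPUs but not guaranteed. The names kCacheLine,
splits_cache_line, within_one_line and report_alignment are made up for
illustration.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdio>

// Assumption: 64-byte cache lines (typical on x86, not guaranteed).
constexpr std::size_t kCacheLine = 64;

// True if `size` bytes starting at address `addr` span two cache lines.
constexpr bool splits_cache_line(std::uintptr_t addr, std::size_t size) {
    return addr / kCacheLine != (addr + size - 1) / kCacheLine;
}

// An 8-byte object at an 8-byte-aligned address never splits a line...
static_assert(!splits_cache_line(56, 8), "8B-aligned object stays in one line");
// ...but with only 4-byte alignment it can straddle the boundary at 64.
static_assert(splits_cache_line(60, 8), "4B-aligned object can split a line");

// Runtime check for a concrete atomic object.
template <typename T>
bool within_one_line(const std::atomic<T>& obj) {
    return !splits_cache_line(reinterpret_cast<std::uintptr_t>(&obj),
                              sizeof(obj));
}

// Print what the current compiler and target actually chose.
inline void report_alignment() {
    // The atomic wrapper should never lower the alignment of the raw type.
    assert(alignof(std::atomic<long long>) >= alignof(long long));
    std::printf("alignof(long long)              = %zu\n", alignof(long long));
    std::printf("alignof(std::atomic<long long>) = %zu\n",
                alignof(std::atomic<long long>));
}
```

Calling report_alignment() from main() shows the two alignments for whatever
target and compiler version is in use; the values differ between -m32 and -m64
builds and between GCC releases, so no particular output is claimed here.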
#AC isn't really relevant here, but I'd assume CMPXCHG8B requires 8B alignment
for that check, since it's really a single 8B atomic RMW.
#AC faults only happen when alignment checking is enabled (CR0.AM set by the
kernel, plus the AC bit in EFLAGS, at CPL 3), which will cause *any* unaligned
access to fault. Code all over the place assumes that unaligned accesses are
safe, e.g. glibc memcpy commonly uses unaligned loads for small non-power-of-2
sizes or unaligned inputs. So you can't really enable the AC flag with normal
code.
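As an illustration of the kind of ordinary code that relies on unaligned
accesses being safe, here is a minimal sketch (the helper names are
hypothetical, not from glibc or the bug report). On x86, compilers typically
lower a fixed-size memcpy like this to a single unaligned load or store, which
is exactly the sort of access that would fault with alignment checking enabled.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Read a 64-bit value from an arbitrarily aligned address.
inline std::uint64_t load_u64_unaligned(const unsigned char* p) {
    std::uint64_t v;
    std::memcpy(&v, p, sizeof v);  // well-defined at any alignment
    return v;
}

// Write a 64-bit value to an arbitrarily aligned address.
inline void store_u64_unaligned(unsigned char* p, std::uint64_t v) {
    std::memcpy(p, &v, sizeof v);
}

// Round-trip a value through a deliberately misaligned (odd) offset.
inline bool round_trip_unaligned(std::uint64_t v) {
    unsigned char buf[16] = {0};
    store_u64_unaligned(buf + 1, v);
    return load_u64_unaligned(buf + 1) == v;
}

// Simple self-check that the helpers agree with each other.
inline void self_check() {
    assert(round_trip_unaligned(0x0123456789abcdefULL));
}
```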
I assume this is why Intel was lazy about documenting the exact details of #AC
behaviour for this instruction, or figured it was obvious.