Although this paper was moved by Core at the meeting, it's a change to the library atomics clause. Do you need compiler support for this? It seems fairly straightforward to handle types for which has_unique_object_representations is false by zeroing the storage as the first step of initialization.
I'd like to draw attention to the case of 80-bit long double on x86. When I added support for it in Boost.Atomic I noticed that it would usually be passed in an xmm register, where the lower 10 bytes contained value and the upper 6 contained undefined padding. Given that gcc stores and loads the full xmm register, this means that clearing the storage prior to storing the value is not enough (the random padding will be stored in the storage and break cmpxchg16b). In Boost.Atomic I had to clear the padding after storing the value, and this code is brittle because I have to know when long double value is 10 bytes (note that sizeof(long double) returns 16 on x86-64 and 12 on x86-32).
I don't know if anything was done about it in recent gcc versions. Maybe the compiler could provide an intrinsic to clear any possible padding bits of a type? That would be useful not only for long double, but for structs with padding bits, because it allows to use memcpy to copy the value (which is arguably more efficient) and only clear padding when accepting the value from the user.
Another use case is C++20 atomic_ref, which may be bound to an object whose padding bits are in indeterminate state. An intrinsic to clear padding bits without altering the object value could be useful.
As discussed in bug #93916, the approach of zeroing the storage before constructing the object with internal padding doesn't work and is not required to work by the C++ standard.
(In reply to andysem from comment #2)
> Another use case is C++20 atomic_ref, which may be bound to an object whose
> padding bits are in indeterminate state. An intrinsic to clear padding bits
> without altering the object value could be useful.
Having now implemented atomic<T>::wait for libstdc++, I think the intrinsic to clear padding bits before calling __builtin_memcmp for generic (trivially copyable) T's is the right approach.
Created attachment 49563 [details]
Here is completely untested WIP of __builtin_clear_padding builtin, so far doesn't handle bit-fields, unions, VLAs and has a couple of other FIXMEs.
I'll try to complete this virtually from Baker Island (AoE timezone) tonight before stage1 closes there. Also I'll probably need to remember the originally passed pointer type in e.g. second artificial argument of the builtin (NULL), in case already before or during lower pass something would forward propagate the argument.
Created attachment 49565 [details]
Fixed/updated patch that includes first testcase and passes it.
Created attachment 49567 [details]
Updated patch that canhandle bit-fields on both little end big endian and can handle also skipping of large paddings. Next task unions.
Created attachment 49569 [details]
Updated patch to handle unions.
The master branch has been updated by Jakub Jelinek <email@example.com>:
Author: Jakub Jelinek <firstname.lastname@example.org>
Date: Fri Nov 20 12:28:34 2020 +0100
c++: Add __builtin_clear_padding builtin - C++20 P0528R3 compiler side [PR88101]
The following patch implements __builtin_clear_padding builtin that clears
the padding bits in object representation (but preserves value
representation). Inside of unions it clears only those padding bits that
are padding for all the union members (so that it never alters value
It handles trailing padding, padding in the middle of structs including
bitfields (PDP11 unhandled, I've never figured out how those bitfields
work), VLAs (doesn't handle variable length structures, but I think almost
nobody uses them and it isn't worth the extra complexity). For VLAs and
sufficiently large arrays it uses runtime clearing loop instead of emitting
straight-line code (unless arrays are inside of a union).
The way I think this can be used for atomics is e.g. if the structures
are power of two sized and small enough that we use the hw atomics
for say compare_exchange __builtin_clear_padding could be called first on
the address of expected and desired arguments (for desired only if we want
to ensure that most of the time the atomic memory will have padding bits
cleared), then perform the weak cmpxchg and if that fails, we got the
value from the atomic memory; we can call __builtin_clear_padding on a copy
of that and then compare it with expected, and if it is the same with the
padding bits masked off, we can use the original with whatever random
padding bits in it as the new expected for next cmpxchg.
__builtin_clear_padding itself is not atomic and therefore it shouldn't
be called on the atomic memory itself, but compare_exchange*'s expected
argument is a reference and normally the implementation may store there
the current value from memory, so padding bits can be cleared in that,
and desired is passed by value rather than reference, so clearing is fine
When using libatomic, we can use it either that way, or add new libatomic
APIs that accept another argument, pointer to the padding bit bitmask,
and construct that in the template as
alignas (_T) unsigned char _mask[sizeof (_T)];
std::memset (_mask, ~0, sizeof (_mask));
__builtin_clear_padding ((_T *) _mask);
which will have bits cleared for padding bits and set for bits taking part
in the value representation. Then libatomic could internally instead
of using memcmp compare
for (i = 0; i < N; i++) if ((val1[i] & mask[i]) != (val2[i] & mask[i]))
2020-11-20 Jakub Jelinek <email@example.com>
* builtins.def (BUILT_IN_CLEAR_PADDING): New built-in function.
* gimplify.c (gimplify_call_expr): Rewrite single argument
BUILT_IN_CLEAR_PADDING into two-argument variant.
* gimple-fold.c (clear_padding_unit, clear_padding_buf_size): New
(struct clear_padding_struct): New type.
gimple_fold_builtin_clear_padding): New functions.
(gimple_fold_builtin): Handle BUILT_IN_CLEAR_PADDING.
* doc/extend.texi (__builtin_clear_padding): Document.
* c-common.c (check_builtin_function_arguments): Handle
* c-c++-common/builtin-clear-padding-1.c: New test.
* c-c++-common/torture/builtin-clear-padding-1.c: New test.
* c-c++-common/torture/builtin-clear-padding-2.c: New test.
* c-c++-common/torture/builtin-clear-padding-3.c: New test.
* c-c++-common/torture/builtin-clear-padding-4.c: New test.
* c-c++-common/torture/builtin-clear-padding-5.c: New test.
* g++.dg/torture/builtin-clear-padding-1.C: New test.
* g++.dg/torture/builtin-clear-padding-2.C: New test.
* gcc.dg/builtin-clear-padding-1.c: New test.