Although this paper was moved by Core at the meeting, it's a change to the library atomics clause. Do you need compiler support for this? It seems fairly straightforward to handle types for which has_unique_object_representations is false by zeroing the storage as the first step of initialization.
I'd like to draw attention to the case of 80-bit long double on x86. When I added support for it in Boost.Atomic I noticed that it would usually be passed in an xmm register, where the lower 10 bytes contained value and the upper 6 contained undefined padding. Given that gcc stores and loads the full xmm register, this means that clearing the storage prior to storing the value is not enough (the random padding will be stored in the storage and break cmpxchg16b). In Boost.Atomic I had to clear the padding after storing the value, and this code is brittle because I have to know when long double value is 10 bytes (note that sizeof(long double) returns 16 on x86-64 and 12 on x86-32). I don't know if anything was done about it in recent gcc versions. Maybe the compiler could provide an intrinsic to clear any possible padding bits of a type? That would be useful not only for long double, but for structs with padding bits, because it allows to use memcpy to copy the value (which is arguably more efficient) and only clear padding when accepting the value from the user.
Another use case is C++20 atomic_ref, which may be bound to an object whose padding bits are in indeterminate state. An intrinsic to clear padding bits without altering the object value could be useful.
As discussed in bug #93916, the approach of zeroing the storage before constructing the object with internal padding doesn't work and is not required to work by the C++ standard.
(In reply to andysem from comment #2) > Another use case is C++20 atomic_ref, which may be bound to an object whose > padding bits are in indeterminate state. An intrinsic to clear padding bits > without altering the object value could be useful. Having now implemented atomic<T>::wait for libstdc++, I think the intrinsic to clear padding bits before calling __builtin_memcmp for generic (trivially copyable) T's is the right approach.
Created attachment 49563 [details] gcc11-pr88101-wip.patch Here is completely untested WIP of __builtin_clear_padding builtin, so far doesn't handle bit-fields, unions, VLAs and has a couple of other FIXMEs. I'll try to complete this virtually from Baker Island (AoE timezone) tonight before stage1 closes there. Also I'll probably need to remember the originally passed pointer type in e.g. second artificial argument of the builtin (NULL), in case already before or during lower pass something would forward propagate the argument.
Created attachment 49565 [details] gcc11-pr88101-wip.patch Fixed/updated patch that includes first testcase and passes it.
Created attachment 49567 [details] gcc11-pr88101-wip.patch Updated patch that canhandle bit-fields on both little end big endian and can handle also skipping of large paddings. Next task unions.
Created attachment 49569 [details] gcc11-pr88101-wip.patch Updated patch to handle unions.
The master branch has been updated by Jakub Jelinek <jakub@gcc.gnu.org>: https://gcc.gnu.org/g:1bea0d0aa5936cb36b6f86f721ca03c1a1bb601d commit r11-5196-g1bea0d0aa5936cb36b6f86f721ca03c1a1bb601d Author: Jakub Jelinek <jakub@redhat.com> Date: Fri Nov 20 12:28:34 2020 +0100 c++: Add __builtin_clear_padding builtin - C++20 P0528R3 compiler side [PR88101] The following patch implements __builtin_clear_padding builtin that clears the padding bits in object representation (but preserves value representation). Inside of unions it clears only those padding bits that are padding for all the union members (so that it never alters value representation). It handles trailing padding, padding in the middle of structs including bitfields (PDP11 unhandled, I've never figured out how those bitfields work), VLAs (doesn't handle variable length structures, but I think almost nobody uses them and it isn't worth the extra complexity). For VLAs and sufficiently large arrays it uses runtime clearing loop instead of emitting straight-line code (unless arrays are inside of a union). The way I think this can be used for atomics is e.g. if the structures are power of two sized and small enough that we use the hw atomics for say compare_exchange __builtin_clear_padding could be called first on the address of expected and desired arguments (for desired only if we want to ensure that most of the time the atomic memory will have padding bits cleared), then perform the weak cmpxchg and if that fails, we got the value from the atomic memory; we can call __builtin_clear_padding on a copy of that and then compare it with expected, and if it is the same with the padding bits masked off, we can use the original with whatever random padding bits in it as the new expected for next cmpxchg. __builtin_clear_padding itself is not atomic and therefore it shouldn't be called on the atomic memory itself, but compare_exchange*'s expected argument is a reference and normally the implementation may store there the current value from memory, so padding bits can be cleared in that, and desired is passed by value rather than reference, so clearing is fine too. When using libatomic, we can use it either that way, or add new libatomic APIs that accept another argument, pointer to the padding bit bitmask, and construct that in the template as alignas (_T) unsigned char _mask[sizeof (_T)]; std::memset (_mask, ~0, sizeof (_mask)); __builtin_clear_padding ((_T *) _mask); which will have bits cleared for padding bits and set for bits taking part in the value representation. Then libatomic could internally instead of using memcmp compare for (i = 0; i < N; i++) if ((val1[i] & mask[i]) != (val2[i] & mask[i])) 2020-11-20 Jakub Jelinek <jakub@redhat.com> PR libstdc++/88101 gcc/ * builtins.def (BUILT_IN_CLEAR_PADDING): New built-in function. * gimplify.c (gimplify_call_expr): Rewrite single argument BUILT_IN_CLEAR_PADDING into two-argument variant. * gimple-fold.c (clear_padding_unit, clear_padding_buf_size): New const variables. (struct clear_padding_struct): New type. (clear_padding_flush, clear_padding_add_padding, clear_padding_emit_loop, clear_padding_type, clear_padding_union, clear_padding_real_needs_padding_p, clear_padding_type_may_have_padding_p, gimple_fold_builtin_clear_padding): New functions. (gimple_fold_builtin): Handle BUILT_IN_CLEAR_PADDING. * doc/extend.texi (__builtin_clear_padding): Document. gcc/c-family/ * c-common.c (check_builtin_function_arguments): Handle BUILT_IN_CLEAR_PADDING. gcc/testsuite/ * c-c++-common/builtin-clear-padding-1.c: New test. * c-c++-common/torture/builtin-clear-padding-1.c: New test. * c-c++-common/torture/builtin-clear-padding-2.c: New test. * c-c++-common/torture/builtin-clear-padding-3.c: New test. * c-c++-common/torture/builtin-clear-padding-4.c: New test. * c-c++-common/torture/builtin-clear-padding-5.c: New test. * g++.dg/torture/builtin-clear-padding-1.C: New test. * g++.dg/torture/builtin-clear-padding-2.C: New test. * gcc.dg/builtin-clear-padding-1.c: New test.
The master branch has been updated by Jakub Jelinek <jakub@gcc.gnu.org>: https://gcc.gnu.org/g:bf0a63a1f47525d1c466dbb84616dcb72010affa commit r11-5490-gbf0a63a1f47525d1c466dbb84616dcb72010affa Author: Jakub Jelinek <jakub@redhat.com> Date: Fri Nov 27 11:23:45 2020 +0100 gimple-fold: Fix another __builtin_clear_padding ICE When playing with __builtin_bit_cast, I have noticed __builtin_clear_padding ICE on the G class below. The artificial field with D type has offset 0 and size 8 bytes, but the following artificial field with E type has offset 0 and size 0, so it triggers the asserts that we don't move current position backwards. Fixed by ignoring is_empty_type (TREE_TYPE (field)) fields, all of their bits are padding which is what is added when skipping over to next field anyway. 2020-11-27 Jakub Jelinek <jakub@redhat.com> PR libstdc++/88101 * gimple-fold.c (clear_padding_type): Ignore fields with is_empty_type types. * g++.dg/torture/builtin-clear-padding-3.C: New test.
Any remaining work here?
The compiler side is done, but the libstdc++-v3 side is not on the trunk yet.
Library patch posted: https://gcc.gnu.org/pipermail/gcc-patches/2022-September/601218.html
With r13-2548-g157236dbd62164 the library side is done, so this is complete.
The master branch has been updated by Jakub Jelinek <jakub@gcc.gnu.org>: https://gcc.gnu.org/g:e217e7dbdc1040e7ee160796e9ca1ef12a0dd1cb commit r15-2136-ge217e7dbdc1040e7ee160796e9ca1ef12a0dd1cb Author: Sam James <sam@gentoo.org> Date: Thu Jul 18 10:00:17 2024 +0200 testsuite: Add dg-do run to more tests All of these are for wrong-code bugs. Confirmed to be used before but with no execution. 2024-07-18 Sam James <sam@gentoo.org> PR c++/53288 PR c++/57437 PR c/65345 PR libstdc++/88101 PR tree-optimization/96369 PR tree-optimization/102124 PR tree-optimization/108692 * c-c++-common/pr96369.c: Add dg-do run directive. * gcc.dg/torture/pr102124.c: Ditto. * gcc.dg/pr108692.c: Ditto. * gcc.dg/atomic/pr65345-4.c: Ditto. * g++.dg/cpp0x/lambda/lambda-return1.C: Ditto. * g++.dg/init/lifetime4.C: Ditto. * g++.dg/torture/builtin-clear-padding-1.C: Ditto. * g++.dg/torture/builtin-clear-padding-2.C: Ditto. * g++.dg/torture/builtin-clear-padding-3.C: Ditto. * g++.dg/torture/builtin-clear-padding-4.C: Ditto. * g++.dg/torture/builtin-clear-padding-5.C: Ditto.