Bug 88101 - Implement P0528R3, C++20 cmpxchg and padding bits
Summary: Implement P0528R3, C++20 cmpxchg and padding bits
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: libstdc++ (show other bugs)
Version: unknown
: P3 enhancement
Target Milestone: 13.0
Assignee: Jakub Jelinek
URL:
Keywords: patch
Depends on:
Blocks: c++20-core c++20-lib
  Show dependency treegraph
 
Reported: 2018-11-19 19:39 UTC by Jason Merrill
Modified: 2024-07-18 08:07 UTC (History)
6 users (show)

See Also:
Host:
Target:
Build:
Known to work:
Known to fail:
Last reconfirmed: 2018-12-03 00:00:00


Attachments
gcc11-pr88101-wip.patch (4.14 KB, patch)
2020-11-15 17:30 UTC, Jakub Jelinek
Details | Diff
gcc11-pr88101-wip.patch (4.72 KB, patch)
2020-11-15 18:56 UTC, Jakub Jelinek
Details | Diff
gcc11-pr88101-wip.patch (5.87 KB, patch)
2020-11-16 09:37 UTC, Jakub Jelinek
Details | Diff
gcc11-pr88101-wip.patch (6.57 KB, patch)
2020-11-16 11:05 UTC, Jakub Jelinek
Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Jason Merrill 2018-11-19 19:39:40 UTC
Although this paper was moved by Core at the meeting, it's a change to the library atomics clause.  Do you need compiler support for this?  It seems fairly straightforward to handle types for which has_unique_object_representations is false by zeroing the storage as the first step of initialization.
Comment 1 andysem 2019-12-06 10:25:29 UTC
I'd like to draw attention to the case of 80-bit long double on x86. When I added support for it in Boost.Atomic I noticed that it would usually be passed in an xmm register, where the lower 10 bytes contained value and the upper 6 contained undefined padding. Given that gcc stores and loads the full xmm register, this means that clearing the storage prior to storing the value is not enough (the random padding will be stored in the storage and break cmpxchg16b). In Boost.Atomic I had to clear the padding after storing the value, and this code is brittle because I have to know when long double value is 10 bytes (note that sizeof(long double) returns 16 on x86-64 and 12 on x86-32).

I don't know if anything was done about it in recent gcc versions. Maybe the compiler could provide an intrinsic to clear any possible padding bits of a type? That would be useful not only for long double, but for structs with padding bits, because it allows to use memcpy to copy the value (which is arguably more efficient) and only clear padding when accepting the value from the user.
Comment 2 andysem 2020-02-23 16:30:38 UTC
Another use case is C++20 atomic_ref, which may be bound to an object whose padding bits are in indeterminate state. An intrinsic to clear padding bits without altering the object value could be useful.
Comment 3 andysem 2020-02-25 11:32:36 UTC
As discussed in bug #93916, the approach of zeroing the storage before constructing the object with internal padding doesn't work and is not required to work by the C++ standard.
Comment 4 Thomas Rodgers 2020-09-15 20:03:58 UTC
(In reply to andysem from comment #2)
> Another use case is C++20 atomic_ref, which may be bound to an object whose
> padding bits are in indeterminate state. An intrinsic to clear padding bits
> without altering the object value could be useful.

Having now implemented atomic<T>::wait for libstdc++, I think the intrinsic to clear padding bits before calling __builtin_memcmp for generic (trivially copyable) T's is the right approach.
Comment 5 Jakub Jelinek 2020-11-15 17:30:48 UTC
Created attachment 49563 [details]
gcc11-pr88101-wip.patch

Here is completely untested WIP of __builtin_clear_padding builtin, so far doesn't handle bit-fields, unions, VLAs and has a couple of other FIXMEs.
I'll try to complete this virtually from Baker Island (AoE timezone) tonight before stage1 closes there.  Also I'll probably need to remember the originally passed pointer type in e.g. second artificial argument of the builtin (NULL), in case already before or during lower pass something would forward propagate the argument.
Comment 6 Jakub Jelinek 2020-11-15 18:56:54 UTC
Created attachment 49565 [details]
gcc11-pr88101-wip.patch

Fixed/updated patch that includes first testcase and passes it.
Comment 7 Jakub Jelinek 2020-11-16 09:37:09 UTC
Created attachment 49567 [details]
gcc11-pr88101-wip.patch

Updated patch that canhandle bit-fields on both little end big endian and can handle also skipping of large paddings.  Next task unions.
Comment 8 Jakub Jelinek 2020-11-16 11:05:27 UTC
Created attachment 49569 [details]
gcc11-pr88101-wip.patch

Updated patch to handle unions.
Comment 9 GCC Commits 2020-11-20 11:31:48 UTC
The master branch has been updated by Jakub Jelinek <jakub@gcc.gnu.org>:

https://gcc.gnu.org/g:1bea0d0aa5936cb36b6f86f721ca03c1a1bb601d

commit r11-5196-g1bea0d0aa5936cb36b6f86f721ca03c1a1bb601d
Author: Jakub Jelinek <jakub@redhat.com>
Date:   Fri Nov 20 12:28:34 2020 +0100

    c++: Add __builtin_clear_padding builtin - C++20 P0528R3 compiler side [PR88101]
    
    The following patch implements __builtin_clear_padding builtin that clears
    the padding bits in object representation (but preserves value
    representation).  Inside of unions it clears only those padding bits that
    are padding for all the union members (so that it never alters value
    representation).
    
    It handles trailing padding, padding in the middle of structs including
    bitfields (PDP11 unhandled, I've never figured out how those bitfields
    work), VLAs (doesn't handle variable length structures, but I think almost
    nobody uses them and it isn't worth the extra complexity).  For VLAs and
    sufficiently large arrays it uses runtime clearing loop instead of emitting
    straight-line code (unless arrays are inside of a union).
    
    The way I think this can be used for atomics is e.g. if the structures
    are power of two sized and small enough that we use the hw atomics
    for say compare_exchange __builtin_clear_padding could be called first on
    the address of expected and desired arguments (for desired only if we want
    to ensure that most of the time the atomic memory will have padding bits
    cleared), then perform the weak cmpxchg and if that fails, we got the
    value from the atomic memory; we can call __builtin_clear_padding on a copy
    of that and then compare it with expected, and if it is the same with the
    padding bits masked off, we can use the original with whatever random
    padding bits in it as the new expected for next cmpxchg.
    __builtin_clear_padding itself is not atomic and therefore it shouldn't
    be called on the atomic memory itself, but compare_exchange*'s expected
    argument is a reference and normally the implementation may store there
    the current value from memory, so padding bits can be cleared in that,
    and desired is passed by value rather than reference, so clearing is fine
    too.
    When using libatomic, we can use it either that way, or add new libatomic
    APIs that accept another argument, pointer to the padding bit bitmask,
    and construct that in the template as
      alignas (_T) unsigned char _mask[sizeof (_T)];
      std::memset (_mask, ~0, sizeof (_mask));
      __builtin_clear_padding ((_T *) _mask);
    which will have bits cleared for padding bits and set for bits taking part
    in the value representation.  Then libatomic could internally instead
    of using memcmp compare
    for (i = 0; i < N; i++) if ((val1[i] & mask[i]) != (val2[i] & mask[i]))
    
    2020-11-20  Jakub Jelinek  <jakub@redhat.com>
    
            PR libstdc++/88101
    gcc/
            * builtins.def (BUILT_IN_CLEAR_PADDING): New built-in function.
            * gimplify.c (gimplify_call_expr): Rewrite single argument
            BUILT_IN_CLEAR_PADDING into two-argument variant.
            * gimple-fold.c (clear_padding_unit, clear_padding_buf_size): New
            const variables.
            (struct clear_padding_struct): New type.
            (clear_padding_flush, clear_padding_add_padding,
            clear_padding_emit_loop, clear_padding_type,
            clear_padding_union, clear_padding_real_needs_padding_p,
            clear_padding_type_may_have_padding_p,
            gimple_fold_builtin_clear_padding): New functions.
            (gimple_fold_builtin): Handle BUILT_IN_CLEAR_PADDING.
            * doc/extend.texi (__builtin_clear_padding): Document.
    gcc/c-family/
            * c-common.c (check_builtin_function_arguments): Handle
            BUILT_IN_CLEAR_PADDING.
    gcc/testsuite/
            * c-c++-common/builtin-clear-padding-1.c: New test.
            * c-c++-common/torture/builtin-clear-padding-1.c: New test.
            * c-c++-common/torture/builtin-clear-padding-2.c: New test.
            * c-c++-common/torture/builtin-clear-padding-3.c: New test.
            * c-c++-common/torture/builtin-clear-padding-4.c: New test.
            * c-c++-common/torture/builtin-clear-padding-5.c: New test.
            * g++.dg/torture/builtin-clear-padding-1.C: New test.
            * g++.dg/torture/builtin-clear-padding-2.C: New test.
            * gcc.dg/builtin-clear-padding-1.c: New test.
Comment 10 GCC Commits 2020-11-27 10:25:34 UTC
The master branch has been updated by Jakub Jelinek <jakub@gcc.gnu.org>:

https://gcc.gnu.org/g:bf0a63a1f47525d1c466dbb84616dcb72010affa

commit r11-5490-gbf0a63a1f47525d1c466dbb84616dcb72010affa
Author: Jakub Jelinek <jakub@redhat.com>
Date:   Fri Nov 27 11:23:45 2020 +0100

    gimple-fold: Fix another __builtin_clear_padding ICE
    
    When playing with __builtin_bit_cast, I have noticed __builtin_clear_padding
    ICE on the G class below.  The artificial field with D type has offset 0
    and size 8 bytes, but the following artificial field with E type has offset
    0 and size 0, so it triggers the asserts that we don't move current position
    backwards.  Fixed by ignoring is_empty_type (TREE_TYPE (field)) fields, all
    of their bits are padding which is what is added when skipping over to next
    field anyway.
    
    2020-11-27  Jakub Jelinek  <jakub@redhat.com>
    
            PR libstdc++/88101
            * gimple-fold.c (clear_padding_type): Ignore fields with is_empty_type
            types.
    
            * g++.dg/torture/builtin-clear-padding-3.C: New test.
Comment 11 Marek Polacek 2021-02-02 19:45:58 UTC
Any remaining work here?
Comment 12 Jakub Jelinek 2021-02-02 19:48:13 UTC
The compiler side is done, but the libstdc++-v3 side is not on the trunk yet.
Comment 13 Jonathan Wakely 2022-09-07 12:22:35 UTC
Library patch posted: https://gcc.gnu.org/pipermail/gcc-patches/2022-September/601218.html
Comment 14 Jonathan Wakely 2022-09-08 18:37:51 UTC
With r13-2548-g157236dbd62164 the library side is done, so this is complete.
Comment 15 GCC Commits 2024-07-18 08:07:03 UTC
The master branch has been updated by Jakub Jelinek <jakub@gcc.gnu.org>:

https://gcc.gnu.org/g:e217e7dbdc1040e7ee160796e9ca1ef12a0dd1cb

commit r15-2136-ge217e7dbdc1040e7ee160796e9ca1ef12a0dd1cb
Author: Sam James <sam@gentoo.org>
Date:   Thu Jul 18 10:00:17 2024 +0200

    testsuite: Add dg-do run to more tests
    
    All of these are for wrong-code bugs.  Confirmed to be used before but
    with no execution.
    
    2024-07-18  Sam James  <sam@gentoo.org>
    
            PR c++/53288
            PR c++/57437
            PR c/65345
            PR libstdc++/88101
            PR tree-optimization/96369
            PR tree-optimization/102124
            PR tree-optimization/108692
            * c-c++-common/pr96369.c: Add dg-do run directive.
            * gcc.dg/torture/pr102124.c: Ditto.
            * gcc.dg/pr108692.c: Ditto.
            * gcc.dg/atomic/pr65345-4.c: Ditto.
            * g++.dg/cpp0x/lambda/lambda-return1.C: Ditto.
            * g++.dg/init/lifetime4.C: Ditto.
            * g++.dg/torture/builtin-clear-padding-1.C: Ditto.
            * g++.dg/torture/builtin-clear-padding-2.C: Ditto.
            * g++.dg/torture/builtin-clear-padding-3.C: Ditto.
            * g++.dg/torture/builtin-clear-padding-4.C: Ditto.
            * g++.dg/torture/builtin-clear-padding-5.C: Ditto.