Bug 88101 - Implement P0528R3, C++20 cmpxchg and padding bits
Status: ASSIGNED
Alias: None
Product: gcc
Classification: Unclassified
Component: libstdc++
Version: unknown
Importance: P3 enhancement
Target Milestone: ---
Assignee: Jakub Jelinek
URL:
Keywords:
Depends on:
Blocks: 88322 c++20-core
Reported: 2018-11-19 19:39 UTC by Jason Merrill
Modified: 2020-11-20 11:31 UTC

See Also:
Host:
Target:
Build:
Known to work:
Known to fail:
Last reconfirmed: 2018-12-03 00:00:00


Attachments
gcc11-pr88101-wip.patch (4.14 KB, patch)
2020-11-15 17:30 UTC, Jakub Jelinek
gcc11-pr88101-wip.patch (4.72 KB, patch)
2020-11-15 18:56 UTC, Jakub Jelinek
gcc11-pr88101-wip.patch (5.87 KB, patch)
2020-11-16 09:37 UTC, Jakub Jelinek
gcc11-pr88101-wip.patch (6.57 KB, patch)
2020-11-16 11:05 UTC, Jakub Jelinek

Description Jason Merrill 2018-11-19 19:39:40 UTC
Although this paper was moved by Core at the meeting, it's a change to the library atomics clause.  Do you need compiler support for this?  It seems fairly straightforward to handle types for which has_unique_object_representations is false by zeroing the storage as the first step of initialization.
Comment 1 andysem 2019-12-06 10:25:29 UTC
I'd like to draw attention to the case of 80-bit long double on x86. When I added support for it in Boost.Atomic I noticed that it would usually be passed in an xmm register, where the lower 10 bytes contained the value and the upper 6 contained undefined padding. Given that gcc stores and loads the full xmm register, clearing the storage prior to storing the value is not enough: the random padding will be written to the storage and break cmpxchg16b. In Boost.Atomic I had to clear the padding after storing the value, and this code is brittle because I have to know when a long double value occupies 10 bytes (note that sizeof(long double) returns 16 on x86-64 and 12 on x86-32).

I don't know if anything was done about it in recent gcc versions. Maybe the compiler could provide an intrinsic to clear any possible padding bits of a type? That would be useful not only for long double but also for structs with padding bits, because it would allow using memcpy to copy the value (which is arguably more efficient) and clearing the padding only when accepting the value from the user.
Comment 2 andysem 2020-02-23 16:30:38 UTC
Another use case is C++20 atomic_ref, which may be bound to an object whose padding bits are in indeterminate state. An intrinsic to clear padding bits without altering the object value could be useful.
Comment 3 andysem 2020-02-25 11:32:36 UTC
As discussed in bug #93916, the approach of zeroing the storage before constructing the object with internal padding doesn't work and is not required to work by the C++ standard.
Comment 4 Thomas Rodgers 2020-09-15 20:03:58 UTC
(In reply to andysem from comment #2)
> Another use case is C++20 atomic_ref, which may be bound to an object whose
> padding bits are in indeterminate state. An intrinsic to clear padding bits
> without altering the object value could be useful.

Having now implemented atomic<T>::wait for libstdc++, I think the intrinsic to clear padding bits before calling __builtin_memcmp for generic (trivially copyable) T's is the right approach.
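
The comparison described here can be sketched as follows. This is only an illustration, not the libstdc++ code: the helper names (clear_padding, equal_ignoring_padding, Padded, fill) are hypothetical, and it assumes a compiler that provides __builtin_clear_padding (GCC 11 or later). On other compilers the sketch falls back to a plain memcmp, which is still sensitive to padding.

```cpp
#include <cassert>
#include <cstring>
#include <type_traits>

#ifndef __has_builtin
#define __has_builtin(x) 0  // older compilers: assume the builtin is missing
#endif

#if __has_builtin(__builtin_clear_padding)
constexpr bool have_clear_padding = true;
#else
constexpr bool have_clear_padding = false;
#endif

// Hypothetical helper: zero the padding bits of obj, keep its value.
template <typename T>
void clear_padding(T& obj) {
#if __has_builtin(__builtin_clear_padding)
    __builtin_clear_padding(&obj);
#else
    (void)obj;  // assumption: no builtin available, padding is left as-is
#endif
}

// Hypothetical helper: bytewise equality of two values, ignoring padding.
template <typename T>
bool equal_ignoring_padding(const T& a, const T& b) {
    static_assert(std::is_trivially_copyable<T>::value,
                  "memcmp requires a trivially copyable type");
    T ca = a;  // work on private copies so the inputs stay untouched
    T cb = b;
    clear_padding(ca);
    clear_padding(cb);
    return std::memcmp(&ca, &cb, sizeof(T)) == 0;
}

// Example type with padding between tag and value on common ABIs.
struct Padded { char tag; int value; };

// Fill every byte (padding included) with junk, then set the members.
void fill(Padded& p, int junk, char tag, int value) {
    std::memset(&p, junk, sizeof p);
    p.tag = tag;
    p.value = value;
}

void demo_equal_ignoring_padding() {
    Padded a, b;
    fill(a, 0xAA, 'x', 1);
    fill(b, 0x55, 'x', 1);  // same value, different junk in the padding
    if (have_clear_padding)
        assert(equal_ignoring_padding(a, b));
    b.value = 2;
    assert(!equal_ignoring_padding(a, b));
}
```

Clearing the padding on local copies keeps the helper usable on const inputs and avoids touching an object that other threads may be reading.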
Comment 5 Jakub Jelinek 2020-11-15 17:30:48 UTC
Created attachment 49563
gcc11-pr88101-wip.patch

Here is a completely untested WIP of the __builtin_clear_padding builtin. So far it doesn't handle bit-fields, unions, or VLAs, and it has a couple of other FIXMEs.
I'll try to complete this tonight, virtually from Baker Island (AoE timezone), before stage1 closes there.  I'll also probably need to remember the originally passed pointer type, e.g. in a second artificial argument of the builtin (NULL), in case something forward-propagates the argument before or during the lower pass.
Comment 6 Jakub Jelinek 2020-11-15 18:56:54 UTC
Created attachment 49565
gcc11-pr88101-wip.patch

Fixed/updated patch that includes the first testcase and passes it.
Comment 7 Jakub Jelinek 2020-11-16 09:37:09 UTC
Created attachment 49567
gcc11-pr88101-wip.patch

Updated patch that can handle bit-fields on both little and big endian and can also skip over large paddings.  Next task: unions.
Comment 8 Jakub Jelinek 2020-11-16 11:05:27 UTC
Created attachment 49569
gcc11-pr88101-wip.patch

Updated patch to handle unions.
Comment 9 CVS Commits 2020-11-20 11:31:48 UTC
The master branch has been updated by Jakub Jelinek <jakub@gcc.gnu.org>:

https://gcc.gnu.org/g:1bea0d0aa5936cb36b6f86f721ca03c1a1bb601d

commit r11-5196-g1bea0d0aa5936cb36b6f86f721ca03c1a1bb601d
Author: Jakub Jelinek <jakub@redhat.com>
Date:   Fri Nov 20 12:28:34 2020 +0100

    c++: Add __builtin_clear_padding builtin - C++20 P0528R3 compiler side [PR88101]
    
    The following patch implements __builtin_clear_padding builtin that clears
    the padding bits in object representation (but preserves value
    representation).  Inside of unions it clears only those padding bits that
    are padding for all the union members (so that it never alters value
    representation).
    
    It handles trailing padding, padding in the middle of structs including
    bitfields (PDP11 unhandled, I've never figured out how those bitfields
    work), VLAs (doesn't handle variable length structures, but I think almost
    nobody uses them and it isn't worth the extra complexity).  For VLAs and
    sufficiently large arrays it uses runtime clearing loop instead of emitting
    straight-line code (unless arrays are inside of a union).
    
    The way I think this can be used for atomics is e.g. if the structures
    are power of two sized and small enough that we use the hw atomics
    for say compare_exchange __builtin_clear_padding could be called first on
    the address of expected and desired arguments (for desired only if we want
    to ensure that most of the time the atomic memory will have padding bits
    cleared), then perform the weak cmpxchg and if that fails, we got the
    value from the atomic memory; we can call __builtin_clear_padding on a copy
    of that and then compare it with expected, and if it is the same with the
    padding bits masked off, we can use the original with whatever random
    padding bits in it as the new expected for next cmpxchg.
    __builtin_clear_padding itself is not atomic and therefore it shouldn't
    be called on the atomic memory itself, but compare_exchange*'s expected
    argument is a reference and normally the implementation may store there
    the current value from memory, so padding bits can be cleared in that,
    and desired is passed by value rather than reference, so clearing is fine
    too.
    When using libatomic, we can use it either that way, or add new libatomic
    APIs that accept another argument, pointer to the padding bit bitmask,
    and construct that in the template as
      alignas (_T) unsigned char _mask[sizeof (_T)];
      std::memset (_mask, ~0, sizeof (_mask));
      __builtin_clear_padding ((_T *) _mask);
    which will have bits cleared for padding bits and set for bits taking part
    in the value representation.  Then libatomic could internally instead
    of using memcmp compare
    for (i = 0; i < N; i++) if ((val1[i] & mask[i]) != (val2[i] & mask[i]))
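
The mask construction and masked comparison above can be fleshed out roughly as below. This is a sketch under assumptions, not libatomic code: value_mask, masked_equal and Padded are hypothetical names, and the mask only has zeroed padding bytes when the compiler provides __builtin_clear_padding (otherwise it stays all-ones and the comparison degrades to a plain bytewise one).

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

#ifndef __has_builtin
#define __has_builtin(x) 0  // older compilers: assume the builtin is missing
#endif

#if __has_builtin(__builtin_clear_padding)
constexpr bool have_clear_padding = true;
#else
constexpr bool have_clear_padding = false;
#endif

// Hypothetical wrapper: bytes[] is all-ones where T holds value bits
// and zero where T has padding.
template <typename T>
struct value_mask {
    alignas(T) unsigned char bytes[sizeof(T)];

    value_mask() {
        std::memset(bytes, 0xFF, sizeof bytes);
#if __has_builtin(__builtin_clear_padding)
        __builtin_clear_padding(reinterpret_cast<T*>(bytes));
#endif
    }
};

// Compare n bytes of lhs and rhs, looking only at bits set in mask.
bool masked_equal(const void* lhs, const void* rhs,
                  const unsigned char* mask, std::size_t n) {
    const unsigned char* a = static_cast<const unsigned char*>(lhs);
    const unsigned char* b = static_cast<const unsigned char*>(rhs);
    for (std::size_t i = 0; i < n; ++i)
        if ((a[i] & mask[i]) != (b[i] & mask[i]))
            return false;
    return true;
}

// Example type with padding between tag and value on common ABIs.
struct Padded { char tag; int value; };

void demo_masked_compare() {
    Padded a, b;
    std::memset(&a, 0xAA, sizeof a); a.tag = 'x'; a.value = 1;
    std::memset(&b, 0x55, sizeof b); b.tag = 'x'; b.value = 1;
    value_mask<Padded> mask;
    if (have_clear_padding)
        assert(masked_equal(&a, &b, mask.bytes, sizeof(Padded)));
    b.value = 2;  // a real value difference must still be detected
    assert(!masked_equal(&a, &b, mask.bytes, sizeof(Padded)));
}
```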
    
    2020-11-20  Jakub Jelinek  <jakub@redhat.com>
    
            PR libstdc++/88101
    gcc/
            * builtins.def (BUILT_IN_CLEAR_PADDING): New built-in function.
            * gimplify.c (gimplify_call_expr): Rewrite single argument
            BUILT_IN_CLEAR_PADDING into two-argument variant.
            * gimple-fold.c (clear_padding_unit, clear_padding_buf_size): New
            const variables.
            (struct clear_padding_struct): New type.
            (clear_padding_flush, clear_padding_add_padding,
            clear_padding_emit_loop, clear_padding_type,
            clear_padding_union, clear_padding_real_needs_padding_p,
            clear_padding_type_may_have_padding_p,
            gimple_fold_builtin_clear_padding): New functions.
            (gimple_fold_builtin): Handle BUILT_IN_CLEAR_PADDING.
            * doc/extend.texi (__builtin_clear_padding): Document.
    gcc/c-family/
            * c-common.c (check_builtin_function_arguments): Handle
            BUILT_IN_CLEAR_PADDING.
    gcc/testsuite/
            * c-c++-common/builtin-clear-padding-1.c: New test.
            * c-c++-common/torture/builtin-clear-padding-1.c: New test.
            * c-c++-common/torture/builtin-clear-padding-2.c: New test.
            * c-c++-common/torture/builtin-clear-padding-3.c: New test.
            * c-c++-common/torture/builtin-clear-padding-4.c: New test.
            * c-c++-common/torture/builtin-clear-padding-5.c: New test.
            * g++.dg/torture/builtin-clear-padding-1.C: New test.
            * g++.dg/torture/builtin-clear-padding-2.C: New test.
            * gcc.dg/builtin-clear-padding-1.c: New test.
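
The compare_exchange scheme outlined in the commit message can be sketched roughly as follows. This is an illustration under assumptions, not the libstdc++ implementation: compare_exchange_ignoring_padding, clear_padding, Padded and fill are hypothetical names, the type is assumed to be power-of-two sized and small enough for lock-free hardware atomics (8 bytes here), and without __builtin_clear_padding the function behaves like an ordinary padding-sensitive compare-exchange.

```cpp
#include <cassert>
#include <cstring>

#ifndef __has_builtin
#define __has_builtin(x) 0  // older compilers: assume the builtin is missing
#endif

#if __has_builtin(__builtin_clear_padding)
constexpr bool have_clear_padding = true;
#else
constexpr bool have_clear_padding = false;
#endif

template <typename T>
void clear_padding(T& obj) {
#if __has_builtin(__builtin_clear_padding)
    __builtin_clear_padding(&obj);
#else
    (void)obj;  // assumption: no builtin available, padding is left as-is
#endif
}

// Hypothetical helper: strong compare-exchange that ignores padding bits.
template <typename T>
bool compare_exchange_ignoring_padding(T* target, T& expected, T desired) {
    T want = expected;
    clear_padding(want);     // reference value with zeroed padding
    clear_padding(desired);  // passed by value, so safe to modify
    T probe = want;
    clear_padding(probe);    // first attempt assumes zeroed padding in *target
    while (!__atomic_compare_exchange(target, &probe, &desired, /*weak=*/true,
                                      __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST)) {
        // On failure, probe holds the bytes just read from *target.
        T seen = probe;
        clear_padding(seen);
        if (std::memcmp(&seen, &want, sizeof(T)) != 0) {
            expected = probe;  // the value really differs: report it
            return false;
        }
        // Same value, different padding (or a spurious failure):
        // retry with the raw bytes from memory as the new expected.
    }
    return true;
}

// Example type: 8 bytes, with padding after tag on common ABIs.
struct alignas(8) Padded { char tag; int value; };

void fill(Padded& p, int junk, char tag, int value) {
    std::memset(&p, junk, sizeof p);
    p.tag = tag;
    p.value = value;
}

void demo_compare_exchange() {
    Padded target, expected, desired;
    fill(target, 0x33, 'a', 1);
    fill(expected, 0xAA, 'a', 1);  // same value as target, other padding
    fill(desired, 0x55, 'b', 2);
    bool swapped = compare_exchange_ignoring_padding(&target, expected, desired);
    if (have_clear_padding)
        assert(swapped && target.tag == 'b' && target.value == 2);
}
```

The builtin is only ever applied to local copies and to the caller's arguments, never to the atomic object itself, which matches the restriction stated in the commit message.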