// https://gcc.godbolt.org/z/EfdG4nzv9 #include <algorithm> #include <cassert> struct B { B(int i, short j) : i(i), j(j) {} int i; short j; /* 2 byte padding */ }; struct D : B { D(int i, short j, short x) : B(i, j), x(x) {} short x; }; int main() { D ddst(1, 2, 3); D dsrc(4, 5, 6); B *dst = &ddst; B *src = &dsrc; std::copy_n(src, 1, dst); // should do `*dst = *src` assert(ddst.x == 3); // FAILS } Similarly if you use `std::copy(src, src+1, dst)`. The problem here is that `std::copy` and `std::copy_n` (and presumably some other algorithms too, like `std::move`) believe that if a type is trivially copyable, then it's safe to use `memcpy` or `memmove` on it. But in fact that's safe only if either - the type's "sizeof" is equal to its "data size, without trailing padding"; or - you happen to have external information proving that the object being copied is not a potentially overlapping subobject. I think it's safe for `std::copy` and `std::copy_n` to use memcpy/memmove if they are given a destination range of 2-or-more elements; but if it's just a single object, then (by the letter of the law) they must assume the destination object might be a potentially overlapping subobject, and not memcpy into it. (Full disclosure: I will be THRILLED if you close this as "not a bug," because that will be ammunition to go to LWG and say "look, vendors think this behavior is fine, we should actually standardize this behavior and remove the expectation that STL algorithms can ever handle potentially overlapping subobjects.")
The memcpy/memmove change was done in r9-1943-g20a0c4e3dc9948
Related to https://www.open-std.org/jtc1/sc22/wg21/docs/cwg_active.html#2403 but instead of language, it is the library methods.
Even https://www.open-std.org/jtc1/sc22/wg21/docs/cwg_active.html#2544 is very much related.
(In reply to Andrew Pinski from comment #2) > Related to https://www.open-std.org/jtc1/sc22/wg21/docs/cwg_active.html#2403 > but instead of language, it is the library methods. CWG2403 looks like an unfortunate result of ABI design. If any non-padding byte of a potentially-overlapping subobject X is actually overlapping with a padding byte of subobject Y, Y should be initialized before X, because all bytes of Y may be written in initialization. However, on Itanium ABI virtual base class subobjects violate such intuitive rule. But I think it's only closedly related to uninitialized memory algorithms (which perform initialization instead of assignment), and there's already a note saying that it's UB to use them to initialize potentially-overlapping objects ([specialized.algorithms.general]/3). (In reply to Andrew Pinski from comment #3) > Even https://www.open-std.org/jtc1/sc22/wg21/docs/cwg_active.html#2544 is > very much related. (As the submitter) I think this issue only legitimates current implementation of pointer arithmetic. For example, assuming both sizeof(B) and sizeof(D) are 8, which means the last 2 bytes of B are padding (true for common implementations on Itanium ABI): struct B { int32_t x; int16_t y; }; struct D : B { int16_t z; }; The current wording in [basic.compound]/3 requires that given `B b{};` and `D d{};`, `static_cast<B*>(&d) + 1` represents the address of the byte at offset 6, which is almost unimplementable. However, subtraction between pointers are performed as usual for even for potentially-overlapping subobjects, which is correctly implemented. In the case of copy family algorithms, I believe it's OK to specially handle cases where last - first == 1. BTW, there's CWG2527 that is closed as NAD, which seemingly means that an array subobject can be potentially-overlapping, even if element subobjects can't.
> In the case of copy family algorithms, I believe it's OK to specially handle cases where last - first == 1. https://github.com/gcc-mirror/gcc/blob/master/libstdc%2B%2B-v3/include/bits/stl_algobase.h#L417-L437 Is the extent of the fix just to add another branch to the if (_Num) check here? I've just tried locally, and it seems to work. It also doesn't seem that libstdc++ uses memcpy/memmove elsewhere (e.g. in std::fill).
> For example, assuming both sizeof(B) and sizeof(D) are 8, which means the last > 2 bytes of B are padding (true for common implementations on Itanium ABI): Oh, I forgot the strange design caused by CWG43 - this requires some non-C++98-POD mechanism for B (e.g. an explicitly default default constructor)...
I suspect std::move has the same issue too. The ability to use memmove with trivial copyable subobjects ...
I don't think we need the ABI keyword here, the library can just avoid the optimization for distance == 1. That has no ABI impact.
(In reply to Giuseppe D'Angelo from comment #5) > https://github.com/gcc-mirror/gcc/blob/master/libstdc%2B%2B-v3/include/bits/ > stl_algobase.h#L417-L437 > > Is the extent of the fix just to add another branch to the if (_Num) check > here? Yes, I think that's the simplest place to fix it. And we can even get rid of the static assertion, because the Num==1 branch will require assignability now. --- a/libstdc++-v3/include/bits/stl_algobase.h +++ b/libstdc++-v3/include/bits/stl_algobase.h @@ -737,16 +737,11 @@ _GLIBCXX_END_NAMESPACE_CONTAINER static _Tp* __copy_move_b(const _Tp* __first, const _Tp* __last, _Tp* __result) { -#if __cplusplus >= 201103L - using __assignable = __conditional_t<_IsMove, - is_move_assignable<_Tp>, - is_copy_assignable<_Tp>>; - // trivial types can have deleted assignment - static_assert( __assignable::value, "type must be assignable" ); -#endif const ptrdiff_t _Num = __last - __first; - if (_Num) + if (__builtin_expect(_Num > 1, true)) __builtin_memmove(__result - _Num, __first, sizeof(_Tp) * _Num); + else if (_Num == 1) + *__result = *__first; return __result - _Num; } };
(In reply to Andrew Pinski from comment #7) > I suspect std::move has the same issue too. The ability to use memmove with > trivial copyable subobjects ... std::move(x,y,z) and std::copy(z,y,z) use the same underlying implementation, so it does have the same issue, but will be fixed by the same change.
(In reply to Jonathan Wakely from comment #10) > std::move(x,y,z) and std::copy(z,y,z) use the same underlying > implementation, so it does have the same issue, but will be fixed by the > same change. Right; also std::rotate, std::set_union, etc., I'd expect would all be fixed by the same change to std::copy. You might want to peek at std::swap_ranges, though. Microsoft's STL has trouble there; I don't *think* libstdc++ does, but you might as well check me.
(In reply to Jonathan Wakely from comment #9) > (In reply to Giuseppe D'Angelo from comment #5) > > https://github.com/gcc-mirror/gcc/blob/master/libstdc%2B%2B-v3/include/bits/ > > stl_algobase.h#L417-L437 > > > > Is the extent of the fix just to add another branch to the if (_Num) check > > here? > > Yes, I think that's the simplest place to fix it. > > And we can even get rid of the static assertion, because the Num==1 branch > will require assignability now. > > --- a/libstdc++-v3/include/bits/stl_algobase.h > +++ b/libstdc++-v3/include/bits/stl_algobase.h > @@ -737,16 +737,11 @@ _GLIBCXX_END_NAMESPACE_CONTAINER > static _Tp* > __copy_move_b(const _Tp* __first, const _Tp* __last, _Tp* __result) > { > -#if __cplusplus >= 201103L > - using __assignable = __conditional_t<_IsMove, > - is_move_assignable<_Tp>, > - is_copy_assignable<_Tp>>; > - // trivial types can have deleted assignment > - static_assert( __assignable::value, "type must be assignable" ); > -#endif > const ptrdiff_t _Num = __last - __first; > - if (_Num) > + if (__builtin_expect(_Num > 1, true)) > __builtin_memmove(__result - _Num, __first, sizeof(_Tp) * _Num); > + else if (_Num == 1) > + *__result = *__first; > return __result - _Num; > } > }; If this code path also takes care of std::move(b,e,o), then this doesn't sound correct -- for _Num==1 and _IsMove==true, then it should use a move assignment (std::move(*__first)). But then that std::move doesn't actually work as __first is a pointer to const...
That specialization is only used for trivial types, so moving is equivalent to copying (otherwise we couldn't use memmove anyway). So it doesn't make any difference whether you move an lvalue or rvalue, it's just a bitwise copy anyway.
s/move an lvalue or rvalue/assign from an lvalue or rvalue/
That's not what I meant; a type can be trivial(ly copyable) and move-only. Here's a modification of Arthur's example: // move-only struct B { B(int i, short j) : i(i), j(j) {} B(B &&) = default; B &operator=(B &&) = default; int i; short j; }; struct D : B { D(int i, short j, short x) : B(i, j), x(x) {} D(D &&) = default; D &operator=(D &&) = default; short x; }; int main() { D ddst(1, 2, 3); D dsrc(4, 5, 6); B *dst = &ddst; B *src = &dsrc; static_assert(std::is_trivially_copyable_v<B>); std::move(src, src+1, dst); assert(ddst.x == 3); } The call to std::move ends up in the same memmove codepath as std::copy_n (B is trivially copyable), but with the proposed patch it will fail to compile because it's not actually copy assignable.
Ah gotcha. So then something like this (untested): --- a/libstdc++-v3/include/bits/stl_algobase.h +++ b/libstdc++-v3/include/bits/stl_algobase.h @@ -422,16 +422,13 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION static _Tp* __copy_m(const _Tp* __first, const _Tp* __last, _Tp* __result) { -#if __cplusplus >= 201103L - using __assignable = __conditional_t<_IsMove, - is_move_assignable<_Tp>, - is_copy_assignable<_Tp>>; - // trivial types can have deleted assignment - static_assert( __assignable::value, "type must be assignable" ); -#endif const ptrdiff_t _Num = __last - __first; - if (_Num) + if (__builtin_expect(_Num > 1, true)) __builtin_memmove(__result, __first, sizeof(_Tp) * _Num); + else if (_Num == 1) + return std::__copy_move<_IsMove, false, + random_access_iterator_tag>:: + __copy_m(__first, __last, __result); return __result + _Num; } }; @@ -737,16 +734,13 @@ _GLIBCXX_END_NAMESPACE_CONTAINER static _Tp* __copy_move_b(const _Tp* __first, const _Tp* __last, _Tp* __result) { -#if __cplusplus >= 201103L - using __assignable = __conditional_t<_IsMove, - is_move_assignable<_Tp>, - is_copy_assignable<_Tp>>; - // trivial types can have deleted assignment - static_assert( __assignable::value, "type must be assignable" ); -#endif const ptrdiff_t _Num = __last - __first; - if (_Num) + if (__builtin_expect(_Num > 1, true)) __builtin_memmove(__result - _Num, __first, sizeof(_Tp) * _Num); + else if (_Num == 1) + return std::__copy_move_backward<_IsMove, false, + random_access_iterator_tag>:: + __copy_move_b(__first, __last, __result); return __result - _Num; } };
We need to change the arguments too, as that specialization takes a const _Tp* as the input range.
Created attachment 54537 [details] libstdc++: Do not use memmove for 1-element ranges [PR108846] This is a more complete patch that passes Arthur's test in comment 0.
The master branch has been updated by Jonathan Wakely <redi@gcc.gnu.org>: https://gcc.gnu.org/g:822a11a1e642e0abe92a996e7033a5066905a447 commit r13-6372-g822a11a1e642e0abe92a996e7033a5066905a447 Author: Jonathan Wakely <jwakely@redhat.com> Date: Sat Feb 25 14:28:36 2023 +0000 libstdc++: Do not use memmove for 1-element ranges [PR108846] This avoids overwriting tail padding when algorithms like std::copy are used to write a single value through a pointer to a base subobject. The pointer arithmetic on a Base* is valid for N==1, but the copy/move operation needs to be done using assignment, not a memmove or memcpy of sizeof(Base) bytes. Instead of putting a check for N==1 in all of copy, copy_n, move etc. this adds it to the __copy_move and __copy_move_backward partial specializations used for trivially copyable types. When N==1 those partial specializations dispatch to new static member functions of the partial specializations for non-trivial types, so that a copy/move assignment is done appropriately for the _IsMove constant. libstdc++-v3/ChangeLog: PR libstdc++/108846 * include/bits/stl_algobase.h (__copy_move<false, false, RA>) Add __assign_one static member function. (__copy_move<true, false, RA>): Likewise. (__copy_move<IsMove, true, RA>): Do not use memmove for a single value. (__copy_move_backward<IsMove, true, RA>): Likewise. * testsuite/25_algorithms/copy/108846.cc: New test. * testsuite/25_algorithms/copy_backward/108846.cc: New test. * testsuite/25_algorithms/copy_n/108846.cc: New test. * testsuite/25_algorithms/move/108846.cc: New test. * testsuite/25_algorithms/move_backward/108846.cc: New test.
Thanks for the patch! Should ranges_algobase.h also be similarly changed? There's a memmove in its copy/move code as well: https://gcc.gnu.org/git/?p=gcc.git;a=blob;f=libstdc%2B%2B-v3/include/bits/ranges_algobase.h;hb=HEAD#l263
Indeed it should.
(In reply to Jonathan Wakely from comment #16) > const ptrdiff_t _Num = __last - __first; > - if (_Num) > + if (__builtin_expect(_Num > 1, true)) > __builtin_memmove(__result, __first, sizeof(_Tp) * _Num); > + else if (_Num == 1) Richi suggested that we could avoid these runtime branches (which hurt optimization, see PR 109445) if we knew how many bytes of tail padding there are in _Tp, e.g., __builtin_memmove(__result, __first, sizeof(_Tp) * _Num - __padding_bytes); This would mean we copy whole objects for the first Num - 1 elements, and then only copy the non-padding bytes for the last element. This would be conforming, because there's no guarantee that padding bits are copied by assignment anyway, and std::copy is specified in terms of assignment. When copying a single base-class object (the subject of this bug) the last element is the only element, so we solve the problem. We don't have a built-in to tell us the number of padding bytes, but we might be able to use something like this: template<typename T> constexpr size_t potentially_overlapping_tail_padding() { if constexpr (is_final<T>::value) return 0; else { struct D1 : T { char c1; }; #pragma GCC diagnostic push #pragma GCC diagnostic ignored "-Winvalid-offsetof" return sizeof(T) - offsetof(D1, c1); #pragma GCC diagnostic pop } } For pre-C++17 we would need to use tag dispatching and __is_final instead of if-constexpr, but that's doable. For pre-C++14 it can't be constexpr, but that's OK too.
(In reply to Jonathan Wakely from comment #22) > > Richi suggested that we could avoid these runtime branches (which hurt > optimization, see PR 109445) if we knew how many bytes of tail padding there > are in _Tp [...] > We don't have a built-in to tell us the number of [trailing] padding bytes, but we > might be able to use something like this Three thoughts that might be helpful: - There may be padding in the middle of an object, and I'm not confident that the Standard actually forbids it from being used. Of course your approach works fine on the Itanium ABI, and probably works everywhere that actually exists. If you've got chapter+verse proving that it must work *everywhere*, I'd appreciate seeing it, just for my own information. - If GCC were ever to add a builtin for this notion, IMO the proper name would be `__datasizeof(T)`. See https://danakj.github.io/2023/01/15/trivially-relocatable.html#data-size - You can implement your library trait like this; see https://quuxplusone.github.io/blog/2023/03/04/trivial-swap-prize-update/#step-1.-std-swap-can-t-assume-it template <class _Tp> struct __libcpp_datasizeof { struct _Up { [[__no_unique_address__]] _Tp __t_; char __c_; }; static const size_t value = __builtin_offsetof(_Up, __c_); }; Unfortunately it looks like GCC doesn't support `__attribute__((__no_unique_address__))` in C++03 mode. (Neither does Clang. What is up with that!) Your suggested trait implementation is slightly wrong for `is_final` types: you return 0 but really a final type _can_ have usable tail padding. See https://godbolt.org/z/P6x459MEq
(In reply to Arthur O'Dwyer from comment #23) > - There may be padding in the middle of an object, and I'm not confident > that the Standard actually forbids it from being used. Of course your > approach works fine on the Itanium ABI, and probably works everywhere that > actually exists. If you've got chapter+verse proving that it must work > *everywhere*, I'd appreciate seeing it, just for my own information. I don't care about everywhere, only GCC-like compilers on the targets they support. > - If GCC were ever to add a builtin for this notion, IMO the proper name > would be `__datasizeof(T)`. See > https://danakj.github.io/2023/01/15/trivially-relocatable.html#data-size I think it should have a __builtin_ prefix unless it directly implements a standard trait like __is_object. > - You can implement your library trait like this; see > https://quuxplusone.github.io/blog/2023/03/04/trivial-swap-prize-update/ > #step-1.-std-swap-can-t-assume-it > > template <class _Tp> > struct __libcpp_datasizeof { > struct _Up { > [[__no_unique_address__]] _Tp __t_; > char __c_; > }; > static const size_t value = __builtin_offsetof(_Up, __c_); > }; > > Unfortunately it looks like GCC doesn't support > `__attribute__((__no_unique_address__))` in C++03 mode. (Neither does Clang. > What is up with that!) Which is OK, because you can't have final types in C++03 either, so testing it using inheritance is good enough for C++03. > Your suggested trait implementation is slightly wrong for `is_final` types: > you return 0 but really a final type _can_ have usable tail padding. See > https://godbolt.org/z/P6x459MEq Ah, I didn't think that padding would actually be used by an adjacent member subobject. But of course it can be for an empty class, where it consists of a single byte of padding which can be reused by another member of a different type.
... and non-empty types, I'm just saying that's the obvious case that proves it's possible.