Bug 87106

Summary:	Group move and destruction of the source, where possible, for speed
Product:	gcc	Reporter:	Marc Glisse <glisse>
Component:	libstdc++	Assignee:	Not yet assigned to anyone <unassigned>
Status:	UNCONFIRMED ---
Severity:	enhancement	CC:	arthur.j.odwyer, webrown.cpp
Priority:	P3	Keywords:	missed-optimization
Version:	9.0
Target Milestone:	---
Host:		Target:
Build:		Known to work:
Known to fail:		Last reconfirmed:
Attachments:	proof of concept patch (diff -w) proof of concept patch v2 (diff -w)

Description Marc Glisse 2018-08-26 10:23:08 UTC

Just a random testcase so I can give numbers, I don't claim this is a good testcase at all

#include <string>
#include <vector>

__attribute__((flatten))
void f(){
  int n = 1024*1024;
  std::vector<std::string> v(n);
  v.resize(n+1);
}
int main(){
  for(int i=0;i<256;++i) f();
}

runs in about 2.4s now. In _M_default_append, we have a first loop that copies (moves) strings from old to new, and a second loop that destroys old. If I comment out the destroying loop (not something we should do in general, this is just for the numbers), the running time goes down to 2.0s. If I replace the 2 loops with a single loop that does both move and destroy, the running time is now 1.6s. Move+destroy (aka destructive move, relocation, etc) are 2 operations that go well together and are not unlikely to simplify. Ideally the compiler would merge the 2 loops (loop fusion) for us, but it doesn't. Doing the operations in this order is only valid here because std::string can be moved+destroyed nothrow.

I think it would be nice to introduce a special case for nothrow-relocatable types in several functions for std::vector (_M_default_append is just one among several, and probably not the most important one). If that makes the code simpler, we could use if constexpr and limit the optimization to recent standards. If one of the relocation papers ever makes it through the committee, it will likely require this optimization (or at least make it an important QoI point).

There are probably places outside of vector that could also benefit, but vector looks like a good starting point.

Comment 1 Marc Glisse 2018-08-31 14:24:27 UTC

Created attachment 44638 [details]
proof of concept patch (diff -w)

Trying to get an idea of how things could look like. I know is_trivially_move_constructible is not officially the right test to be allowed to use memmove, that's a detail that can easily be changed.

Comment 2 Marc Glisse 2018-08-31 22:32:43 UTC

Created attachment 44641 [details]
proof of concept patch v2 (diff -w)

Updated to also handle push_back.

I moved _GLIBCXX_ASAN_ANNOTATE_REINIT after std::_Destroy because that's more convenient and it makes sense to me, but I am not familiar with those annotations...

Comment 3 Marc Glisse 2018-10-25 13:03:46 UTC

Author: glisse
Date: Thu Oct 25 13:03:13 2018
New Revision: 265485

URL: https://gcc.gnu.org/viewcvs?rev=265485&root=gcc&view=rev
Log:
Relocation (= move+destroy)

2018-10-25  Marc Glisse  <marc.glisse@inria.fr>

	PR libstdc++/87106
	* include/bits/alloc_traits.h (_S_construct, _S_destroy, construct,
	destroy): Add noexcept specification.
	* include/bits/allocator.h (construct, destroy): Likewise.
	* include/ext/alloc_traits.h (construct, destroy): Likewise.
	* include/ext/malloc_allocator.h (construct, destroy): Likewise.
	* include/ext/new_allocator.h (construct, destroy): Likewise.
	* include/bits/stl_uninitialized.h (__relocate_object_a, __relocate_a,
	__relocate_a_1): New functions.
	(__is_trivially_relocatable): New class.
	* include/bits/stl_vector.h (__use_relocate): New static member.
	* include/bits/vector.tcc (reserve, _M_realloc_insert,
	_M_default_append): Use __relocate_a.
	(reserve, _M_assign_aux, _M_realloc_insert, _M_fill_insert,
	_M_default_append, _M_range_insert): Move _GLIBCXX_ASAN_ANNOTATE_REINIT
	after _Destroy.
	* testsuite/23_containers/vector/modifiers/push_back/49836.cc:
	Replace CopyConsOnlyType with DelAnyAssign.


Modified:
    trunk/libstdc++-v3/ChangeLog
    trunk/libstdc++-v3/include/bits/alloc_traits.h
    trunk/libstdc++-v3/include/bits/allocator.h
    trunk/libstdc++-v3/include/bits/stl_uninitialized.h
    trunk/libstdc++-v3/include/bits/stl_vector.h
    trunk/libstdc++-v3/include/bits/vector.tcc
    trunk/libstdc++-v3/include/ext/alloc_traits.h
    trunk/libstdc++-v3/include/ext/malloc_allocator.h
    trunk/libstdc++-v3/include/ext/new_allocator.h
    trunk/libstdc++-v3/testsuite/23_containers/vector/modifiers/push_back/49836.cc

Comment 4 Arthur O'Dwyer 2018-10-25 14:47:07 UTC

Hi, I'm the author of P1144 "Object relocation in terms of move plus destroy".
Awesome work you've done here!
Have you seen my libc++ patch on the same topic as yours?

https://quuxplusone.github.io/blog/2018/07/18/announcing-trivially-relocatable/
https://github.com/Quuxplusone/libcxx/tree/trivially-relocatable/include

Specifically, the piece that I think is still missing from libstdc++'s implementation (besides all the cool stuff that you'd get from the attribute) is the trait `__has_trivial_construct<A, T, T&&>`. This trait allows you to write your `__relocate_a_1` generically, instead of special-casing it for `std::allocator<T>` in particular. (So for example it could also get picked up for `std::pmr::polymorphic_allocator<T>`... even though I guess I didn't implement that part for my libc++ patch, because I'm still waiting on <memory_resource> to get merged.)

Here's my `__has_trivial_construct`:
https://github.com/Quuxplusone/libcxx/blob/99d734b4/include/memory#L5790-L5800

And here's how std::vector uses it (which is significantly different from how libstdc++'s std::vector is arranged, but I'm sure the trait would be the same):
https://github.com/Quuxplusone/libcxx/blob/99d734b4/include/vector#L587-L600

I hope we get P1144 so that you don't have to waste time and brain cells specializing `__is_trivially_relocatable<T>` for std::string and std::pair and so on.

Complete tangent: I'm confused how vector.tcc is allowed to use "if constexpr" in C++11 mode. https://github.com/gcc-mirror/gcc/commit/e9f84d4c#diff-05b068171cedf9d0176bada75d7dd112R76

Comment 5 Marc Glisse 2018-10-25 16:37:07 UTC

(In reply to Arthur O'Dwyer from comment #4)
> Have you seen my libc++ patch on the same topic as yours?
> 
> https://quuxplusone.github.io/blog/2018/07/18/announcing-trivially-
> relocatable/
> https://github.com/Quuxplusone/libcxx/tree/trivially-relocatable/include

I am not sure I should look too closely for copyright reasons. Maybe I can... I did read the papers though.

> Specifically, the piece that I think is still missing from libstdc++'s
> implementation (besides all the cool stuff that you'd get from the
> attribute) is the trait `__has_trivial_construct<A, T, T&&>`.

I already filed Bug 87604 for that.

> This trait
> allows you to write your `__relocate_a_1` generically, instead of
> special-casing it for `std::allocator<T>` in particular. (So for example it
> could also get picked up for `std::pmr::polymorphic_allocator<T>`...

I am not particularly interested in polymorphic_allocator. The patch is still missing more important pieces, like the ability to actually specialize __is_trivially_relocatable (I already have another patch locally with the missing piece).

> I hope we get P1144 so that you don't have to waste time and brain cells
> specializing `__is_trivially_relocatable<T>` for std::string and std::pair
> and so on.

std::string is not trivially relocatable in libstdc++, so I won't waste any time there. IIRC, one difference with what you did is that I use relocation even for some types that are not trivially relocatable (in particular std::string), because it actually helps, as shown in the first message of this bug report.

> Complete tangent: I'm confused how vector.tcc is allowed to use "if
> constexpr" in C++11 mode.
> https://github.com/gcc-mirror/gcc/commit/e9f84d4c#diff-
> 05b068171cedf9d0176bada75d7dd112R76

It is enabled as an extension and produces a warning which is disabled in system headers.

Comment 6 Marc Glisse 2018-10-25 18:32:14 UTC

Re-reading P1144R0 (those are not necessarily comments on the paper, just what comes to mind for gcc):
1) yes, an automatic detection mechanism would be nice, and an attribute makes sense.
2) the conditionally trivial stuff is not very convenient, it seems to involve a lot of code duplication. People regularly suggest attributes of the form [[trivially_relocatable(condition)]] which might reduce the noise but are harder to specify.
3) there are indeed many places outside of vector that can benefit from relocation. The compromises may not be the same though. For my use in vectors, relocation was never worse, it performed the same moves and destructions as the original. In other places, using relocation would replace a move-assignment with a construction and a destruction, which could be worse, so we may want to limit it to the trivially relocatable case.
4) I think it might be useful to be able to specify how to relocate an object that is not trivially relocatable. Relocating a pair<A,B> where A is trivially relocatable and B is not can still benefit from doing piecewise relocations so it avoids A's super-costly move constructor. Ideally relocation would be a constructor or something similarly magic and the compiler would auto-generate it for aggregates, etc. But I am asking a bit much there...

By the way, thanks for keeping destructive moves alive :-)

Comment 7 Arthur O'Dwyer 2018-10-26 04:04:45 UTC

> std::string is not trivially relocatable in libstdc++

This is surprising news to me! Just goes to show that we would benefit from an accurate detection mechanism and type trait. :)

> so I won't waste any time there. IIRC, one difference with what you did is that I use relocation even for some types that are not trivially relocatable (in particular std::string)

...but which are is_nothrow_relocatable. Right. This is a good point in the direction of maybe keeping is_nothrow_relocatable in P1144. I was thinking that maybe it was pointless and should be removed until a use-case is found for it. Now I'm even more unsure.


> 2) the conditionally trivial stuff is not very convenient, it seems to involve a lot of code duplication. People regularly suggest attributes of the form [[trivially_relocatable(condition)]] which might reduce the noise but are harder to specify.

Agreed on all counts. Particularly for this attribute, I worry that the common use-case would be something really ugly like

    template<class T, class A>
    class [[trivially_relocatable(
        is_trivially_relocatable_v<A> &&
        is_trivially_relocatable_v<typename allocator_traits<A>::pointer>
    )]] vector {
    };

which I hope you'd agree would be even more ridiculous than the current metaprogramming/duplication. And then on top of that, parsing an attribute-list would become as hard as parsing all of C++, which I don't think we want.

> 4) ... Relocating a pair<A,B> where A is trivially relocatable and B is not can still benefit from doing piecewise relocations so it avoids A's super-costly move constructor.

Hmm, I see what you mean, and that is a relevant and novel point AFAIC. The judo dodge for P1144 here is to point out that if today we provide a way to warrant/detect *trivial* relocation, well, that is not incompatible with providing a way to *customize* relocation tomorrow. We just need to agree that relocation ought to be "tantamount to" a move plus a destroy.

> Ideally relocation would be a constructor or something similarly magic and the compiler would auto-generate it for aggregates, etc. But I am asking a bit much there...

Yes, Denis Bider's P0023 "Relocator"
http://open-std.org/JTC1/SC22/WG21/docs/papers/2016/p0023r0.pdf
went that route, and I think it was not well received -- or possibly just not discussed at all, I'm not sure.

Thanks for the comments and the nice work!

Comment 8 Marc Glisse 2018-10-26 07:18:45 UTC

(In reply to Arthur O'Dwyer from comment #7)
> 2) the conditionally trivial stuff is not very convenient, it seems to involve a lot of code duplication. People regularly suggest attributes of the form [[trivially_relocatable(condition)]] which might reduce the noise but are harder to specify.
> 
> Agreed on all counts. Particularly for this attribute, I worry that the
> common use-case would be something really ugly like
> 
>     template<class T, class A>
>     class [[trivially_relocatable(
>         is_trivially_relocatable_v<A> &&
>         is_trivially_relocatable_v<typename allocator_traits<A>::pointer>
>     )]] vector {
>     };
> 
> which I hope you'd agree would be even more ridiculous than the current
> metaprogramming/duplication.

We already have a number of similar constructs, like noexcept. Still better than duplicating the whole class IMO.

> And then on top of that, parsing an
> attribute-list would become as hard as parsing all of C++, which I don't
> think we want.

The code is already there in the parser, calling it should not be too hard. However, we end up with questions like: when is the expression evaluated? Instantiating some member classes too eagerly could break some code.

Comment 9 Marc Glisse 2018-10-26 22:02:47 UTC

I was looking into using relocation in std::swap. For a type like deque (if we ignore the swap overload), using memcpy is really worth it. For a more simple type like int, using memcpy loses type information, which can hurt other optimizations, so I'd rather stick to the typed assignments for swap<int>. It isn't obvious what the right compromise is. Maybe types that are trivially relocatable but not trivially movable and destructible? Or just non-trivial? That might work if libstdc++ wasn't stuck with a legacy, non-trivial pair<int,int>.

For the array swap, I could use a large intermediate buffer on the side (memcpy really shines on large regions), possibly as large as the full array, but I would need to cap it to some safe size (hard to define).

By the way, when swapping ranges of a type that does not specialize/overload swap, it seems wasteful to call swap on each element, because it creates a new temporary object for each element instead of reusing the same one. It may be possible to detect if swap would call an overload in a different namespace, but it seems harder to detect if std::swap was specialized).

Comment 10 Arthur O'Dwyer 2018-10-27 02:30:01 UTC

> Still better than duplicating the whole class IMO.

The `optional` example in P1144R0 Appendix B looks scarier than I should have. For one thing, I omitted all the boring user-facing API of `optional` (emplace, reset, has_value, operator bool, operator*, operator->, ...) which would be implemented in the top level and thus shared by everyone. For another thing, I showed the metaprogramming for BOTH trivially relocatable AND trivially constructible/destructible in the same example. Adding trivial relocatability to a library type doesn't involve any duplication of member function implementations. It's tedious for sure, but mechanical.

Here's the commit where I trivially-relocatable-ized `std::deque`. 34 lines added, 18 more lines changed. There's some duplication, but it's not literally "duplicating the whole class."
https://github.com/llvm-mirror/libcxx/commit/cccc6d330fba37852455a3905b402b1aabe05474#diff-38adc80cec663f2f29c22e9ffc0de912


> I was looking into using relocation in std::swap. ... Maybe types that are trivially relocatable but not trivially movable and destructible? Or just non-trivial?

My commit for `std::swap` is here:
https://github.com/Quuxplusone/libcxx/commit/4d89aa95a7da86d1671d3e4441967782399fc3f9#diff-d436eb0fd9de10b54a828ce6435f7e81
It's extremely naive. I just say, if the type is_trivially_relocatable, then we use memcpy --- EXCEPT, if the type is_empty, then we should NOT use memcpy, because that is highly likely to break real code.
https://quuxplusone.github.io/blog/2018/07/13/trivially-copyable-corner-cases/

> For the array swap, I could use a large intermediate buffer on the side (memcpy really shines on large regions), possibly as large as the full array, but I would need to cap it to some safe size (hard to define).

A couple of people have mentioned to me that libc is conspicuously missing a `memswap(void *a, void *b, size_t n)` primitive (such as would be used internally by `qsort`). If libc guaranteed us an efficient implementation of `memswap`, we wouldn't need to do so much micro-optimization at the C++ level.

> By the way, when swapping ranges of a type that does not specialize/overload swap, it seems wasteful to call swap on each element, because it creates a new temporary object for each element instead of reusing the same one.

I'm not sure I follow. Are you relying on the premise that move-constructing an object, and then destructing a moved-from object, is more expensive than move-assigning into a moved-from object? I would expect move-assigning to be *no cheaper* than move-constructing, and so constructing and destructing a new temporary for each element ought to be *no more expensive* (and possibly cheaper) than reusing the same one.
But I have no actual benchmark numbers. Do you?

Re detecting overloads/specializations of std::swap, I think both are impossible, but I'm curious what you're thinking of, BUT I bet it's off-topic for this issue, so maybe you could email me your idea? :)

Comment 11 Marc Glisse 2018-10-27 08:58:02 UTC

(In reply to Arthur O'Dwyer from comment #10)
> Here's the commit where I trivially-relocatable-ized `std::deque`. 34 lines
> added, 18 more lines changed. There's some duplication, but it's not
> literally "duplicating the whole class."

Indeed. It requires defaulting some methods in the outer class. And it requires some artificial base classes, something I've been fighting against with [[unique_address]]. But it looks doable...

> > I was looking into using relocation in std::swap. ... Maybe types that are trivially relocatable but not trivially movable and destructible? Or just non-trivial?
> 
> My commit for `std::swap` is here:
> https://github.com/Quuxplusone/libcxx/commit/
> 4d89aa95a7da86d1671d3e4441967782399fc3f9#diff-
> d436eb0fd9de10b54a828ce6435f7e81
> It's extremely naive. I just say, if the type is_trivially_relocatable, then
> we use memcpy --- EXCEPT, if the type is_empty, then we should NOT use
> memcpy, because that is highly likely to break real code.
> https://quuxplusone.github.io/blog/2018/07/13/trivially-copyable-corner-
> cases/

Those are real pre-existing problems. As long as we deal with vector<T>, we know that T is the real type so we are safe. But indeed for one-element functions like swap it is more dangerous.

I was considering not just cases that are dangerous, but also cases (like int) that are safe but where memcpy might hinder some optimizations using tbaa.

> I'm not sure I follow. Are you relying on the premise that move-constructing
> an object, and then destructing a moved-from object, is more expensive than
> move-assigning into a moved-from object? I would expect move-assigning to be
> *no cheaper* than move-constructing, and so constructing and destructing a
> new temporary for each element ought to be *no more expensive* (and possibly
> cheaper) than reusing the same one.

For a type that always has some memory allocated, say std::list in visual studio, constructing+destructing means allocating and deallocating a sentinel, which can be skipped when assigning.

Comment 12 Marc Glisse 2018-11-22 18:10:37 UTC

Author: glisse
Date: Thu Nov 22 18:10:05 2018
New Revision: 266386

URL: https://gcc.gnu.org/viewcvs?rev=266386&root=gcc&view=rev
Log:
Improve relocation

2018-11-22  Marc Glisse  <marc.glisse@inria.fr>

	PR libstdc++/87106
	* include/bits/stl_algobase.h: Include <type_traits>.
	(__niter_base): Add noexcept specification.
	* include/bits/stl_deque.h: Include <bits/stl_uninitialized.h>.
	(__is_trivially_relocatable): Specialize for deque.
	* include/bits/stl_iterator.h: Include <type_traits>.
	(__niter_base): Add noexcept specification.
	* include/bits/stl_uninitialized.h (__is_trivially_relocatable):
	Add parameter for meta-programming.
	(__relocate_a_1, __relocate_a): Add noexcept specification.
	* include/bits/stl_vector.h (__use_relocate): Test __relocate_a.

Modified:
    trunk/libstdc++-v3/ChangeLog
    trunk/libstdc++-v3/include/bits/stl_algobase.h
    trunk/libstdc++-v3/include/bits/stl_deque.h
    trunk/libstdc++-v3/include/bits/stl_iterator.h
    trunk/libstdc++-v3/include/bits/stl_uninitialized.h
    trunk/libstdc++-v3/include/bits/stl_vector.h

Comment 13 Arthur O'Dwyer 2018-11-23 21:34:32 UTC

Re https://gcc.gnu.org/viewcvs?rev=266386&root=gcc&view=rev — awesome, but I'm curious: Why `deque<T>` before `vector<T>`?
I mean are you planning eventually to add the specializations of `__is_trivially_relocatable` for `vector`, `forward_list`, `reference_wrapper`, `optional`, `pair`, etc. etc., and you just decided to start with `deque`? or was there a specific motivating case?

Comment 14 Marc Glisse 2018-11-23 22:11:43 UTC

(In reply to Arthur O'Dwyer from comment #13)
> Re https://gcc.gnu.org/viewcvs?rev=266386&root=gcc&view=rev — awesome, but
> I'm curious: Why `deque<T>` before `vector<T>`?
> I mean are you planning eventually to add the specializations of
> `__is_trivially_relocatable` for `vector`, `forward_list`,
> `reference_wrapper`, `optional`, `pair`, etc. etc., and you just decided to
> start with `deque`? or was there a specific motivating case?

Because deque is not noexcept-move-constructible. This means that by default it cannot execute the destructors right after the move constructors, so it does not have a chance to remove the unnecessary allocation. We really need to help it at the library level, and the potential gain is significant.
For vector, the compiler sees a loop copying contiguous data from one place to another, so it should be able to produce a call to memcpy without help: PR 86024. There may be complications related to aliasing, I haven't looked at it yet. But I believe we should first explore making the compiler more clever, which would apply to a lot of code, before optimizing a special case in one library (which we could still do as a stopgap).
I expect we may add some specializations for wrapper types like pair/tuple/optional/etc because we don't have compiler support to automatically handle them (like your attribute), although I am not sure how far we should go (without compiler support). Specializing for one type was a bit of a proof of concept, checking that the code was ready for it, it isn't a promise to specialize a lot more. I do still hope some version of destructive move will eventually be standardized and resolve the issue.

This is all just my opinion, it may change, and others (in particular the maintainers) may have a different opinion.

By the way, there won't be anything new for several months, this is gcc's hibernation season where it tries to properly digest all it has eaten during the summer.

Comment 15 Arthur O'Dwyer 2019-01-18 21:33:41 UTC

@Marc, it only now occurs to me that if libstdc++ uses `__is_trivially_relocatable` as its userspace type-trait name, then GCC won't be able to use `__is_trivially_relocatable(T)` as the name of its compiler builtin. (It would be analogous to if libstdc++ had used `__is_trivially_copyable` as the name of a type-trait.)
Is there any chance that you could rename `__is_trivially_relocatable` to `__is_trivially_relocatablex` or `__has_trivial_relocatability` or something like that? Is it too late? I hope not.

Comment 16 Marc Glisse 2019-01-18 22:05:00 UTC

(In reply to Arthur O'Dwyer from comment #15)
> @Marc, it only now occurs to me that if libstdc++ uses
> `__is_trivially_relocatable` as its userspace type-trait name, then GCC
> won't be able to use `__is_trivially_relocatable(T)` as the name of its
> compiler builtin. (It would be analogous to if libstdc++ had used
> `__is_trivially_copyable` as the name of a type-trait.)
> Is there any chance that you could rename `__is_trivially_relocatable` to
> `__is_trivially_relocatablex` or `__has_trivial_relocatability` or something
> like that? Is it too late? I hope not.

The name __is_trivially_relocatable is currently undocumented, it is an implementation detail and users should not specialize it (or at least they should be ready for it to break very easily). It can be renamed at any point if convenient, say if/when gcc implements a builtin with the same name. Or is the issue that clang already has such a builtin, and it now conflicts with libstdc++? In that case I agree we should rename it. I don't care much about the name, see what the maintainers are happy with...

Comment 17 Arthur O'Dwyer 2019-01-18 22:14:17 UTC

(In reply to Marc Glisse from comment #16)
> (In reply to Arthur O'Dwyer from comment #15)
> > @Marc, it only now occurs to me that if libstdc++ uses
> > `__is_trivially_relocatable` as its userspace type-trait name, then GCC
> > won't be able to use `__is_trivially_relocatable(T)` as the name of its
> > compiler builtin. (It would be analogous to if libstdc++ had used
> > `__is_trivially_copyable` as the name of a type-trait.)
> > Is there any chance that you could rename `__is_trivially_relocatable` to
> > `__is_trivially_relocatablex` or `__has_trivial_relocatability` or something
> > like that? Is it too late? I hope not.
> 
> The name __is_trivially_relocatable is currently undocumented, it is an
> implementation detail and users should not specialize it (or at least they
> should be ready for it to break very easily). It can be renamed at any point
> if convenient, say if/when gcc implements a builtin with the same name. Or
> is the issue that clang already has such a builtin, and it now conflicts
> with libstdc++? In that case I agree we should rename it. I don't care much
> about the name, see what the maintainers are happy with...

Agreed users should be ready for the library trait to break.

Clang doesn't currently have such a builtin, but I am proposing it in https://reviews.llvm.org/D50119 (and I would rather rename the libstdc++ detail than the proposed Clang builtin).

My concern is that if GCC/Clang EVER implement such a builtin, then that (newer) version of GCC/Clang would forever be unable to build this (older) version of libstdc++. I think that's not the end of the world, but I think we ought to preemptively avoid the situation if it's cheap to do so. I should have brought this up earlier, but as I said, it just dawned on me today.

Comment 18 Marc Glisse 2019-02-05 09:34:08 UTC

Author: glisse
Date: Tue Feb  5 09:33:36 2019
New Revision: 268532

URL: https://gcc.gnu.org/viewcvs?rev=268532&root=gcc&view=rev
Log:
Rename __is_trivially_relocatable to __is_bitwise_relocatable.

2019-02-05  Marc Glisse  <marc.glisse@inria.fr>

	PR libstdc++/87106
	* include/bits/stl_uninitialized.h (__is_trivially_relocatable):
	Rename...
	(__is_bitwise_relocatable): ... to this.
	(__relocate_a_1): Adapt.
	* include/bits/stl_deque.h (__is_trivially_relocatable): Rename...
	(__is_bitwise_relocatable): ... to this.

Modified:
    trunk/libstdc++-v3/ChangeLog
    trunk/libstdc++-v3/include/bits/stl_deque.h
    trunk/libstdc++-v3/include/bits/stl_uninitialized.h

Comment 19 Arthur O'Dwyer 2019-02-05 20:25:12 UTC

Awesome! Thank you! :)

Comment 20 Marc Glisse 2019-02-12 19:05:36 UTC

TODO: specify __restrict on the arguments of __relocate_object_a. Since one is initialized and not the other, it should be safe to assume that they are disjoint, and it helps optimize move+destroy, for instance for unique_ptr<int>.

Comment 21 Marc Glisse 2019-04-27 14:09:51 UTC

Author: glisse
Date: Sat Apr 27 14:09:20 2019
New Revision: 270624

URL: https://gcc.gnu.org/viewcvs?rev=270624&root=gcc&view=rev
Log:
Use __restrict for __relocate_object_a

2019-04-27  Marc Glisse  <marc.glisse@inria.fr>

	PR libstdc++/87106
	* include/bits/stl_uninitialized.h (__relocate_object_a): Mark the
	arguments with __restrict.

Modified:
    trunk/libstdc++-v3/ChangeLog
    trunk/libstdc++-v3/include/bits/stl_uninitialized.h

Comment 22 Dan Stahlke 2020-01-16 02:19:47 UTC

Any reason why __is_bitwise_relocatable = is_trivial, instead of is_trivially_copyable?  cppreference tells me the difference between these two is that the first needs a trivial default constructor.  But is the default constructor relevant here?  I could see the DEstructor being relevant.

Comment 23 Arthur O'Dwyer 2020-01-16 02:42:29 UTC

@Dan Stahlke: I believe https://stackoverflow.com/questions/47464819/uninitialized-copy-memcpy-memmove-optimization answers your question. Or, if it doesn't, then Marc or someone should consider posting an answer that's better than mine... and tell me whether I should delete my answer. :)

Comment 24 Marc Glisse 2020-01-16 07:09:34 UTC

Something like that, yes. Essentially, I used trivial because I was convinced it was safe in that case, not because it looked like the perfect condition. If someone can convincingly argue for a weaker condition based on standard quotes (you need to convince Jonathan, not me), that would be nice.

Comment 25 Jonathan Wakely 2020-01-16 09:58:54 UTC

I think that's the answer, yes (although I haven't refreshed my memory of the issues completely). A call to memmove does not begin the lifetime of any objects at the destination address.

For types that are trivially-copyable but not trivial we could consider doing a first pass over the range to default-initialize the destination objects, then do the memmove to overwrite them with the desired values. In theory, the optimizer would realize the default-initializations are dead stores and completely remove that first loop (but only after the semantic checks have concluded that we are obeying the languages rules and have no undefined behaviour).

So change the bitwise_relocatable condition to is_trivially_copyable && is_default_constructible and make __relocate_a_1 do something like this (completely untested, typing as I think about it):

  template<typename _Tp, typename = void>
    struct __is_bitwise_relocatable
    : __and_<is_trivially_copyable<_Tp>, is_default_constructible<_Tp>> { };

  template <typename _Tp, typename _Up>
    inline __enable_if_t<std::__is_bitwise_relocatable<_Tp>::value, _Tp*>
    __relocate_a_1(_Tp* __first, _Tp* __last,
		   _Tp* __result, allocator<_Up>& a) noexcept
    {
      ptrdiff_t __count = __last - __first;
      if (__count > 0)
        {
          if constexpr (!is_trivial_v<T>)
            {
#ifdef __OPTIMIZE__
              // begin lifetime of destination objects
              std::__uninitialized_default_novalue_n(result, result + count);
#else
              // can't do bitwise relocation today
              for (ptrdiff_t i = 0; i < count; ++i)
                __relocate_object_a(result + i, first + i, a);
              return result + count;
#endif         
            }
	  __builtin_memmove(__result, __first, __count * sizeof(_Tp));
        }
      return __result + __count;
    }

Comment 26 Jonathan Wakely 2020-01-16 10:02:26 UTC

Or maybe that should be:

  template<typename _Tp, typename = void>
    struct __is_bitwise_relocatable
    : is_trivially_copyable<_Tp> { };

...

          if constexpr (!is_trivial_v<T>)
            if (is_default_constructible_v<_Tp> && (__OPTIMIZE__+0))
              std::__uninitialized_default_novalue_n(result, result + count);
            else
              {
                // can't do bitwise relocation today
                ...
              }

Comment 27 Jonathan Wakely 2020-01-16 10:03:54 UTC

>           if constexpr (!is_trivial_v<T>)
>             if (is_default_constructible_v<_Tp> && (__OPTIMIZE__+0))

That would need to be 'if constexpr'

I'll stop now.

Comment 28 Dan Stahlke 2020-01-16 19:06:09 UTC

Thank you.  That makes sense.  I had asked about it here: https://stackoverflow.com/questions/59690019  I was directed to this thread, and linked back to the SO thread you provided.

Comment 29 Marc Glisse 2020-01-16 19:37:53 UTC

Note that __is_bitwise_relocatable is specialized to true for deque, so we are not super consistent here ;-)
The original patch used is_trivially_move_constructible, IIRC I changed it to is_trivial so the review wouldn't turn into a lengthy discussion on the subject, but I don't think going back to is_trivially_move_constructible would cause miscompilations (maybe warnings with -Wsystem-headers).