std::vector move assign patch

Tue Apr 25 15:35:00 GMT 2017

On 25/04/17 17:23 +0200, Marc Glisse wrote:
>On Tue, 25 Apr 2017, Jonathan Wakely wrote:
>
>>On 24/04/17 22:10 +0200, Marc Glisse wrote:
>>>It seems that this patch had 2 consequences that may or may not 
>>>have been planned. Consider this example (from PR64601)
>>>
>>>#include <vector>
>>>typedef std::vector<int> V;
>>>void f(V&v,V&w){ V(std::move(w)).swap(v); }
>>>void g(V&v,V&w){ v=std::move(w); }
>>>
>>>1) We generate shorter code for f than for g, probably since the 
>>>fix for PR59738. g ends up zeroing v, copying w to v, and finally 
>>>zeroing w, and for weird reasons (and because we swap the members 
>>>one by one) the standard prevents us from assuming that v and w do 
>>>not overlap in weird ways so we cannot optimize as much as one 
>>>might expect.
>>
>>f has an additional precondition (that the allocators of the vectors
>>being swapped must propagate on swap or be equal) and so the swap code
>>doesn't have to worry about non-equal allocators.
>>
>>g has to be able to cope with the case where the allocator doesn't
>>propagate and isn't equal, and so is more complicated.
>>
>>However, the propagation trait is known at compile-time, and for the
>>common case so is the equality condition, so it's unfortunate if that
>>can't be simplified (I'm sure you've analysed it carefully already
>>though!)
>
>The code isn't horrible. With f, we get:
>
>        movq    (%rsi), %r8
>        movq    8(%rsi), %rcx
>        movq    $0, (%rsi)
>        movq    $0, 8(%rsi)
>        movq    16(%rsi), %rdx
>        movq    $0, 16(%rsi)
>        movq    (%rdi), %rax
>        movq    %rcx, 8(%rdi)
>        movq    %r8, (%rdi)
>        movq    %rdx, 16(%rdi)
>        testq   %rax, %rax
>
>which seems quite optimal: read each pointer from w, write them to v, 
>write 0s in w, that's 9 memory operations, +1 to read the pointer from 
>w and possibly call delete on it.
>
>With g:
>
>        movq    $0, 8(%rdi)
>        movq    (%rdi), %rax
>        movq    $0, 16(%rdi)
>        movq    $0, (%rdi)
>        movq    (%rsi), %rdx
>        movq    %rdx, (%rdi)
>        movq    8(%rsi), %rcx
>        movq    $0, (%rsi)
>        movq    8(%rdi), %rdx
>        movq    %rcx, 8(%rdi)
>        movq    16(%rsi), %rcx
>        movq    %rdx, 8(%rsi)
>        movq    16(%rdi), %rdx
>        movq    %rcx, 16(%rdi)
>        movq    %rdx, 16(%rsi)
>        testq   %rax, %rax
>
>That's only 5 more memory operations. If I tweak vector swapping to 
>avoid calling swap on each member (which drops type-based aliasing 
>information, that was the topic of PR64601)

I didn't really understand the discussion in the PR. I find that's
true of most TBAA discussions.

std::swap(T& x, T& y) is hard to optimise because we don't know that
the dynamic type of the thing at &x is the same type as T?

>        void _M_swap_data(_Vector_impl& __x) _GLIBCXX_NOEXCEPT
>        {
>          pointer tmp;
>#define MARC(x,y) tmp=x; x=y; y=tmp
>          MARC(_M_start, __x._M_start);
>          MARC(_M_finish, __x._M_finish);
>          MARC(_M_end_of_storage, __x._M_end_of_storage);
>        }
>
>this gets down to 13, which is kind of sensible
>* 0 the elements of v -> 3 ops
>* read the elements of w -> 3 ops
>* write them to v -> 3 ops
>* 0 the elements of w -> 3 ops
>(+1 to get the pointer that we might call delete on)
>
>The first step of zeroing the elements of v is redundant
>* if v and w don't alias, we are going to overwrite those 0s in step 3 
>without ever reading them
>* if v and w are the same, we are going to write those 0s in step 4 anyway
>
>but that's hard for the optimizers to notice.
>
>I didn't try hard to find a nice C++ way to get an equivalent of g 
>that generates the optimal number of operations, but it would be a 
>little ugly to write in operator=
>this->_M_impl._M_finish = x._M_impl._M_finish; x._M_impl._M_finish = 0;
>same for _M_end_of_storage and _M_start, and remembering to use the 
>original this->_M_impl._M_start for delete.

I'm not opposed to writing it out by hand. Operations on std::vector
should be as fast as possible, and move-assignment should be cheap.

>>>2) g(v,v) seems to turn v into a nice empty vector,
>>
>>Yes.
>>
>>>while f(v,v) turns it into an invalid vector pointing at released memory.
>>
>>Does it?! I don't see that happening, and it's a bug if it does.
>
>Er, my fault, you are right. It shuffles things around and amounts to 
>a NOP, v remains as it was before the call to f. Which could be 
>considered less desirable than clearing the vector by some, but not as 
>much as getting something invalid of course :-)

Phew :-)