Review Hashtable extract node API
François Dumont
frs.dumont@gmail.com
Fri Jun 7 16:39:00 GMT 2019
On 6/5/19 6:22 PM, Jonathan Wakely wrote:
> On 04/06/19 19:19 +0200, François Dumont wrote:
>> Hi
>>
>> Â Â Â Here is a patch to enhance the _Hashtable extract node API and
>> fix a FIXME request.
>>
>> Â Â Â The enhancement to the extract node Api is that extract(const
>> key_type&) do not call extract(const_iterator) anymore. Doing so we
>> had to loop again through bucket nodes to find the previous node to
>> the one to extract. Even if a bucket shall not contain many nodes (in
>> unique key mode) it's easy to avoid it.
>
> Nice.
>
>> Â Â Â To fix the FIXME I introduced a node smart pointer type managing
>> the node lifetime. The node is extracted from this smart pointer only
>> when there can't be any exception raised. In the context of the node
>> extract api the node handle is considered as a smart pointer. So the
>> node handle will remain owner of the node in case of exception when
>> reinserting it, I hope it is the expected behavior.
>
> Yes, that's right.
>
> I was going to suggest just using the node handle type instead of
> inventing a new smart pointer, but the handle type uses std::optional
> so isn't available for C++11/14.
I considered it too, or even use shared_ptr but thought it was overkill.
>
>
>> Â Â Â * include/bits/hashtable_policy.h
>> Â Â Â (struct _NodeSmartPointer<_NodeAlloc>): New.
>> Â Â Â (_Map_base<>::operator[](const key_type&)): Use latter, adapt.
>> Â Â Â (_Map_base<>::operator[](key_type&&)): Likewise.
>> Â Â Â * include/bits/hashtable.h
>> Â Â Â (_Hashtable<>::__node_sp_t): New.
>> Â Â Â (_Hashtable<>::_M_insert_unique_node(size_type, __hash_code,
>> Â Â Â __node_type*, size_type)): Replace by...
>> (_Hashtable<>::_M_insert_unique_node<_NodeAccessor>(const key_type&,
>> Â Â Â size_type, __hash_code, const _NodeAccessor&, size_type)): ...that.
>> Â Â Â (_Hashtable<>::_M_insert_multi_node(__node_type*, __hash_code,
>> Â Â Â __node_type*)): Replace by...
>> (_Hashtable<>::_M_insert_multi_node<_NodeAccessor>(__node_type*,
>> Â Â Â __hash_code, const _NodeAccessor&)): ...that.
>> Â Â Â (_Hashtable<>::_M_reinsert_node): Adapt.
>> Â Â Â (_Hashtable<>::_M_reinsert_node_multi): Adapt.
>> Â Â Â (_Hashtable<>::_M_extract_node(size_t, __node_base*)): New.
>> Â Â Â (_Hashtable<>::extract(const_iterator)): Use latter.
>> Â Â Â (_Hashtable<>::extract(const _Key&)): Likewise.
>> Â Â Â (_Hashtable<>::_M_merge_unique): Adapt.
>> Â Â Â (_Hashtable<>::_M_emplace<_Args>(true_type, _Args&&...)): Adapt.
>> Â Â Â (_Hashtable<>::_M_emplace<_Args>(const_iterator, false_type,
>> Â Â Â _Args&&...)): Adapt.
>>
>> Tested under Linux x86_64.
>>
>> Ok to commit ?
>>
>> François
>>
>
>> diff --git a/libstdc++-v3/include/bits/hashtable.h
>> b/libstdc++-v3/include/bits/hashtable.h
>> index e2e3f016a35..307865b96bf 100644
>> --- a/libstdc++-v3/include/bits/hashtable.h
>> +++ b/libstdc++-v3/include/bits/hashtable.h
>> @@ -197,6 +197,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>> Â Â Â Â Â using __hash_cached = typename __traits_type::__hash_cached;
>> Â Â Â Â Â using __node_type = __detail::_Hash_node<_Value,
>> __hash_cached::value>;
>> Â Â Â Â Â using __node_alloc_type = __alloc_rebind<_Alloc, __node_type>;
>> +Â Â Â Â Â using __node_sp_t =
>> __detail::_NodeSmartPointer<__node_alloc_type>;
>>
>> Â Â Â Â Â using __hashtable_alloc =
>> __detail::_Hashtable_alloc<__node_alloc_type>;
>>
>> @@ -669,18 +670,19 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>> Â Â Â Â Â __node_base*
>> Â Â Â Â Â _M_get_previous_node(size_type __bkt, __node_base* __n);
>>
>> -Â Â Â Â Â // Insert node with hash code __code, in bucket bkt if no
>> rehash (assumes
>> -Â Â Â Â Â // no element with its key already present). Take ownership of
>> the node,
>> -Â Â Â Â Â // deallocate it on exception.
>> +Â Â Â Â Â // Insert node with key __k and hash code __code, in bucket
>> __bkt if no
>> +Â Â Â Â Â // rehash (assumes no element with its key already present).
>> +Â Â Â Â Â template<typename _NodeAccessor>
>> Â Â Â Â iterator
>> -Â Â Â Â Â _M_insert_unique_node(size_type __bkt, __hash_code __code,
>> -Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â __node_type* __n, size_type __n_elt = 1);
>> +Â Â Â _M_insert_unique_node(const key_type& __k, size_type __bkt,
>> +Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â __hash_code __code, const _NodeAccessor&,
>> +Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â size_type __n_elt = 1);
>>
>> -Â Â Â Â Â // Insert node with hash code __code. Take ownership of the node,
>> -Â Â Â Â Â // deallocate it on exception.
>> +Â Â Â Â Â // Insert node with hash code __code.
>> +Â Â Â Â Â template<typename _NodeAccessor>
>> Â Â Â Â iterator
>> -Â Â Â Â Â _M_insert_multi_node(__node_type* __hint,
>> -Â Â Â Â Â Â Â Â Â Â Â Â Â Â __hash_code __code, __node_type* __n);
>> +Â Â Â _M_insert_multi_node(__node_type* __hint, __hash_code __code,
>> +Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â const _NodeAccessor& __node_accessor);
>
> It looks like most times you call these functions you pass an
> identical lambda expression, but each of those lambda expressions will
> create a unique type. That means you create different instantiations
> of the function templates even though they do exactly the same thing.
>
> That's just generating multiple copies of identical code. Passing in a
> function object to provide the node pointer doesn't really seem
> necessary anyway, so if it results in larger executables it's really
> not desirable.
Maybe one the reasons we have a difference when running performance
tests in -03. I don't know why I abused so much of lambdas when I see
your proposal.
>
> The attached patch still just passes in a node pointer (which the
> function takes ownership of, unless it throws). Because the semantics
> of _M_insert_multi_node change, we need the symbol name to change as
> well (so old code linking to the old symbol doesn't find the new
> definition with different semantics). Passing in the key makes it a
> different symbol (in almost all cases the caller has the key available
> already anyway).
Very nice.
>
> An alternative design would be for _M_insert_(unique|multi)_node to
> take a reference to the RAII type and set its _M_node member to null
> after taking ownership of that pointer. That would be more explicit
> about the ownership transfer, however it would require changes to the
> _M_reinsert_node and _M_reinsert_node_multi functions which don't use
> the RAII type (because the __node_type* is already owned by a node
> handle).
>
>>
>> Â Â Â Â Â template<typename... _Args>
>> Â Â Â Â std::pair<iterator, bool>
>> @@ -805,9 +807,9 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>> Â Â Â Â Â Â Â Â Â }
>> Â Â Â Â Â Â Â else
>> Â Â Â Â Â Â Â Â Â {
>> -Â Â Â Â Â Â Â __ret.position
>> -Â Â Â Â Â Â Â Â Â = _M_insert_unique_node(__bkt, __code, __nh._M_ptr);
>> -Â Â Â Â Â Â Â __nh._M_ptr = nullptr;
>> +Â Â Â Â Â Â Â __ret.position = _M_insert_unique_node(__k, __bkt, __code,
>> +Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â [&__nh]()
>> +Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â { return std::exchange(__nh._M_ptr, nullptr); });
>
> The existing code here can just be changed to pass in __k, but still
> set __nh._M_ptr to null after the call returns.
>
> i.e.
>
> Â Â Â Â Â Â Â __ret.position
> Â Â Â Â Â Â Â Â Â = _M_insert_unique_node(__k, __bkt, __code, __nh._M_ptr);
> Â Â Â Â Â Â Â __nh._M_ptr = nullptr;
>
>
>> Â Â Â Â Â Â Â __ret.inserted = true;
>> Â Â Â Â Â Â Â Â Â }
>> Â Â Â Â Â }
>> @@ -818,33 +820,23 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>> Â Â Â Â Â iterator
>> Â Â Â Â Â _M_reinsert_node_multi(const_iterator __hint, node_type&& __nh)
>> Â Â Â Â Â {
>> -Â Â Â iterator __ret;
>> Â Â Â Â if (__nh.empty())
>> -Â Â Â Â Â __ret = end();
>> -Â Â Â else
>> -Â Â Â Â Â {
>> +Â Â Â Â Â return end();
>> +
>> Â Â Â Â __glibcxx_assert(get_allocator() == __nh.get_allocator());
>>
>> Â Â Â Â auto __code = this->_M_hash_code(__nh._M_key());
>> -Â Â Â Â Â Â Â auto __node = std::exchange(__nh._M_ptr, nullptr);
>> -Â Â Â Â Â Â Â // FIXME: this deallocates the node on exception.
>> -Â Â Â Â Â Â Â __ret = _M_insert_multi_node(__hint._M_cur, __code, __node);
>> -Â Â Â Â Â }
>> -Â Â Â return __ret;
>> +Â Â Â return _M_insert_multi_node(__hint._M_cur, __code,
>> +Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â [&__nh]()
>> +Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â { return std::exchange(__nh._M_ptr, nullptr); });
>
> Now that the call won't deallocate anything, we can fix the FIXME by
> just delaying resetting __nh._M_ptr until the call has returned:
>
> Â Â Â Â Â Â auto __ret = _M_insert_multi_node(__hint._M_cur, __code, __node);
> Â Â Â Â Â Â __nh._M_ptr = nullptr;
> Â Â Â Â Â Â return __ret;
>
>
>> Â Â Â Â Â }
>>
>> +Â Â Â private:
>> Â Â Â Â Â /// Extract a node.
>
> We don't want a Doxygen "///" comment on a private function, just "//"
> is OK. In this case I don't think we need any comment, the
> _M_extract_node name is clear enough. Saying "extract a node" adds
> nothing.
>
>
>> Â Â Â Â Â node_type
>> -Â Â Â Â Â extract(const_iterator __pos)
>> +Â Â Â Â Â _M_extract_node(size_t __bkt, __node_base* __prev_n)
>> Â Â Â Â Â {
>> -Â Â Â __node_type* __n = __pos._M_cur;
>> -Â Â Â size_t __bkt = _M_bucket_index(__n);
>> -
>> -Â Â Â // Look for previous node to unlink it from the erased one, this
>> -Â Â Â // is why we need buckets to contain the before begin to make
>> -Â Â Â // this search fast.
>> -Â Â Â __node_base* __prev_n = _M_get_previous_node(__bkt, __n);
>> -
>> +Â Â Â __node_type* __n = static_cast<__node_type*>(__prev_n->_M_nxt);
>> Â Â Â Â if (__prev_n == _M_buckets[__bkt])
>> Â Â Â Â Â _M_remove_bucket_begin(__bkt, __n->_M_next(),
>> Â Â Â Â Â Â Â Â __n->_M_nxt ? _M_bucket_index(__n->_M_next()) : 0);
>> @@ -861,14 +853,24 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>> Â Â Â Â return { __n, this->_M_node_allocator() };
>> Â Â Â Â Â }
>>
>> +Â Â Â public:
>> +Â Â Â Â Â /// Extract a node.
>> +Â Â Â Â Â node_type
>> +Â Â Â Â Â extract(const_iterator __pos)
>> +Â Â Â Â Â {
>> +Â Â Â size_t __bkt = _M_bucket_index(__pos._M_cur);
>> +Â Â Â return _M_extract_node(__bkt, _M_get_previous_node(__bkt,
>> __pos._M_cur));
>> +Â Â Â Â Â }
>> +
>> Â Â Â Â Â /// Extract a node.
>> Â Â Â Â Â node_type
>> Â Â Â Â Â extract(const _Key& __k)
>> Â Â Â Â Â {
>> Â Â Â Â node_type __nh;
>> -Â Â Â auto __pos = find(__k);
>> -Â Â Â if (__pos != end())
>> -Â Â Â Â Â __nh = extract(const_iterator(__pos));
>> +Â Â Â __hash_code __code = this->_M_hash_code(__k);
>> +Â Â Â std::size_t __bkt = _M_bucket_index(__k, __code);
>> +Â Â Â if (__node_base* __prev_node = _M_find_before_node(__bkt, __k,
>> __code))
>> +Â Â Â Â Â __nh = _M_extract_node(__bkt, __prev_node);
>> Â Â Â Â return __nh;
>> Â Â Â Â Â }
>
> Looks good.
>
>> @@ -885,14 +887,16 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>> Â Â Â Â Â for (auto __i = __src.begin(), __end = __src.end(); __i != __end;)
>> Â Â Â Â Â Â Â {
>> Â Â Â Â Â Â Â Â Â auto __pos = __i++;
>> -Â Â Â Â Â Â Â Â Â const key_type& __k =
>> this->_M_extract()(__pos._M_cur->_M_v());
>> +Â Â Â Â Â Â Â Â Â const key_type& __k = this->_M_extract()(*__pos);
>> Â Â Â Â Â Â Â Â Â __hash_code __code = this->_M_hash_code(__k);
>> Â Â Â Â Â Â Â Â Â size_type __bkt = _M_bucket_index(__k, __code);
>> Â Â Â Â Â Â Â Â Â if (_M_find_node(__bkt, __k, __code) == nullptr)
>> Â Â Â Â Â Â Â {
>> Â Â Â Â Â Â Â Â Â auto __nh = __src.extract(__pos);
>> -Â Â Â Â Â Â Â Â Â _M_insert_unique_node(__bkt, __code, __nh._M_ptr, __n_elt);
>> -Â Â Â Â Â Â Â Â Â __nh._M_ptr = nullptr;
>> +Â Â Â Â Â Â Â Â Â _M_insert_unique_node(__k, __bkt, __code,
>> +Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â [&__nh]()
>> +Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â { return std::exchange(__nh._M_ptr, nullptr); },
>> +Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â __n_elt);
>
> Again, the old code seems fine. Just pass in __k.
>
>> Â Â Â Â Â Â Â Â Â __n_elt = 1;
>> Â Â Â Â Â Â Â }
>> Â Â Â Â Â Â Â Â Â else if (__n_elt != 1)
>> @@ -1634,31 +1638,25 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>> Â Â Â Â Â -> pair<iterator, bool>
>> Â Â Â Â Â {
>> Â Â Â Â // First build the node to get access to the hash code
>> -Â Â Â __node_type* __node
>> -Â Â Â Â Â = this->_M_allocate_node(std::forward<_Args>(__args)...);
>> -Â Â Â const key_type& __k = this->_M_extract()(__node->_M_v());
>> -Â Â Â __hash_code __code;
>> -Â Â Â __try
>> -Â Â Â Â Â {
>> -Â Â Â Â Â Â Â __code = this->_M_hash_code(__k);
>> -Â Â Â Â Â }
>> -Â Â Â __catch(...)
>> -Â Â Â Â Â {
>> -Â Â Â Â Â Â Â this->_M_deallocate_node(__node);
>
> Aside: It's confusing that _M_allocate_node actually allocates a node
> and constructs an element, and _M_deallocate_node destroys the element
> and deallocates the node (and we have _M_deallocate_node_ptr which
> just does the deallocation part). We should add comments explaining
> that.
>
>
>> diff --git a/libstdc++-v3/include/bits/hashtable_policy.h
>> b/libstdc++-v3/include/bits/hashtable_policy.h
>> index 878154ae210..ddcff0ec9d0 100644
>> --- a/libstdc++-v3/include/bits/hashtable_policy.h
>> +++ b/libstdc++-v3/include/bits/hashtable_policy.h
>> @@ -170,6 +170,40 @@ namespace __detail
>> Â Â Â Â Â __hashtable_alloc& _M_h;
>> Â Â Â };
>>
>> +Â // Node smart pointer to benefit from RAII.
>> +Â template<typename _NodeAlloc>
>> +Â Â Â struct _NodeSmartPointer
>> +Â Â Â {
>> +Â Â Â private:
>> +Â Â Â Â Â using __hashtable_alloc = _Hashtable_alloc<_NodeAlloc>;
>> +Â Â Â Â Â using __node_type = typename __hashtable_alloc::__node_type;
>> +
>> +Â Â Â public:
>> +Â Â Â Â Â _NodeSmartPointer(__hashtable_alloc& __h, __node_type* __node)
>> noexcept
>> +Â Â Â Â Â : _M_h(__h), _M_node(__node) { }
>> +
>> +Â Â Â Â Â ~_NodeSmartPointer()
>> +Â Â Â Â Â {
>> +Â Â Â if (_M_node)
>> +Â Â Â Â Â _M_h._M_deallocate_node(_M_node);
>> +Â Â Â Â Â }
>> +
>> +Â Â Â Â Â __node_type*
>> +Â Â Â Â Â _M_get() const { return _M_node; }
>> +
>> +Â Â Â Â Â __node_type*
>> +Â Â Â Â Â _M_release() noexcept
>> +Â Â Â Â Â {
>> +Â Â Â __node_type* __tmp = _M_node;
>> +Â Â Â _M_node = nullptr;
>> +Â Â Â return __tmp;
>> +Â Â Â Â Â }
>> +
>> +Â Â Â private:
>> +Â Â Â Â Â __hashtable_alloc& _M_h;
>> +Â Â Â Â Â __node_type* _M_node;
>> +Â Â Â };
>
> This can be defined as a member of _Hashtable, and can be much
> simpler:
>
> Â Â Â Â // Simple RAII type for managing a node containing an element
> Â Â Â Â struct _Scoped_node
> Â Â Â Â {
> Â Â Â Â _Scoped_node(__hashtable_alloc* __h, __node_type* __n)
> Â Â Â Â : _M_h(__h), _M_node(__n) { }
>
> Â Â Â Â ~_Scoped_node() { if (_M_node) _M_h->_M_deallocate_node(_M_node); };
>
> Â Â Â Â _Scoped_node(const _Scoped_node&) = delete;
> Â Â Â Â _Scoped_node& operator=(const _Scoped_node&) = delete;
>
> Â Â Â Â __hashtable_alloc* _M_h;
> Â Â Â Â __node_type* _M_node;
> Â Â Â Â };
>
> I've attached a patch implementing these suggested changes. I think
> it's simpler this way, what do you think?
I think it is much better than what I proposed.
>
> (This patch is generated with 'git diff -b' so whitespace changes
> aren't shown, so this shouldn't be applied directly. I can send
> another version with the whitespace changes that is suitable to
> commit).
>
Mine was generated with 'git diff -w' for the same reason.
You can send the full patch to me if you prefer but feel free to commit
it yourself.
François
More information about the Gcc-patches
mailing list