Review Hashtable extract node API

Tue Jun 18 05:52:00 GMT 2019

A small regression noticed while merging.

We shouldn't keep on using a moved-from key_type instance.

Ok to commit ? Feel free to do it if you prefer, I'll do so at end of 
Europe day otherwise.

FranÃ§ois

On 6/17/19 12:28 PM, Jonathan Wakely wrote:
> On 07/06/19 18:39 +0200, FranÃ§ois Dumont wrote:
>> On 6/5/19 6:22 PM, Jonathan Wakely wrote:
>>> On 04/06/19 19:19 +0200, FranÃ§ois Dumont wrote:
>>>> Hi
>>>>
>>>> Â Â Â  Here is a patch to enhance the _Hashtable extract node API and 
>>>> fix a FIXME request.
>>>>
>>>> Â Â Â  The enhancement to the extract node Api is that extract(const 
>>>> key_type&) do not call extract(const_iterator) anymore. Doing so we 
>>>> had to loop again through bucket nodes to find the previous node to 
>>>> the one to extract. Even if a bucket shall not contain many nodes 
>>>> (in unique key mode) it's easy to avoid it.
>>>
>>> Nice.
>>>
>>>> Â Â Â  To fix the FIXME I introduced a node smart pointer type 
>>>> managing the node lifetime. The node is extracted from this smart 
>>>> pointer only when there can't be any exception raised. In the 
>>>> context of the node extract api the node handle is considered as a 
>>>> smart pointer. So the node handle will remain owner of the node in 
>>>> case of exception when reinserting it, I hope it is the expected 
>>>> behavior.
>>>
>>> Yes, that's right.
>>>
>>> I was going to suggest just using the node handle type instead of
>>> inventing a new smart pointer, but the handle type uses std::optional
>>> so isn't available for C++11/14.
>>
>> I considered it too, or even use shared_ptr but thought it was overkill.
>>
>>>
>>>
>>>> Â Â Â  * include/bits/hashtable_policy.h
>>>> Â Â Â  (struct _NodeSmartPointer<_NodeAlloc>): New.
>>>> Â Â Â  (_Map_base<>::operator[](const key_type&)): Use latter, adapt.
>>>> Â Â Â  (_Map_base<>::operator[](key_type&&)): Likewise.
>>>> Â Â Â  * include/bits/hashtable.h
>>>> Â Â Â  (_Hashtable<>::__node_sp_t): New.
>>>> Â Â Â  (_Hashtable<>::_M_insert_unique_node(size_type, __hash_code,
>>>> Â Â Â  __node_type*, size_type)): Replace by...
>>>> (_Hashtable<>::_M_insert_unique_node<_NodeAccessor>(const key_type&,
>>>> Â Â Â  size_type, __hash_code, const _NodeAccessor&, size_type)): 
>>>> ...that.
>>>> Â Â Â  (_Hashtable<>::_M_insert_multi_node(__node_type*, __hash_code,
>>>> Â Â Â  __node_type*)): Replace by...
>>>> (_Hashtable<>::_M_insert_multi_node<_NodeAccessor>(__node_type*,
>>>> Â Â Â  __hash_code, const _NodeAccessor&)): ...that.
>>>> Â Â Â  (_Hashtable<>::_M_reinsert_node): Adapt.
>>>> Â Â Â  (_Hashtable<>::_M_reinsert_node_multi): Adapt.
>>>> Â Â Â  (_Hashtable<>::_M_extract_node(size_t, __node_base*)): New.
>>>> Â Â Â  (_Hashtable<>::extract(const_iterator)): Use latter.
>>>> Â Â Â  (_Hashtable<>::extract(const _Key&)): Likewise.
>>>> Â Â Â  (_Hashtable<>::_M_merge_unique): Adapt.
>>>> Â Â Â  (_Hashtable<>::_M_emplace<_Args>(true_type, _Args&&...)): Adapt.
>>>> (_Hashtable<>::_M_emplace<_Args>(const_iterator, false_type,
>>>> Â Â Â  _Args&&...)): Adapt.
>>>>
>>>> Tested under Linux x86_64.
>>>>
>>>> Ok to commit ?
>>>>
>>>> FranÃ§ois
>>>>
>>>
>>>> diff --git a/libstdc++-v3/include/bits/hashtable.h 
>>>> b/libstdc++-v3/include/bits/hashtable.h
>>>> index e2e3f016a35..307865b96bf 100644
>>>> --- a/libstdc++-v3/include/bits/hashtable.h
>>>> +++ b/libstdc++-v3/include/bits/hashtable.h
>>>> @@ -197,6 +197,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>>>> Â Â Â Â Â  using __hash_cached = typename __traits_type::__hash_cached;
>>>> Â Â Â Â Â  using __node_type = __detail::_Hash_node<_Value, 
>>>> __hash_cached::value>;
>>>> Â Â Â Â Â  using __node_alloc_type = __alloc_rebind<_Alloc, __node_type>;
>>>> +Â Â Â Â Â  using __node_sp_t = 
>>>> __detail::_NodeSmartPointer<__node_alloc_type>;
>>>>
>>>> Â Â Â Â Â  using __hashtable_alloc = 
>>>> __detail::_Hashtable_alloc<__node_alloc_type>;
>>>>
>>>> @@ -669,18 +670,19 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>>>> Â Â Â Â Â  __node_base*
>>>> Â Â Â Â Â  _M_get_previous_node(size_type __bkt, __node_base* __n);
>>>>
>>>> -Â Â Â Â Â  // Insert node with hash code __code, in bucket bkt if no 
>>>> rehash (assumes
>>>> -Â Â Â Â Â  // no element with its key already present). Take ownership 
>>>> of the node,
>>>> -Â Â Â Â Â  // deallocate it on exception.
>>>> +Â Â Â Â Â  // Insert node with key __k and hash code __code, in bucket 
>>>> __bkt if no
>>>> +Â Â Â Â Â  // rehash (assumes no element with its key already present).
>>>> +Â Â Â Â Â  template<typename _NodeAccessor>
>>>> Â Â Â Â iterator
>>>> -Â Â Â Â Â  _M_insert_unique_node(size_type __bkt, __hash_code __code,
>>>> -Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â  __node_type* __n, size_type __n_elt = 1);
>>>> +Â Â Â  _M_insert_unique_node(const key_type& __k, size_type __bkt,
>>>> +Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â  __hash_code __code, const _NodeAccessor&,
>>>> +Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â  size_type __n_elt = 1);
>>>>
>>>> -Â Â Â Â Â  // Insert node with hash code __code. Take ownership of the 
>>>> node,
>>>> -Â Â Â Â Â  // deallocate it on exception.
>>>> +Â Â Â Â Â  // Insert node with hash code __code.
>>>> +Â Â Â Â Â  template<typename _NodeAccessor>
>>>> Â Â Â Â iterator
>>>> -Â Â Â Â Â  _M_insert_multi_node(__node_type* __hint,
>>>> -Â Â Â Â Â Â Â Â Â Â Â Â Â Â  __hash_code __code, __node_type* __n);
>>>> +Â Â Â  _M_insert_multi_node(__node_type* __hint, __hash_code __code,
>>>> +Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â  const _NodeAccessor& __node_accessor);
>>>
>>> It looks like most times you call these functions you pass an
>>> identical lambda expression, but each of those lambda expressions will
>>> create a unique type. That means you create different instantiations
>>> of the function templates even though they do exactly the same thing.
>>>
>>> That's just generating multiple copies of identical code. Passing in a
>>> function object to provide the node pointer doesn't really seem
>>> necessary anyway, so if it results in larger executables it's really
>>> not desirable.
>>
>>
>> Maybe one the reasons we have a difference when running performance 
>> tests in -03. I don't know why I abused so much of lambdas when I see 
>> your proposal.
>>
>>
>>>
>>> The attached patch still just passes in a node pointer (which the
>>> function takes ownership of, unless it throws). Because the semantics
>>> of _M_insert_multi_node change, we need the symbol name to change as
>>> well (so old code linking to the old symbol doesn't find the new
>>> definition with different semantics). Passing in the key makes it a
>>> different symbol (in almost all cases the caller has the key available
>>> already anyway).
>> Very nice.
>>>
>>> An alternative design would be for _M_insert_(unique|multi)_node to
>>> take a reference to the RAII type and set its _M_node member to null
>>> after taking ownership of that pointer. That would be more explicit
>>> about the ownership transfer, however it would require changes to the
>>> _M_reinsert_node and _M_reinsert_node_multi functions which don't use
>>> the RAII type (because the __node_type* is already owned by a node
>>> handle).
>>>
>>>>
>>>> Â Â Â Â Â  template<typename... _Args>
>>>> Â Â Â Â std::pair<iterator, bool>
>>>> @@ -805,9 +807,9 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>>>> Â Â Â Â Â Â Â Â Â  }
>>>> Â Â Â Â Â Â Â  else
>>>> Â Â Â Â Â Â Â Â Â  {
>>>> -Â Â Â Â Â Â Â  __ret.position
>>>> -Â Â Â Â Â Â Â Â Â  = _M_insert_unique_node(__bkt, __code, __nh._M_ptr);
>>>> -Â Â Â Â Â Â Â  __nh._M_ptr = nullptr;
>>>> +Â Â Â Â Â Â Â  __ret.position = _M_insert_unique_node(__k, __bkt, __code,
>>>> +Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â  [&__nh]()
>>>> +Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â  { return std::exchange(__nh._M_ptr, nullptr); });
>>>
>>> The existing code here can just be changed to pass in __k, but still
>>> set __nh._M_ptr to null after the call returns.
>>>
>>> i.e.
>>>
>>> Â Â Â Â Â Â Â  __ret.position
>>> Â Â Â Â Â Â Â Â Â  = _M_insert_unique_node(__k, __bkt, __code, __nh._M_ptr);
>>> Â Â Â Â Â Â Â  __nh._M_ptr = nullptr;
>>>
>>>
>>>> Â Â Â Â Â Â Â  __ret.inserted = true;
>>>> Â Â Â Â Â Â Â Â Â  }
>>>> Â Â Â Â Â  }
>>>> @@ -818,33 +820,23 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>>>> Â Â Â Â Â  iterator
>>>> Â Â Â Â Â  _M_reinsert_node_multi(const_iterator __hint, node_type&& __nh)
>>>> Â Â Â Â Â  {
>>>> -Â Â Â  iterator __ret;
>>>> Â Â Â Â if (__nh.empty())
>>>> -Â Â Â Â Â  __ret = end();
>>>> -Â Â Â  else
>>>> -Â Â Â Â Â  {
>>>> +Â Â Â Â Â  return end();
>>>> +
>>>> Â Â Â Â __glibcxx_assert(get_allocator() == __nh.get_allocator());
>>>>
>>>> Â Â Â Â auto __code = this->_M_hash_code(__nh._M_key());
>>>> -Â Â Â Â Â Â Â  auto __node = std::exchange(__nh._M_ptr, nullptr);
>>>> -Â Â Â Â Â Â Â  // FIXME: this deallocates the node on exception.
>>>> -Â Â Â Â Â Â Â  __ret = _M_insert_multi_node(__hint._M_cur, __code, __node);
>>>> -Â Â Â Â Â  }
>>>> -Â Â Â  return __ret;
>>>> +Â Â Â  return _M_insert_multi_node(__hint._M_cur, __code,
>>>> +Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â  [&__nh]()
>>>> +Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â  { return std::exchange(__nh._M_ptr, nullptr); });
>>>
>>> Now that the call won't deallocate anything, we can fix the FIXME by
>>> just delaying resetting __nh._M_ptr until the call has returned:
>>>
>>> Â Â Â Â Â Â  auto __ret = _M_insert_multi_node(__hint._M_cur, __code, 
>>> __node);
>>> Â Â Â Â Â Â  __nh._M_ptr = nullptr;
>>> Â Â Â Â Â Â  return __ret;
>>>
>>>
>>>> Â Â Â Â Â  }
>>>>
>>>> +Â Â Â  private:
>>>> Â Â Â Â Â  /// Extract a node.
>>>
>>> We don't want a Doxygen "///" comment on a private function, just "//"
>>> is OK. In this case I don't think we need any comment, the
>>> _M_extract_node name is clear enough. Saying "extract a node" adds
>>> nothing.
>>>
>>>
>>>> Â Â Â Â Â  node_type
>>>> -Â Â Â Â Â  extract(const_iterator __pos)
>>>> +Â Â Â Â Â  _M_extract_node(size_t __bkt, __node_base* __prev_n)
>>>> Â Â Â Â Â  {
>>>> -Â Â Â  __node_type* __n = __pos._M_cur;
>>>> -Â Â Â  size_t __bkt = _M_bucket_index(__n);
>>>> -
>>>> -Â Â Â  // Look for previous node to unlink it from the erased one, this
>>>> -Â Â Â  // is why we need buckets to contain the before begin to make
>>>> -Â Â Â  // this search fast.
>>>> -Â Â Â  __node_base* __prev_n = _M_get_previous_node(__bkt, __n);
>>>> -
>>>> +Â Â Â  __node_type* __n = static_cast<__node_type*>(__prev_n->_M_nxt);
>>>> Â Â Â Â if (__prev_n == _M_buckets[__bkt])
>>>> Â Â Â Â Â  _M_remove_bucket_begin(__bkt, __n->_M_next(),
>>>> Â Â Â Â Â Â Â Â  __n->_M_nxt ? _M_bucket_index(__n->_M_next()) : 0);
>>>> @@ -861,14 +853,24 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>>>> Â Â Â Â return { __n, this->_M_node_allocator() };
>>>> Â Â Â Â Â  }
>>>>
>>>> +Â Â Â  public:
>>>> +Â Â Â Â Â  /// Extract a node.
>>>> +Â Â Â Â Â  node_type
>>>> +Â Â Â Â Â  extract(const_iterator __pos)
>>>> +Â Â Â Â Â  {
>>>> +Â Â Â  size_t __bkt = _M_bucket_index(__pos._M_cur);
>>>> +Â Â Â  return _M_extract_node(__bkt, _M_get_previous_node(__bkt, 
>>>> __pos._M_cur));
>>>> +Â Â Â Â Â  }
>>>> +
>>>> Â Â Â Â Â  /// Extract a node.
>>>> Â Â Â Â Â  node_type
>>>> Â Â Â Â Â  extract(const _Key& __k)
>>>> Â Â Â Â Â  {
>>>> Â Â Â Â node_type __nh;
>>>> -Â Â Â  auto __pos = find(__k);
>>>> -Â Â Â  if (__pos != end())
>>>> -Â Â Â Â Â  __nh = extract(const_iterator(__pos));
>>>> +Â Â Â  __hash_code __code = this->_M_hash_code(__k);
>>>> +Â Â Â  std::size_t __bkt = _M_bucket_index(__k, __code);
>>>> +Â Â Â  if (__node_base* __prev_node = _M_find_before_node(__bkt, __k, 
>>>> __code))
>>>> +Â Â Â Â Â  __nh = _M_extract_node(__bkt, __prev_node);
>>>> Â Â Â Â return __nh;
>>>> Â Â Â Â Â  }
>>>
>>> Looks good.
>>>
>>>> @@ -885,14 +887,16 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>>>> Â Â Â Â Â  for (auto __i = __src.begin(), __end = __src.end(); __i != 
>>>> __end;)
>>>> Â Â Â Â Â Â Â  {
>>>> Â Â Â Â Â Â Â Â Â  auto __pos = __i++;
>>>> -Â Â Â Â Â Â Â Â Â  const key_type& __k = 
>>>> this->_M_extract()(__pos._M_cur->_M_v());
>>>> +Â Â Â Â Â Â Â Â Â  const key_type& __k = this->_M_extract()(*__pos);
>>>> Â Â Â Â Â Â Â Â Â  __hash_code __code = this->_M_hash_code(__k);
>>>> Â Â Â Â Â Â Â Â Â  size_type __bkt = _M_bucket_index(__k, __code);
>>>> Â Â Â Â Â Â Â Â Â  if (_M_find_node(__bkt, __k, __code) == nullptr)
>>>> Â Â Â Â Â Â Â  {
>>>> Â Â Â Â Â Â Â Â Â  auto __nh = __src.extract(__pos);
>>>> -Â Â Â Â Â Â Â Â Â  _M_insert_unique_node(__bkt, __code, __nh._M_ptr, __n_elt);
>>>> -Â Â Â Â Â Â Â Â Â  __nh._M_ptr = nullptr;
>>>> +Â Â Â Â Â Â Â Â Â  _M_insert_unique_node(__k, __bkt, __code,
>>>> +Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â  [&__nh]()
>>>> +Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â  { return std::exchange(__nh._M_ptr, nullptr); },
>>>> +Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â  __n_elt);
>>>
>>> Again, the old code seems fine. Just pass in __k.
>>>
>>>> Â Â Â Â Â Â Â Â Â  __n_elt = 1;
>>>> Â Â Â Â Â Â Â  }
>>>> Â Â Â Â Â Â Â Â Â  else if (__n_elt != 1)
>>>> @@ -1634,31 +1638,25 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>>>> Â Â Â Â Â  -> pair<iterator, bool>
>>>> Â Â Â Â Â  {
>>>> Â Â Â Â // First build the node to get access to the hash code
>>>> -Â Â Â  __node_type* __node
>>>> -Â Â Â Â Â  = this->_M_allocate_node(std::forward<_Args>(__args)...);
>>>> -Â Â Â  const key_type& __k = this->_M_extract()(__node->_M_v());
>>>> -Â Â Â  __hash_code __code;
>>>> -Â Â Â  __try
>>>> -Â Â Â Â Â  {
>>>> -Â Â Â Â Â Â Â  __code = this->_M_hash_code(__k);
>>>> -Â Â Â Â Â  }
>>>> -Â Â Â  __catch(...)
>>>> -Â Â Â Â Â  {
>>>> -Â Â Â Â Â Â Â  this->_M_deallocate_node(__node);
>>>
>>> Aside: It's confusing that _M_allocate_node actually allocates a node
>>> and constructs an element, and _M_deallocate_node destroys the element
>>> and deallocates the node (and we have _M_deallocate_node_ptr which
>>> just does the deallocation part). We should add comments explaining
>>> that.
>>>
>>>
>>>> diff --git a/libstdc++-v3/include/bits/hashtable_policy.h 
>>>> b/libstdc++-v3/include/bits/hashtable_policy.h
>>>> index 878154ae210..ddcff0ec9d0 100644
>>>> --- a/libstdc++-v3/include/bits/hashtable_policy.h
>>>> +++ b/libstdc++-v3/include/bits/hashtable_policy.h
>>>> @@ -170,6 +170,40 @@ namespace __detail
>>>> Â Â Â Â Â  __hashtable_alloc& _M_h;
>>>> Â Â Â  };
>>>>
>>>> +Â  // Node smart pointer to benefit from RAII.
>>>> +Â  template<typename _NodeAlloc>
>>>> +Â Â Â  struct _NodeSmartPointer
>>>> +Â Â Â  {
>>>> +Â Â Â  private:
>>>> +Â Â Â Â Â  using __hashtable_alloc = _Hashtable_alloc<_NodeAlloc>;
>>>> +Â Â Â Â Â  using __node_type = typename __hashtable_alloc::__node_type;
>>>> +
>>>> +Â Â Â  public:
>>>> +Â Â Â Â Â  _NodeSmartPointer(__hashtable_alloc& __h, __node_type* 
>>>> __node) noexcept
>>>> +Â Â Â Â Â  : _M_h(__h), _M_node(__node) { }
>>>> +
>>>> +Â Â Â Â Â  ~_NodeSmartPointer()
>>>> +Â Â Â Â Â  {
>>>> +Â Â Â  if (_M_node)
>>>> +Â Â Â Â Â  _M_h._M_deallocate_node(_M_node);
>>>> +Â Â Â Â Â  }
>>>> +
>>>> +Â Â Â Â Â  __node_type*
>>>> +Â Â Â Â Â  _M_get() const { return _M_node; }
>>>> +
>>>> +Â Â Â Â Â  __node_type*
>>>> +Â Â Â Â Â  _M_release() noexcept
>>>> +Â Â Â Â Â  {
>>>> +Â Â Â  __node_type* __tmp = _M_node;
>>>> +Â Â Â  _M_node = nullptr;
>>>> +Â Â Â  return __tmp;
>>>> +Â Â Â Â Â  }
>>>> +
>>>> +Â Â Â  private:
>>>> +Â Â Â Â Â  __hashtable_alloc& _M_h;
>>>> +Â Â Â Â Â  __node_type* _M_node;
>>>> +Â Â Â  };
>>>
>>> This can be defined as a member of _Hashtable, and can be much
>>> simpler:
>>>
>>> Â Â Â Â  // Simple RAII type for managing a node containing an element
>>> Â Â Â Â  struct _Scoped_node
>>> Â Â Â Â  {
>>> Â Â Â Â _Scoped_node(__hashtable_alloc* __h, __node_type* __n)
>>> Â Â Â Â : _M_h(__h), _M_node(__n) { }
>>>
>>> Â Â Â Â ~_Scoped_node() { if (_M_node) 
>>> _M_h->_M_deallocate_node(_M_node); };
>>>
>>> Â Â Â Â _Scoped_node(const _Scoped_node&) = delete;
>>> Â Â Â Â _Scoped_node& operator=(const _Scoped_node&) = delete;
>>>
>>> Â Â Â Â __hashtable_alloc* _M_h;
>>> Â Â Â Â __node_type* _M_node;
>>> Â Â Â Â  };
>>>
>>> I've attached a patch implementing these suggested changes. I think
>>> it's simpler this way, what do you think?
>>
>> I think it is much better than what I proposed.
>>
>>
>>>
>>> (This patch is generated with 'git diff -b' so whitespace changes
>>> aren't shown, so this shouldn't be applied directly. I can send
>>> another version with the whitespace changes that is suitable to
>>> commit).
>>>
>> Mine was generated with 'git diff -w' for the same reason.
>>
>> You can send the full patch to me if you prefer but feel free to 
>> commit it yourself.
>
> Here's what I've committed. Compared to the previous patch this adds a
> constructor to _Scoped_node that creates the new node, so the caller
> doesn't have to create it:
>
> Â Â Â Â Â  // Allocate a node and construct an element within it.
> Â Â Â Â Â  template<typename... _Args>
> Â Â Â Â Â Â Â  _Scoped_node(__hashtable_alloc* __h, _Args&&... __args)
> Â Â Â Â Â Â Â  : _M_h(__h),
> _M_node(__h->_M_allocate_node(std::forward<_Args>(__args)...))
> Â Â Â Â Â Â Â  { }
>
> Tested x86_64-linux, committed to trunk.
>
>

-------------- next part --------------
A non-text attachment was scrubbed...
Name: hashtable.patch
Type: text/x-patch
Size: 582 bytes
Desc: not available
URL: <http://gcc.gnu.org/pipermail/libstdc++/attachments/20190618/65eb845f/attachment.bin>