Review Hashtable extract node API

Jonathan Wakely jwakely@redhat.com
Mon Jun 17 10:28:00 GMT 2019


On 07/06/19 18:39 +0200, François Dumont wrote:
>On 6/5/19 6:22 PM, Jonathan Wakely wrote:
>>On 04/06/19 19:19 +0200, François Dumont wrote:
>>>Hi
>>>
>>>    Here is a patch to enhance the _Hashtable extract node API and 
>>>fix a FIXME request.
>>>
>>>    The enhancement to the extract node Api is that extract(const 
>>>key_type&) do not call extract(const_iterator) anymore. Doing so 
>>>we had to loop again through bucket nodes to find the previous 
>>>node to the one to extract. Even if a bucket shall not contain 
>>>many nodes (in unique key mode) it's easy to avoid it.
>>
>>Nice.
>>
>>>    To fix the FIXME I introduced a node smart pointer type 
>>>managing the node lifetime. The node is extracted from this smart 
>>>pointer only when there can't be any exception raised. In the 
>>>context of the node extract api the node handle is considered as a 
>>>smart pointer. So the node handle will remain owner of the node in 
>>>case of exception when reinserting it, I hope it is the expected 
>>>behavior.
>>
>>Yes, that's right.
>>
>>I was going to suggest just using the node handle type instead of
>>inventing a new smart pointer, but the handle type uses std::optional
>>so isn't available for C++11/14.
>
>I considered it too, or even use shared_ptr but thought it was overkill.
>
>>
>>
>>>    * include/bits/hashtable_policy.h
>>>    (struct _NodeSmartPointer<_NodeAlloc>): New.
>>>    (_Map_base<>::operator[](const key_type&)): Use latter, adapt.
>>>    (_Map_base<>::operator[](key_type&&)): Likewise.
>>>    * include/bits/hashtable.h
>>>    (_Hashtable<>::__node_sp_t): New.
>>>    (_Hashtable<>::_M_insert_unique_node(size_type, __hash_code,
>>>    __node_type*, size_type)): Replace by...
>>>(_Hashtable<>::_M_insert_unique_node<_NodeAccessor>(const key_type&,
>>>    size_type, __hash_code, const _NodeAccessor&, size_type)): ...that.
>>>    (_Hashtable<>::_M_insert_multi_node(__node_type*, __hash_code,
>>>    __node_type*)): Replace by...
>>>(_Hashtable<>::_M_insert_multi_node<_NodeAccessor>(__node_type*,
>>>    __hash_code, const _NodeAccessor&)): ...that.
>>>    (_Hashtable<>::_M_reinsert_node): Adapt.
>>>    (_Hashtable<>::_M_reinsert_node_multi): Adapt.
>>>    (_Hashtable<>::_M_extract_node(size_t, __node_base*)): New.
>>>    (_Hashtable<>::extract(const_iterator)): Use latter.
>>>    (_Hashtable<>::extract(const _Key&)): Likewise.
>>>    (_Hashtable<>::_M_merge_unique): Adapt.
>>>    (_Hashtable<>::_M_emplace<_Args>(true_type, _Args&&...)): Adapt.
>>>    (_Hashtable<>::_M_emplace<_Args>(const_iterator, false_type,
>>>    _Args&&...)): Adapt.
>>>
>>>Tested under Linux x86_64.
>>>
>>>Ok to commit ?
>>>
>>>François
>>>
>>
>>>diff --git a/libstdc++-v3/include/bits/hashtable.h 
>>>b/libstdc++-v3/include/bits/hashtable.h
>>>index e2e3f016a35..307865b96bf 100644
>>>--- a/libstdc++-v3/include/bits/hashtable.h
>>>+++ b/libstdc++-v3/include/bits/hashtable.h
>>>@@ -197,6 +197,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>>>      using __hash_cached = typename __traits_type::__hash_cached;
>>>      using __node_type = __detail::_Hash_node<_Value, 
>>>__hash_cached::value>;
>>>      using __node_alloc_type = __alloc_rebind<_Alloc, __node_type>;
>>>+      using __node_sp_t = 
>>>__detail::_NodeSmartPointer<__node_alloc_type>;
>>>
>>>      using __hashtable_alloc = 
>>>__detail::_Hashtable_alloc<__node_alloc_type>;
>>>
>>>@@ -669,18 +670,19 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>>>      __node_base*
>>>      _M_get_previous_node(size_type __bkt, __node_base* __n);
>>>
>>>-      // Insert node with hash code __code, in bucket bkt if no 
>>>rehash (assumes
>>>-      // no element with its key already present). Take ownership 
>>>of the node,
>>>-      // deallocate it on exception.
>>>+      // Insert node with key __k and hash code __code, in bucket 
>>>__bkt if no
>>>+      // rehash (assumes no element with its key already present).
>>>+      template<typename _NodeAccessor>
>>>    iterator
>>>-      _M_insert_unique_node(size_type __bkt, __hash_code __code,
>>>-                __node_type* __n, size_type __n_elt = 1);
>>>+    _M_insert_unique_node(const key_type& __k, size_type __bkt,
>>>+                  __hash_code __code, const _NodeAccessor&,
>>>+                  size_type __n_elt = 1);
>>>
>>>-      // Insert node with hash code __code. Take ownership of the node,
>>>-      // deallocate it on exception.
>>>+      // Insert node with hash code __code.
>>>+      template<typename _NodeAccessor>
>>>    iterator
>>>-      _M_insert_multi_node(__node_type* __hint,
>>>-               __hash_code __code, __node_type* __n);
>>>+    _M_insert_multi_node(__node_type* __hint, __hash_code __code,
>>>+                 const _NodeAccessor& __node_accessor);
>>
>>It looks like most times you call these functions you pass an
>>identical lambda expression, but each of those lambda expressions will
>>create a unique type. That means you create different instantiations
>>of the function templates even though they do exactly the same thing.
>>
>>That's just generating multiple copies of identical code. Passing in a
>>function object to provide the node pointer doesn't really seem
>>necessary anyway, so if it results in larger executables it's really
>>not desirable.
>
>
>Maybe one the reasons we have a difference when running performance 
>tests in -03. I don't know why I abused so much of lambdas when I see 
>your proposal.
>
>
>>
>>The attached patch still just passes in a node pointer (which the
>>function takes ownership of, unless it throws). Because the semantics
>>of _M_insert_multi_node change, we need the symbol name to change as
>>well (so old code linking to the old symbol doesn't find the new
>>definition with different semantics). Passing in the key makes it a
>>different symbol (in almost all cases the caller has the key available
>>already anyway).
>Very nice.
>>
>>An alternative design would be for _M_insert_(unique|multi)_node to
>>take a reference to the RAII type and set its _M_node member to null
>>after taking ownership of that pointer. That would be more explicit
>>about the ownership transfer, however it would require changes to the
>>_M_reinsert_node and _M_reinsert_node_multi functions which don't use
>>the RAII type (because the __node_type* is already owned by a node
>>handle).
>>
>>>
>>>      template<typename... _Args>
>>>    std::pair<iterator, bool>
>>>@@ -805,9 +807,9 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>>>          }
>>>        else
>>>          {
>>>-        __ret.position
>>>-          = _M_insert_unique_node(__bkt, __code, __nh._M_ptr);
>>>-        __nh._M_ptr = nullptr;
>>>+        __ret.position = _M_insert_unique_node(__k, __bkt, __code,
>>>+                [&__nh]()
>>>+                { return std::exchange(__nh._M_ptr, nullptr); });
>>
>>The existing code here can just be changed to pass in __k, but still
>>set __nh._M_ptr to null after the call returns.
>>
>>i.e.
>>
>>        __ret.position
>>          = _M_insert_unique_node(__k, __bkt, __code, __nh._M_ptr);
>>        __nh._M_ptr = nullptr;
>>
>>
>>>        __ret.inserted = true;
>>>          }
>>>      }
>>>@@ -818,33 +820,23 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>>>      iterator
>>>      _M_reinsert_node_multi(const_iterator __hint, node_type&& __nh)
>>>      {
>>>-    iterator __ret;
>>>    if (__nh.empty())
>>>-      __ret = end();
>>>-    else
>>>-      {
>>>+      return end();
>>>+
>>>    __glibcxx_assert(get_allocator() == __nh.get_allocator());
>>>
>>>    auto __code = this->_M_hash_code(__nh._M_key());
>>>-        auto __node = std::exchange(__nh._M_ptr, nullptr);
>>>-        // FIXME: this deallocates the node on exception.
>>>-        __ret = _M_insert_multi_node(__hint._M_cur, __code, __node);
>>>-      }
>>>-    return __ret;
>>>+    return _M_insert_multi_node(__hint._M_cur, __code,
>>>+                  [&__nh]()
>>>+                  { return std::exchange(__nh._M_ptr, nullptr); });
>>
>>Now that the call won't deallocate anything, we can fix the FIXME by
>>just delaying resetting __nh._M_ptr until the call has returned:
>>
>>       auto __ret = _M_insert_multi_node(__hint._M_cur, __code, __node);
>>       __nh._M_ptr = nullptr;
>>       return __ret;
>>
>>
>>>      }
>>>
>>>+    private:
>>>      /// Extract a node.
>>
>>We don't want a Doxygen "///" comment on a private function, just "//"
>>is OK. In this case I don't think we need any comment, the
>>_M_extract_node name is clear enough. Saying "extract a node" adds
>>nothing.
>>
>>
>>>      node_type
>>>-      extract(const_iterator __pos)
>>>+      _M_extract_node(size_t __bkt, __node_base* __prev_n)
>>>      {
>>>-    __node_type* __n = __pos._M_cur;
>>>-    size_t __bkt = _M_bucket_index(__n);
>>>-
>>>-    // Look for previous node to unlink it from the erased one, this
>>>-    // is why we need buckets to contain the before begin to make
>>>-    // this search fast.
>>>-    __node_base* __prev_n = _M_get_previous_node(__bkt, __n);
>>>-
>>>+    __node_type* __n = static_cast<__node_type*>(__prev_n->_M_nxt);
>>>    if (__prev_n == _M_buckets[__bkt])
>>>      _M_remove_bucket_begin(__bkt, __n->_M_next(),
>>>         __n->_M_nxt ? _M_bucket_index(__n->_M_next()) : 0);
>>>@@ -861,14 +853,24 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>>>    return { __n, this->_M_node_allocator() };
>>>      }
>>>
>>>+    public:
>>>+      /// Extract a node.
>>>+      node_type
>>>+      extract(const_iterator __pos)
>>>+      {
>>>+    size_t __bkt = _M_bucket_index(__pos._M_cur);
>>>+    return _M_extract_node(__bkt, _M_get_previous_node(__bkt, 
>>>__pos._M_cur));
>>>+      }
>>>+
>>>      /// Extract a node.
>>>      node_type
>>>      extract(const _Key& __k)
>>>      {
>>>    node_type __nh;
>>>-    auto __pos = find(__k);
>>>-    if (__pos != end())
>>>-      __nh = extract(const_iterator(__pos));
>>>+    __hash_code __code = this->_M_hash_code(__k);
>>>+    std::size_t __bkt = _M_bucket_index(__k, __code);
>>>+    if (__node_base* __prev_node = _M_find_before_node(__bkt, 
>>>__k, __code))
>>>+      __nh = _M_extract_node(__bkt, __prev_node);
>>>    return __nh;
>>>      }
>>
>>Looks good.
>>
>>>@@ -885,14 +887,16 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>>>      for (auto __i = __src.begin(), __end = __src.end(); __i != __end;)
>>>        {
>>>          auto __pos = __i++;
>>>-          const key_type& __k = 
>>>this->_M_extract()(__pos._M_cur->_M_v());
>>>+          const key_type& __k = this->_M_extract()(*__pos);
>>>          __hash_code __code = this->_M_hash_code(__k);
>>>          size_type __bkt = _M_bucket_index(__k, __code);
>>>          if (_M_find_node(__bkt, __k, __code) == nullptr)
>>>        {
>>>          auto __nh = __src.extract(__pos);
>>>-          _M_insert_unique_node(__bkt, __code, __nh._M_ptr, __n_elt);
>>>-          __nh._M_ptr = nullptr;
>>>+          _M_insert_unique_node(__k, __bkt, __code,
>>>+                [&__nh]()
>>>+                { return std::exchange(__nh._M_ptr, nullptr); },
>>>+                __n_elt);
>>
>>Again, the old code seems fine. Just pass in __k.
>>
>>>          __n_elt = 1;
>>>        }
>>>          else if (__n_elt != 1)
>>>@@ -1634,31 +1638,25 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>>>      -> pair<iterator, bool>
>>>      {
>>>    // First build the node to get access to the hash code
>>>-    __node_type* __node
>>>-      = this->_M_allocate_node(std::forward<_Args>(__args)...);
>>>-    const key_type& __k = this->_M_extract()(__node->_M_v());
>>>-    __hash_code __code;
>>>-    __try
>>>-      {
>>>-        __code = this->_M_hash_code(__k);
>>>-      }
>>>-    __catch(...)
>>>-      {
>>>-        this->_M_deallocate_node(__node);
>>
>>Aside: It's confusing that _M_allocate_node actually allocates a node
>>and constructs an element, and _M_deallocate_node destroys the element
>>and deallocates the node (and we have _M_deallocate_node_ptr which
>>just does the deallocation part). We should add comments explaining
>>that.
>>
>>
>>>diff --git a/libstdc++-v3/include/bits/hashtable_policy.h 
>>>b/libstdc++-v3/include/bits/hashtable_policy.h
>>>index 878154ae210..ddcff0ec9d0 100644
>>>--- a/libstdc++-v3/include/bits/hashtable_policy.h
>>>+++ b/libstdc++-v3/include/bits/hashtable_policy.h
>>>@@ -170,6 +170,40 @@ namespace __detail
>>>      __hashtable_alloc& _M_h;
>>>    };
>>>
>>>+  // Node smart pointer to benefit from RAII.
>>>+  template<typename _NodeAlloc>
>>>+    struct _NodeSmartPointer
>>>+    {
>>>+    private:
>>>+      using __hashtable_alloc = _Hashtable_alloc<_NodeAlloc>;
>>>+      using __node_type = typename __hashtable_alloc::__node_type;
>>>+
>>>+    public:
>>>+      _NodeSmartPointer(__hashtable_alloc& __h, __node_type* 
>>>__node) noexcept
>>>+      : _M_h(__h), _M_node(__node) { }
>>>+
>>>+      ~_NodeSmartPointer()
>>>+      {
>>>+    if (_M_node)
>>>+      _M_h._M_deallocate_node(_M_node);
>>>+      }
>>>+
>>>+      __node_type*
>>>+      _M_get() const { return _M_node; }
>>>+
>>>+      __node_type*
>>>+      _M_release() noexcept
>>>+      {
>>>+    __node_type* __tmp = _M_node;
>>>+    _M_node = nullptr;
>>>+    return __tmp;
>>>+      }
>>>+
>>>+    private:
>>>+      __hashtable_alloc& _M_h;
>>>+      __node_type* _M_node;
>>>+    };
>>
>>This can be defined as a member of _Hashtable, and can be much
>>simpler:
>>
>>     // Simple RAII type for managing a node containing an element
>>     struct _Scoped_node
>>     {
>>    _Scoped_node(__hashtable_alloc* __h, __node_type* __n)
>>    : _M_h(__h), _M_node(__n) { }
>>
>>    ~_Scoped_node() { if (_M_node) _M_h->_M_deallocate_node(_M_node); };
>>
>>    _Scoped_node(const _Scoped_node&) = delete;
>>    _Scoped_node& operator=(const _Scoped_node&) = delete;
>>
>>    __hashtable_alloc* _M_h;
>>    __node_type* _M_node;
>>     };
>>
>>I've attached a patch implementing these suggested changes. I think
>>it's simpler this way, what do you think?
>
>I think it is much better than what I proposed.
>
>
>>
>>(This patch is generated with 'git diff -b' so whitespace changes
>>aren't shown, so this shouldn't be applied directly. I can send
>>another version with the whitespace changes that is suitable to
>>commit).
>>
>Mine was generated with 'git diff -w' for the same reason.
>
>You can send the full patch to me if you prefer but feel free to 
>commit it yourself.

Here's what I've committed. Compared to the previous patch this adds a
constructor to _Scoped_node that creates the new node, so the caller
doesn't have to create it:

       // Allocate a node and construct an element within it.
       template<typename... _Args>
         _Scoped_node(__hashtable_alloc* __h, _Args&&... __args)
         : _M_h(__h),
           _M_node(__h->_M_allocate_node(std::forward<_Args>(__args)...))
         { }

Tested x86_64-linux, committed to trunk.


-------------- next part --------------
A non-text attachment was scrubbed...
Name: patch.txt
Type: text/x-patch
Size: 20099 bytes
Desc: not available
URL: <http://gcc.gnu.org/pipermail/gcc-patches/attachments/20190617/588f5cd1/attachment.bin>


More information about the Gcc-patches mailing list