This is the mail archive of the mailing list for the libstdc++ project.

Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: hash policy patch

On 09/09/2011 09:43 PM, François Dumont wrote:
On 09/01/2011 08:36 PM, Paolo Carlini wrote:
     After some additional time spent on hashtable I prefer to do this
new proposal. It fixes
- Yet some issues with hash policy that was still able to give a
bucket count for which load_factor == max_load_factor, Standard says
load_factor<  max_load_factor. Note that in fact the refactoring to
generalize use of _M_next_bkt had indeed a side effect which is that
when looking for instance for 11.5 lower_bound was returning 13 but
now that it is casted to integer it will return 11. Not only that
reason now _M_next_bkt takes an optional __strict bool parameter
signaling if the returned value shall be not only not shorter but even
- In hashtable implementation I removed usages of std::max that was
potentially leaving the hashtable in an inconsistent state with a hash
policy next resize value not matching the current bucket count. It was
not really a bug because the next resize value was updated on the next
insertion but at the cost of a useless floating point operation.
- I deal with allocation failure directly in _M_rehash method to avoid
introducing new try/catch blocks. I also reset hash policy next resize
value when the container is emptied on a hash functor exception.
- In __rehash_policy I only commit the new hash policy instance if the
rehash operation succeeded. The associated test
requires exception support, is there already a way to detect it or I
need to add a new dg-require-exceptions dejaGnu macro ?

Today, I had time to look a bit more into these issues. I think we
should handle one change at a time. About the first one above, I don't
like the new __strict parameter, looks like we are going through this
complication only because we are refactoring to use _M_next_bkt, because
otherwise, if I understand correctly, we are not really incorrect, since
we are talking about something like *strict* equality of *floating*
point quantities, by itself something badly defined (indeed, carefully,
the standard talks about "keeping the load factor below this number",
using plain English, not a formula).

I think we can delay point 2.

For points 3 and 4 above, I would like to see separate patches and
separate testcases. Is it possible?

Here is the patch for point 4 regarding the __rehash_policy implementation.

2011-09-09 François Dumont <>

* include/bits/hashtable.h (_Hashtable<>::__rehash_policy(const
_RehashPolicy&)): Commit the modification of the policy only if no
exception occured.
* testsuite/23_containers/unordered_set/max_load_factor/

I haven't plan to submit any patch for point 2. This small issue simply potentially add a useless long double operation so it's not a big deal. Just tell me if you want one. Ok to commit this one ?

Thanks to all that time spent on hashtable new questions came to me. Have you discuss with the committee about the meaning of max_load_factor on unordered_multiset or unordered_multimap ? If you have an unordered_multiset with 100 times the same value you will have with a max load factor of 1 about 100 buckets but only one will be in use. IMO load factor should be the number of _unique_ element / number of bucket.

Maybe this is something that could be handled by the profile mode. If we detect that all elements are in double in the container then the profile should advise to use a max_load_factor of 2 to limit the number of buckets.


And here is this one again:

2011-09-14 François Dumont <>

* include/bits/hashtable.h (_Hashtable<>::__rehash_policy(const
_RehashPolicy&)): Commit the modification of the policy only if no
exception occured.
* testsuite/23_containers/unordered_set/max_load_factor/

I also wondered if in __rehash_policy method we shouldn't rehash as soon as __n_bkt != _M_bucket_count rather than only when __n_bkt > _M_bucket_count. Users might change max load factor also to reduce the number of buckets...

Tested on linux x86_64.

Index: include/bits/hashtable.h
--- include/bits/hashtable.h	(revision 178861)
+++ include/bits/hashtable.h	(working copy)
@@ -741,10 +741,10 @@
 	       _H1, _H2, _Hash, _RehashPolicy, __chc, __cit, __uk>::
     __rehash_policy(const _RehashPolicy& __pol)
-      _M_rehash_policy = __pol;
       size_type __n_bkt = __pol._M_bkt_for_elements(_M_element_count);
       if (__n_bkt > _M_bucket_count)
-	_M_rehash(__n_bkt, __pol._M_next_resize);
+	_M_rehash(__n_bkt, _M_rehash_policy._M_next_resize);
+      _M_rehash_policy = __pol;
   template<typename _Key, typename _Value,
Index: testsuite/23_containers/unordered_set/max_load_factor/
--- testsuite/23_containers/unordered_set/max_load_factor/	(revision 0)
+++ testsuite/23_containers/unordered_set/max_load_factor/	(revision 0)
@@ -0,0 +1,78 @@
+// { dg-options "-std=gnu++0x" }
+// Copyright (C) 2011 Free Software Foundation, Inc.
+// This file is part of the GNU ISO C++ Library.  This library is free
+// software; you can redistribute it and/or modify it under the
+// terms of the GNU General Public License as published by the
+// Free Software Foundation; either version 3, or (at your option)
+// any later version.
+// This library is distributed in the hope that it will be useful,
+// but WITHOUT ANY WARRANTY; without even the implied warranty of
+// GNU General Public License for more details.
+// You should have received a copy of the GNU General Public License along
+// with this library; see the file COPYING3.  If not see
+// <>.
+#include <unordered_set>
+#include <limits>
+#include <ext/throw_allocator.h>
+#include <testsuite_hooks.h>
+void test01()
+  bool test __attribute__((unused)) = true;
+  typedef std::numeric_limits<std::size_t> nl_size_t;
+  std::unordered_set<int, std::hash<int>, std::equal_to<int>,
+		     __gnu_cxx::throw_allocator_limit<int> > us;
+  int val = 0;
+  for (; val != 100; ++val)
+    {
+      VERIFY( us.insert(val).second) ;
+      VERIFY( us.load_factor() <= us.max_load_factor() );
+    }
+  float cur_max_load_factor = us.max_load_factor();
+  int counter = 0;
+  std::size_t thrown_exceptions = 0;
+  while (true)
+    {
+      __gnu_cxx::limit_condition::set_limit(counter++);
+      bool do_break = false;
+      try
+	{
+	  us.max_load_factor(.5f);
+	  VERIFY( us.max_load_factor() == .5f );
+	  do_break = true;
+	}
+      catch (const __gnu_cxx::forced_error&)
+	{
+	  VERIFY( us.max_load_factor() == cur_max_load_factor );
+	  ++thrown_exceptions;
+	}
+      // Lets check that unordered_set will still be correctly resized
+      // when needed
+      __gnu_cxx::limit_condition::set_limit(nl_size_t::max());
+      for (;;)
+	{
+	  VERIFY( us.load_factor() <= us.max_load_factor() );
+	  size_t nbkts = us.bucket_count();
+	  VERIFY( us.insert(val++).second );
+	  if (us.bucket_count() != nbkts)
+	    break;
+	}
+      if (do_break)
+	break;
+    }
+  VERIFY( thrown_exceptions > 0 );
+int main()
+  test01();
+  return 0;

Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]