This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.

Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]
Other format:	[Raw text]

Re: New power of 2 hash policy

From: Jonathan Wakely <jwakely at redhat dot com>
To: François Dumont <frs dot dumont at gmail dot com>
Cc: Michael Matz <matz at suse dot de>, "libstdc++ at gcc dot gnu dot org" <libstdc++ at gcc dot gnu dot org>, gcc-patches <gcc-patches at gcc dot gnu dot org>
Date: Fri, 25 Sep 2015 14:28:15 +0100
Subject: Re: New power of 2 hash policy
Authentication-results: sourceware.org; auth=none
References: <55F1E20E dot 7080305 at gmail dot com> <alpine dot LSU dot 2 dot 20 dot 1509111508520 dot 30229 at wotan dot suse dot de> <20150911131815 dot GI2631 at redhat dot com> <20150911132316 dot GJ2631 at redhat dot com> <5604584D dot 5040904 at gmail dot com>

On 24/09/15 22:08 +0200, François Dumont wrote:

   Working on it I realised that despite the comment on _M_next_bkt
saying "no smaller than n" the method can return a value smaller for big
n values. This is not likely to happen but I prefer to take care of it.
I just make sure we won't try to rehash again even if the drawback is
that we won't respect max_load_factor anymore at those levels. But as I
said we will surely have a memory issue before that.

   Ok to commit ?

François

diff --git a/libstdc++-v3/include/bits/hashtable_policy.h b/libstdc++-v3/include/bits/hashtable_policy.h
index a9ad7dd..4e1bc29 100644
--- a/libstdc++-v3/include/bits/hashtable_policy.h
+++ b/libstdc++-v3/include/bits/hashtable_policy.h
@@ -457,6 +457,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
  /// smallest prime that keeps the load factor small enough.
  struct _Prime_rehash_policy
  {
+    using __has_load_factor = std::true_type;
+
    _Prime_rehash_policy(float __z = 1.0) noexcept
    : _M_max_load_factor(__z), _M_next_resize(0) { }

@@ -501,6 +503,129 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
    mutable std::size_t	_M_next_resize;
  };

+  /// Range hashing function considering that second args is a power of 2.


Does this mean "assuming" not "considering"?

+  struct _Mask_range_hashing
+  {
+    typedef std::size_t first_argument_type;
+    typedef std::size_t second_argument_type;
+    typedef std::size_t result_type;
+
+    result_type
+    operator()(first_argument_type __num,
+	       second_argument_type __den) const noexcept
+    { return __num & (__den - 1); }
+  };
+
+
+  /// Helper type to compute next power of 2.
+  template<std::size_t _N>
+    struct _NextPower2
+    {
+      static std::size_t
+      _Get(std::size_t __n)
+      {
+	std::size_t __next = _NextPower2<(_N >> 1)>::_Get(__n);
+	return __next |= __next >> _N;
+      }
+    };
+
+  template<>
+    struct _NextPower2<1>
+    {
+      static std::size_t
+      _Get(std::size_t __n)
+      { return __n |= __n >> 1; }
+    };


This doesn't seem to return the next power of 2, it returns one less.

_NextPower2<32>::_Get(2) returns 3, but 2 is already a power of 2.
_NextPower2<32>::_Get(3) returns 3, but the next power of 2 is 4.


I don't think this needs to be a recursive template, it can simply be
a function, can't it?

+  /// Rehash policy providing power of 2 bucket numbers. Ease modulo
+  /// operations.
+  struct _Power2_rehash_policy
+  {
+    using __has_load_factor = std::true_type;
+
+    _Power2_rehash_policy(float __z = 1.0) noexcept
+    : _M_max_load_factor(__z), _M_next_resize(0) { }
+
+    float
+    max_load_factor() const noexcept
+    { return _M_max_load_factor; }
+
+    // Return a bucket size no smaller than n (as long as n is not above the
+    // highest power of 2).


This says "no smaller than n" but it actually seems to guarantee
"greater than n" because _NextPower2<>::_Get(n)+1 is 2n when n is a
power of two.

+    std::size_t
+    _M_next_bkt(std::size_t __n) const
+    {
+      constexpr auto __max_bkt
+	= (std::size_t(1) << (sizeof(std::size_t) * 8 - 1));
+
+      std::size_t __res
+	= _NextPower2<((sizeof(std::size_t) * 8) >> 1)>::_Get(--__n) + 1;


You wouldn't need to add one to the result if the template actually
returned a power of two!

+      if (__res == 0)
+	__res = __max_bkt;
+
+      if (__res == __max_bkt)
+	// Set next resize to the max value so that we never try to rehash again
+	// as we already reach the biggest possible bucket number.
+	// Note that it might result in max_load_factor not being respected.
+	_M_next_resize = std::size_t(0) - 1;
+      else
+	_M_next_resize
+	  = __builtin_floor(__res * (long double)_M_max_load_factor);
+
+      return __res;
+    }


What are the requirements for this function, "no smaller than n" or
"greater than n"?

If "no smaller than n" is correct then the algorithm you want is
"round up to nearest power of 2", which you can find here (I wrote
this earlier this year for some reason I can't remember now):

https://gitlab.com/redistd/redistd/blob/master/include/redi/bits.h

The non-recursive version is only a valid constexpr function in C++14,
but since you don't need a constexpr function you could just that,
extended to handle 64-bit:

 std::size_t
 clp2(std::size_t n)
 {
   std::uint_least64_t x = n;
   // Algorithm from Hacker's Delight, Figure 3-3.
   x = x - 1;
   x = x | (x >> 1);
   x = x | (x >> 2);
   x = x | (x >> 4);
   x = x | (x >> 8);
   x = x | (x >>16);
   x = x | (x >>32);
   return x + 1;
 }

We could avoid the last shift when sizeof(size_t) == 32, I don't know
if the optimisers will take care of that anyway.


The rest of the patch looks fine.

Follow-Ups:
- Re: New power of 2 hash policy
  - From: FranÃois Dumont

References:
- New power of 2 hash policy
  - From: FranÃois Dumont
- Re: New power of 2 hash policy
  - From: Michael Matz
- Re: New power of 2 hash policy
  - From: Jonathan Wakely
- Re: New power of 2 hash policy
  - From: Jonathan Wakely
- Re: New power of 2 hash policy
  - From: FranÃois Dumont

Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]