This is the mail archive of the libstdc++@gcc.gnu.org mailing list for the libstdc++ project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: New power of 2 hash policy


Is this one ok ?

François


On 28/09/2015 21:16, François Dumont wrote:
> On 25/09/2015 15:28, Jonathan Wakely wrote:
>> @@ -501,6 +503,129 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>>>     mutable std::size_t    _M_next_resize;
>>>   };
>>>
>>> +  /// Range hashing function considering that second args is a power
>>> of 2.
>> Does this mean "assuming" not "considering"?
> I assume yes.
>
>>> +  struct _Mask_range_hashing
>>> +  {
>>> +    typedef std::size_t first_argument_type;
>>> +    typedef std::size_t second_argument_type;
>>> +    typedef std::size_t result_type;
>>> +
>>> +    result_type
>>> +    operator()(first_argument_type __num,
>>> +           second_argument_type __den) const noexcept
>>> +    { return __num & (__den - 1); }
>>> +  };
>>> +
>>> +
>>> +  /// Helper type to compute next power of 2.
>>> +  template<std::size_t _N>
>>> +    struct _NextPower2
>>> +    {
>>> +      static std::size_t
>>> +      _Get(std::size_t __n)
>>> +      {
>>> +    std::size_t __next = _NextPower2<(_N >> 1)>::_Get(__n);
>>> +    return __next |= __next >> _N;
>>> +      }
>>> +    };
>>> +
>>> +  template<>
>>> +    struct _NextPower2<1>
>>> +    {
>>> +      static std::size_t
>>> +      _Get(std::size_t __n)
>>> +      { return __n |= __n >> 1; }
>>> +    };
>> This doesn't seem to return the next power of 2, it returns one less.
>>
>> _NextPower2<32>::_Get(2) returns 3, but 2 is already a power of 2.
>> _NextPower2<32>::_Get(3) returns 3, but the next power of 2 is 4.
>
> Yes, name is bad, that is just part of the algo you copy/paste below. I
> review implementation to have _NextPower2 do all the algo.
>
>>
>> I don't think this needs to be a recursive template, it can simply be
>> a function, can't it?
> I wanted code to adapt to any sizeof(std::size_t) without relying on
> some preprocessor checks. As you pointed out additional >> 32 on 32 bits
> or >> 64 on 64 bits wouldn't hurt but the recursive template just make
> sure that we don't do useless operations.
>
>>
>>> +  /// Rehash policy providing power of 2 bucket numbers. Ease modulo
>>> +  /// operations.
>>> +  struct _Power2_rehash_policy
>>> +  {
>>> +    using __has_load_factor = std::true_type;
>>> +
>>> +    _Power2_rehash_policy(float __z = 1.0) noexcept
>>> +    : _M_max_load_factor(__z), _M_next_resize(0) { }
>>> +
>>> +    float
>>> +    max_load_factor() const noexcept
>>> +    { return _M_max_load_factor; }
>>> +
>>> +    // Return a bucket size no smaller than n (as long as n is not
>>> above the
>>> +    // highest power of 2).
>> This says "no smaller than n" but it actually seems to guarantee
>> "greater than n" because _NextPower2<>::_Get(n)+1 is 2n when n is a
>> power of two.
> yes but this function is calling _NextPower2<>::_Get(n - 1) + 1, there
> is a minus one which make this comment valid as shown by newly
> introduced test.
>
>>> +    std::size_t
>>> +    _M_next_bkt(std::size_t __n) const
>>> +    {
>>> +      constexpr auto __max_bkt
>>> +    = (std::size_t(1) << (sizeof(std::size_t) * 8 - 1));
>>> +
>>> +      std::size_t __res
>>> +    = _NextPower2<((sizeof(std::size_t) * 8) >> 1)>::_Get(--__n) + 1;
>> You wouldn't need to add one to the result if the template actually
>> returned a power of two!
>>
>>> +      if (__res == 0)
>>> +    __res = __max_bkt;
>>> +
>>> +      if (__res == __max_bkt)
>>> +    // Set next resize to the max value so that we never try to
>>> rehash again
>>> +    // as we already reach the biggest possible bucket number.
>>> +    // Note that it might result in max_load_factor not being
>>> respected.
>>> +    _M_next_resize = std::size_t(0) - 1;
>>> +      else
>>> +    _M_next_resize
>>> +      = __builtin_floor(__res * (long double)_M_max_load_factor);
>>> +
>>> +      return __res;
>>> +    }
>> What are the requirements for this function, "no smaller than n" or
>> "greater than n"?
> 'No smaller than n' like stated in the comment. However for big n it is
> not possible, even in the prime number based implementation. So I played
> with _M_next_resize to make sure that _M_next_bkt won't be called again
> as soon as the max bucket number has been reach.
>
>
>> If "no smaller than n" is correct then the algorithm you want is
>> "round up to nearest power of 2", which you can find here (I wrote
>> this earlier this year for some reason I can't remember now):
>>
>> https://gitlab.com/redistd/redistd/blob/master/include/redi/bits.h
>>
>> The non-recursive version is only a valid constexpr function in C++14,
>> but since you don't need a constexpr function you could just that,
>> extended to handle 64-bit:
>>
>>  std::size_t
>>  clp2(std::size_t n)
>>  {
>>    std::uint_least64_t x = n;
>>    // Algorithm from Hacker's Delight, Figure 3-3.
>>    x = x - 1;
>>    x = x | (x >> 1);
>>    x = x | (x >> 2);
>>    x = x | (x >> 4);
>>    x = x | (x >> 8);
>>    x = x | (x >>16);
>>    x = x | (x >>32);
>>    return x + 1;
>>  }
>>
>> We could avoid the last shift when sizeof(size_t) == 32, I don't know
>> if the optimisers will take care of that anyway.
> This is indeed the algo I found by myself and that I adapted to work
> with any sizeof(size_t).
>
> Do you prefer the new version or do you want to stick a more explicit
> version like the one you propose above. In this case is the last 32 bits
> shift enough ? No 128 bits platform yet ?
>
> François
>


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]