
Re: [Bug libstdc++/54075] [4.7.1] unordered_map insert still slower than 4.6.2


Here is the proposal to remove the shrinking feature from the hash policy. I have also taken your remark about the usage of lower_bound into account: _M_bkt_for_elements no longer calls _M_next_bkt (which does the lower_bound). For 2 of the 3 call sites this was only a source of redundant lower_bound invocations; in the last one I now call _M_next_bkt explicitly.
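
For readers skimming the patch, here is a self-contained reduction of the new division of labour (illustrative names and a toy prime table, not the patched code): _M_bkt_for_elements now does only the load-factor arithmetic, and the single lower_bound over the prime table happens in _M_next_bkt, which the call sites invoke explicitly.

#include <algorithm>
#include <cmath>
#include <cstddef>
#include <iterator>

// Toy stand-in for _Prime_rehash_policy after the patch (illustration only).
struct toy_policy
{
  long double max_load_factor = 1.0L;

  // Pure arithmetic now: smallest integer p such that alpha * p >= n.
  // No prime lookup, hence no lower_bound.
  std::size_t
  bkt_for_elements(std::size_t n) const
  { return std::ceil(n / max_load_factor); }

  // The only place doing the lower_bound over the prime table.
  std::size_t
  next_bkt(std::size_t n) const
  {
    // Truncated toy table; n beyond its last entry is not handled here.
    static const unsigned long primes[]
      = { 2, 5, 11, 23, 47, 97, 193, 389, 769, 1543 };
    return *std::lower_bound(std::begin(primes), std::end(primes), n);
  }
};

// Call pattern of the range constructor after the patch:
// one prime lookup instead of up to two.
std::size_t
initial_buckets(const toy_policy& pol, std::size_t n_elems, std::size_t hint)
{ return pol.next_bkt(std::max(pol.bkt_for_elements(n_elems), hint)); }

int main()
{
  toy_policy pol;
  // 100 elements at load factor 1.0 need >= 100 buckets: next toy prime is 193.
  return initial_buckets(pol, 100, 10) == 193 ? 0 : 1;
}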

2012-11-13 François Dumont <fdumont@gcc.gnu.org>

    * include/bits/hashtable_policy.h (_Prime_rehash_policy): Remove
    automatic shrink.
    (_Prime_rehash_policy::_M_bkt_for_elements): Do not call
    _M_next_bkt anymore.
    (_Prime_rehash_policy::_M_next_bkt): Move usage of
    _S_growth_factor ...
    (_Prime_rehash_policy::_M_need_rehash): ... here.
    * include/bits/hashtable.h (_Hashtable<>): Adapt.

Tested under Linux x86_64, normal and debug modes.

Regarding performance, I have slightly evolved the 54075.cc test proposed last time. It now measures insertions with and without caching of the hash code; the columns after the description are real/user/system time, then memory and page faults. The results are:

54075.cc std::unordered_set 300000 Foo insertions without cache 9r 9u 0s 13765616mem 0pf
54075.cc std::unordered_set 300000 Foo insertions with cache 14r 13u 0s 18562064mem 0pf
54075.cc std::tr1::unordered_set 300000 Foo insertions without cache 9r 8u 1s 13765616mem 0pf
54075.cc std::tr1::unordered_set 300000 Foo insertions with cache 14r 13u 0s 18561952mem 0pf


So the performance difference in this case seems to come only from whether the hash code is cached. In the reported use case the default behavior of std::unordered_set is to cache hash codes while std::tr1::unordered_set does not cache them. We should perhaps review the default behavior regarding caching of the hash code: perhaps cache it if the hash functor can throw and not cache it otherwise. It is not easy to find out what's best to do.
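
If we went the "cache only when the hash functor can throw" route, the detection itself is simple with the noexcept operator. A rough sketch, with made-up trait and functor names, not patch code:

#include <cstddef>
#include <type_traits>
#include <utility>

// Made-up trait: cache the hash code by default only when invoking the
// hash functor on a key can throw.
template<typename _Hash, typename _Key>
  struct __cache_hash_code_by_default
  : std::integral_constant<bool,
      !noexcept(std::declval<const _Hash&>()(std::declval<const _Key&>()))>
  { };

struct NoexceptHash
{ std::size_t operator()(int x) const noexcept { return x; } };

struct ThrowingHash   // may throw, as far as the compiler knows
{ std::size_t operator()(int x) const { return x; } };

static_assert(!__cache_hash_code_by_default<NoexceptHash, int>::value,
	      "noexcept hash: do not cache");
static_assert(__cache_hash_code_by_default<ThrowingHash, int>::value,
	      "potentially throwing hash: cache");

Such a default would match the faster non-caching timings above whenever the user's hash functor is noexcept, like the one in the attached test.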

François


On 11/09/2012 11:50 AM, Paolo Carlini wrote:
Hi again,

+ // To get the previous bound we use _S_growth * 2 to avoid oscillations in the
+ // number of buckets when looping on insertion/removal of elements.
const unsigned long* __prev_bkt
- = std::lower_bound(__prime_list + 1, __next_bkt, __n / _S_growth_factor);
+ = std::lower_bound(__prime_list + 1, __next_bkt,
+ __n / _S_growth_factor / _S_growth_factor);


It looks like you are dividing by _S_growth_factor ^ 2 here; is that intended? But anyway, in my opinion my very _M_prev_resize idea (thus rehash shrinking, right?) is proving quite fragile, and we have also received negative comments from users about shrinking, which is new. Before pursuing it further, I think we should double check what the other implementations do: are they also shrinking? Because otherwise we could, at least for the time being, remove the related bits and save ourselves many headaches...
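
If the intent is hysteresis, the arithmetic works out with made-up round numbers (assuming _S_growth_factor == 2 and max_load_factor == 1.0): growing past ~100 elements takes the table to roughly 200 buckets; with a single division the new table's shrink threshold would sit at ~100, exactly where the element count is hovering, so a loop inserting and erasing one element there would rehash on every iteration, whereas dividing twice puts the threshold at ~50 and leaves a dead band between growing and shrinking.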

Paolo.


Index: include/bits/hashtable.h
===================================================================
--- include/bits/hashtable.h	(revision 193484)
+++ include/bits/hashtable.h	(working copy)
@@ -806,11 +806,6 @@
       _M_rehash_policy()
     {
       _M_bucket_count = _M_rehash_policy._M_next_bkt(__bucket_hint);
-
-      // We don't want the rehash policy to ask for the hashtable to
-      // shrink on the first insertion so we need to reset its
-      // previous resize level.
-      _M_rehash_policy._M_prev_resize = 0;
       _M_buckets = _M_allocate_buckets(_M_bucket_count);
     }
 
@@ -834,16 +829,12 @@
 	_M_element_count(0),
 	_M_rehash_policy()
       {
+	auto __nb_elems = __detail::__distance_fw(__f, __l);
 	_M_bucket_count =
-	  _M_rehash_policy._M_bkt_for_elements(__detail::__distance_fw(__f,
-								       __l));
-	if (_M_bucket_count <= __bucket_hint)
-	  _M_bucket_count = _M_rehash_policy._M_next_bkt(__bucket_hint);
+	  _M_rehash_policy._M_next_bkt(
+	    std::max(_M_rehash_policy._M_bkt_for_elements(__nb_elems),
+		     __bucket_hint));
 
-	// We don't want the rehash policy to ask for the hashtable to
-	// shrink on the first insertion so we need to reset its
-	// previous resize level.
-	_M_rehash_policy._M_prev_resize = 0;
 	_M_buckets = _M_allocate_buckets(_M_bucket_count);
 	__try
 	  {
@@ -990,6 +981,7 @@
     __rehash_policy(const _RehashPolicy& __pol)
     {
       size_type __n_bkt = __pol._M_bkt_for_elements(_M_element_count);
+      __n_bkt = __pol._M_next_bkt(__n_bkt);
       if (__n_bkt != _M_bucket_count)
 	_M_rehash(__n_bkt, _M_rehash_policy._M_state());
       _M_rehash_policy = __pol;
@@ -1641,19 +1633,12 @@
     {
       const __rehash_state& __saved_state = _M_rehash_policy._M_state();
       std::size_t __buckets
-	= _M_rehash_policy._M_bkt_for_elements(_M_element_count + 1);
-      if (__buckets <= __n)
-	__buckets = _M_rehash_policy._M_next_bkt(__n);
+	= std::max(_M_rehash_policy._M_bkt_for_elements(_M_element_count + 1),
+		   __n);
+      __buckets = _M_rehash_policy._M_next_bkt(__buckets);
 
       if (__buckets != _M_bucket_count)
-	{
-	  _M_rehash(__buckets, __saved_state);
-
-	  // We don't want the rehash policy to ask for the hashtable to shrink
-	  // on the next insertion so we need to reset its previous resize
-	  // level.
-	  _M_rehash_policy._M_prev_resize = 0;
-	}
+	_M_rehash(__buckets, __saved_state);
       else
 	// No rehash, restore previous state to keep a consistent state.
 	_M_rehash_policy._M_reset(__saved_state);
Index: include/bits/hashtable_policy.h
===================================================================
--- include/bits/hashtable_policy.h	(revision 193484)
+++ include/bits/hashtable_policy.h	(working copy)
@@ -358,7 +358,7 @@
   struct _Prime_rehash_policy
   {
     _Prime_rehash_policy(float __z = 1.0)
-    : _M_max_load_factor(__z), _M_prev_resize(0), _M_next_resize(0) { }
+    : _M_max_load_factor(__z), _M_next_resize(0) { }
 
     float
     max_load_factor() const noexcept
@@ -380,25 +380,21 @@
     _M_need_rehash(std::size_t __n_bkt, std::size_t __n_elt,
 		   std::size_t __n_ins) const;
 
-    typedef std::pair<std::size_t, std::size_t> _State;
+    typedef std::size_t _State;
 
     _State
     _M_state() const
-    { return std::make_pair(_M_prev_resize, _M_next_resize); }
+    { return _M_next_resize; }
 
     void
-    _M_reset(const _State& __state)
-    {
-      _M_prev_resize = __state.first;
-      _M_next_resize = __state.second;
-    }
+    _M_reset(_State __state)
+    { _M_next_resize = __state; }
 
     enum { _S_n_primes = sizeof(unsigned long) != 8 ? 256 : 256 + 48 };
 
     static const std::size_t _S_growth_factor = 2;
 
     float                _M_max_load_factor;
-    mutable std::size_t  _M_prev_resize;
     mutable std::size_t  _M_next_resize;
   };
 
@@ -417,35 +413,28 @@
     static const unsigned char __fast_bkt[12]
       = { 2, 2, 2, 3, 5, 5, 7, 7, 11, 11, 11, 11 };
 
-    const std::size_t __grown_n = __n * _S_growth_factor;
-    if (__grown_n <= 11)
+    if (__n <= 11)
       {
-	_M_prev_resize = 0;
 	_M_next_resize
-	  = __builtin_ceil(__fast_bkt[__grown_n]
+	  = __builtin_ceil(__fast_bkt[__n]
 			   * (long double)_M_max_load_factor);
-	return __fast_bkt[__grown_n];
+	return __fast_bkt[__n];
       }
 
     const unsigned long* __next_bkt
       = std::lower_bound(__prime_list + 5, __prime_list + _S_n_primes,
-			 __grown_n);
-    const unsigned long* __prev_bkt
-      = std::lower_bound(__prime_list + 1, __next_bkt, __n / _S_growth_factor);
-
-    _M_prev_resize
-      = __builtin_floor(*(__prev_bkt - 1) * (long double)_M_max_load_factor);
+			 __n);
     _M_next_resize
       = __builtin_ceil(*__next_bkt * (long double)_M_max_load_factor);
     return *__next_bkt;
   }
 
-  // Return the smallest prime p such that alpha p >= n, where alpha
+  // Return the smallest integer p such that alpha p >= n, where alpha
   // is the load factor.
   inline std::size_t
   _Prime_rehash_policy::
   _M_bkt_for_elements(std::size_t __n) const
-  { return _M_next_bkt(__builtin_ceil(__n / (long double)_M_max_load_factor)); }
+  { return __builtin_ceil(__n / (long double)_M_max_load_factor); }
 
   // Finds the smallest prime p such that alpha p > __n_elt + __n_ins.
   // If p > __n_bkt, return make_pair(true, p); otherwise return
@@ -467,7 +456,8 @@
 				 / (long double)_M_max_load_factor;
 	if (__min_bkts >= __n_bkt)
 	  return std::make_pair(true,
-				_M_next_bkt(__builtin_floor(__min_bkts) + 1));
+	    _M_next_bkt(std::max<std::size_t>(__builtin_floor(__min_bkts) + 1,
+					      __n_bkt * _S_growth_factor)));
 	else
 	  {
 	    _M_next_resize
@@ -475,13 +465,6 @@
 	    return std::make_pair(false, 0);
 	  }
       }
-    else if (__n_elt + __n_ins < _M_prev_resize)
-      {
-	long double __min_bkts = (__n_elt + __n_ins)
-				 / (long double)_M_max_load_factor;
-	return std::make_pair(true,
-			      _M_next_bkt(__builtin_floor(__min_bkts) + 1));
-      }
     else
       return std::make_pair(false, 0);
   }
// Copyright (C) 2012 Free Software Foundation, Inc.
//
// This file is part of the GNU ISO C++ Library.  This library is free
// software; you can redistribute it and/or modify it under the
// terms of the GNU General Public License as published by the
// Free Software Foundation; either version 3, or (at your option)
// any later version.

// This library is distributed in the hope that it will be useful,
// but WITHOUT ANY WARRANTY; without even the implied warranty of
// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
// GNU General Public License for more details.

// You should have received a copy of the GNU General Public License along
// with this library; see the file COPYING3.  If not see
// <http://www.gnu.org/licenses/>.

// { dg-options "-std=c++11" }

#include <testsuite_performance.h>
#include <sstream>
#include <cstdlib> // random()
#ifdef _USE_TR1
#  include <tr1/unordered_set>
const char* ns = "std::tr1";
#else
#  include <unordered_set>
const char* ns = "std";
#endif

struct Foo
{
  int bar;
  int baz;
  int meh;

  Foo()
  { bar = random(); baz = random(); meh = random(); }

  size_t
  hash() const noexcept
  { return bar ^ baz ^ meh; }

  inline bool
  operator==(const Foo& other) const
  { return other.bar == bar && other.baz == baz && other.meh == meh; }
};

struct HashFunction
{
  template<typename T>
    size_t operator()(const T& t) const noexcept
    { return t.hash(); }
};

template<bool use_cache>
  void bench()
  {
    using namespace __gnu_test;

    time_counter time;
    resource_counter resource;

    const int sz = 300000;

    Foo foos[sz];
#ifdef _USE_TR1
      std::tr1::__unordered_set<Foo, HashFunction,
				std::equal_to<Foo>,
				std::allocator<Foo>,
				use_cache> s;
#else
      std::__uset_hashtable<Foo, HashFunction,
			    std::equal_to<Foo>,
			    std::allocator<Foo>,
			    std::__uset_traits<use_cache>> s;
#endif

    start_counters(time, resource);

    for (int i = 0; i != sz ; ++i)
      s.insert(foos[i]);

    stop_counters(time, resource);
    std::ostringstream ostr;
    ostr << ns << "::unordered_set " << sz << " Foo insertions "
	 << (use_cache ? "with" : "without") << " cache";
    report_performance(__FILE__, ostr.str().c_str(), time, resource);
  }

int main()
{
  bench<false>();
  bench<true>();
}
