Bug 41975 - [C++0x] [DR579] unordered_set::erase performs worse when nearly empty
Summary: [C++0x] [DR579] unordered_set::erase performs worse when nearly empty
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: libstdc++
Version: 4.4.2
Importance: P3 normal
Target Milestone: 4.7.0
Assignee: Paolo Carlini
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2009-11-06 20:53 UTC by Shaun Jackman
Modified: 2012-07-24 21:51 UTC
4 users

See Also:
Host: i686-pc-linux-gnu
Target: i686-pc-linux-gnu
Build: i686-pc-linux-gnu
Known to work:
Known to fail:
Last reconfirmed: 2009-11-29 08:34:03


Description Shaun Jackman 2009-11-06 20:53:45 UTC
unordered_set::erase returns an iterator to the next element in the hash
table. When the hash table is empty, this means checking every single
bucket. For a hash table that used to have a lot of elements (say one
million) which were all removed and now has only a few elements (say
two), erasing an element is very slow. I'm not using the iterator
returned by erase. Is there a way to avoid this situation? I'm not very
keen on checking the load_factor and resizing the number of buckets.

Thanks,
Shaun
Comment 1 Paolo Carlini 2009-11-06 21:35:52 UTC
Interesting question, but I'm afraid this is very hard to improve: note how in the specs themselves - Table 98 in N2960 - erase(iterator) has complexity: "Average case O(1), worst case O(a.size())."
Comment 2 Shaun Jackman 2009-11-06 23:44:04 UTC
We're seeing O(a.bucket_count()), which is quite different from O(a.size()), particularly when the hash table is empty. I understand why it's hard to improve, but I think the matter needs to be brought up with the standards committee.
Comment 3 Paolo Carlini 2009-11-06 23:50:20 UTC
You are right that bucket_count != size. Now, can you imagine a way to fix this? Otherwise, I agree, we should raise the issue. Do you want to do that, maybe on the library reflector?
Comment 4 Shaun Jackman 2009-11-07 00:07:15 UTC
If the pointer of the last element in a bucket pointed to the first element of the next non-empty bucket, this particular issue would be avoided, although I'm sure it would create its own issues.

I'm not familiar with the library reflector. If you have the time to follow up with this issue there, I would appreciate it.
Comment 5 Paolo Carlini 2009-11-07 01:17:37 UTC
This is certainly related, and has been, I would say, dismissed in Bellevue (I wasn't there, unfortunately; I can try to find the minutes):

   http://www.open-std.org/jtc1/sc22/wg21/docs/lwg-closed.html#579

N2023 seems rather well written to me...
Comment 6 John Zwinck 2009-11-28 22:59:33 UTC
The same bug has recently been raised in Boost's implementation of unordered containers: https://svn.boost.org/trac/boost/ticket/3693
Comment 7 Richard Biener 2009-11-28 23:16:13 UTC
An implementation is probably expected to shrink bucket_count when size
shrinks, so the complexity should still be O(size).  That would be good
for memory use anyway, so why's that not done?
Comment 8 Shaun Jackman 2009-11-29 00:44:23 UTC
I wouldn't depend on the number of buckets shrinking. Shrinking (and growing) a hash table is an expensive operation that requires rehashing all the elements in the hash table. Even if the implementation does provide the ability to shrink the hash table, a number of applications would disable it. Google sparsehash provides min_load_factor for this purpose.
Comment 9 Stefan Straßer 2009-11-29 02:29:40 UTC
(In reply to comment #7)
> An implementation is probably expected to shrink bucket_count when size
> shrinks, so the complexity should still be O(size).  That would be good
> for memory use anyway, so why's that not done?
> 

shrinking invalidates iterators, doesn't it?
erase() is specified not to invalidate iterators.
Comment 10 Paolo Carlini 2009-11-29 08:34:02 UTC
Stefan is right. The issue, in full generality, isn't trivial at all; there is now a new discussion on the library reflector. My impression is that for C++0x we are not going to standardize the minimum load factor suggested by Matt; it seems much more likely that erase will simply be changed to return void. There is growing consensus about that.
Comment 11 Paolo Carlini 2009-12-22 10:02:09 UTC
Relevant DR reopened:

  http://home.roadrunner.com/~hinnant/issue_review/lwg-active.html#579

Next, we should provide detailed wording...
Comment 12 Paolo Carlini 2010-02-09 16:09:11 UTC
Looks like there is strong consensus in the LWG for the proposed resolution, that is, returning void, and LWG 579 is now [Tentatively Ready]. We could even implement it in time for 4.5.0, but, if I'm not mistaken, the iterator-range case is not trivial; let's reconsider in a few days...
Comment 13 paolo@gcc.gnu.org 2010-02-11 18:11:16 UTC
Subject: Bug 41975

Author: paolo
Date: Thu Feb 11 18:11:01 2010
New Revision: 156705

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=156705
Log:
2010-02-11  Paolo Carlini  <paolo.carlini@oracle.com>

	PR libstdc++/41975, DR 579
	* include/bits/hashtable.h (_Hashtable<>::_M_erase_node): Remove.
	(erase(const_iterator), erase(const_iterator, const_iterator)):
	Change return type to void.
	* include/debug/unordered_map: Adjust.
	* include/debug/unordered_set: Likewise.
	* testsuite/util/exception/safety.h: Likewise.
	* testsuite/23_containers/unordered_map/erase/1.cc: Likewise.
	* testsuite/23_containers/unordered_map/erase/24061-map.cc: Likewise.
	* testsuite/23_containers/unordered_set/erase/1.cc:  Likewise.
	* testsuite/23_containers/unordered_set/erase/24061-map.cc: Likewise.
	* testsuite/23_containers/unordered_multimap/erase/1.cc:  Likewise.
	* testsuite/23_containers/unordered_multimap/erase/24061-map.cc:
	Likewise.
	* testsuite/23_containers/unordered_multiset/erase/1.cc:  Likewise.
	* testsuite/23_containers/unordered_multiset/erase/24061-map.cc:
	Likewise.

Modified:
    trunk/libstdc++-v3/ChangeLog
    trunk/libstdc++-v3/include/bits/hashtable.h
    trunk/libstdc++-v3/include/debug/unordered_map
    trunk/libstdc++-v3/include/debug/unordered_set
    trunk/libstdc++-v3/testsuite/23_containers/unordered_map/erase/1.cc
    trunk/libstdc++-v3/testsuite/23_containers/unordered_map/erase/24061-map.cc
    trunk/libstdc++-v3/testsuite/23_containers/unordered_multimap/erase/1.cc
    trunk/libstdc++-v3/testsuite/23_containers/unordered_multimap/erase/24061-multimap.cc
    trunk/libstdc++-v3/testsuite/23_containers/unordered_multiset/erase/1.cc
    trunk/libstdc++-v3/testsuite/23_containers/unordered_multiset/erase/24061-multiset.cc
    trunk/libstdc++-v3/testsuite/23_containers/unordered_set/erase/1.cc
    trunk/libstdc++-v3/testsuite/23_containers/unordered_set/erase/24061-set.cc
    trunk/libstdc++-v3/testsuite/util/exception/safety.h

Comment 14 Paolo Carlini 2010-02-11 18:13:01 UTC
Done.
Comment 15 paolo@gcc.gnu.org 2010-03-09 01:56:57 UTC
Subject: Bug 41975

Author: paolo
Date: Tue Mar  9 01:56:42 2010
New Revision: 157300

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=157300
Log:
2010-03-08  Paolo Carlini  <paolo.carlini@oracle.com>

	Revert:
	2010-02-11  Paolo Carlini  <paolo.carlini@oracle.com>

	PR libstdc++/41975, DR 579
	* include/bits/hashtable.h (_Hashtable<>::_M_erase_node): Remove.
	(erase(const_iterator), erase(const_iterator, const_iterator)):
	Change return type to void.
	* include/debug/unordered_map: Adjust.
	* include/debug/unordered_set: Likewise.
	* testsuite/util/exception/safety.h: Likewise.
	* testsuite/23_containers/unordered_map/erase/1.cc: Likewise.
	* testsuite/23_containers/unordered_map/erase/24061-map.cc: Likewise.
	* testsuite/23_containers/unordered_set/erase/1.cc:  Likewise.
	* testsuite/23_containers/unordered_set/erase/24061-map.cc: Likewise.
	* testsuite/23_containers/unordered_multimap/erase/1.cc:  Likewise.
	* testsuite/23_containers/unordered_multimap/erase/24061-map.cc:
	Likewise.
	* testsuite/23_containers/unordered_multiset/erase/1.cc:  Likewise.
	* testsuite/23_containers/unordered_multiset/erase/24061-map.cc:
	Likewise.


Modified:
    trunk/libstdc++-v3/ChangeLog
    trunk/libstdc++-v3/include/bits/hashtable.h
    trunk/libstdc++-v3/include/debug/unordered_map
    trunk/libstdc++-v3/include/debug/unordered_set
    trunk/libstdc++-v3/testsuite/23_containers/unordered_map/erase/1.cc
    trunk/libstdc++-v3/testsuite/23_containers/unordered_map/erase/24061-map.cc
    trunk/libstdc++-v3/testsuite/23_containers/unordered_multimap/erase/1.cc
    trunk/libstdc++-v3/testsuite/23_containers/unordered_multimap/erase/24061-multimap.cc
    trunk/libstdc++-v3/testsuite/23_containers/unordered_multiset/erase/1.cc
    trunk/libstdc++-v3/testsuite/23_containers/unordered_multiset/erase/24061-multiset.cc
    trunk/libstdc++-v3/testsuite/23_containers/unordered_set/erase/1.cc
    trunk/libstdc++-v3/testsuite/23_containers/unordered_set/erase/24061-set.cc
    trunk/libstdc++-v3/testsuite/util/exception/safety.h

Comment 16 Paolo Carlini 2010-03-09 01:59:35 UTC
I reverted my changes and re-opened the PR: DR 579 is being resolved as NAD, because there is evidence (e.g., the Dinkumware implementation) that returning an iterator doesn't necessarily impact performance.
Comment 17 John Zwinck 2010-03-09 18:40:42 UTC
(In reply to comment #16)
> there is evidence (eg, the Dinkumware implementation) that returning an
> iterator doesn't necessarily impact performance.

The GCC implementation does have poor performance.  Why not leave the (performance-improving) patch in place until someone actually comes up with an implementation for GCC which does perform well?
As it stands, users are left in a strange situation, being told that the performance (in GCC) is poor, but that it has to be this way because of Dinkumware (which, it's claimed, is fast).  Doesn't this only serve to drive users away from GCC's implementation?

To be clear, I am very much in favor of any solution which provides approximately optimal performance.  Boost, for example, chose to implement a new method called "erase_return_void" for users who care about the performance of this operation: a workaround, but with emphasis on "work."
Comment 18 Paolo Carlini 2010-03-09 18:49:06 UTC
Because it would be non-conforming. In case my previous message was not clear enough: in C++1x, erase will return *iterator*.
Comment 19 Shaun Jackman 2010-03-09 19:15:49 UTC
How does the Dinkumware implementation avoid the performance hit of erase returning an iterator?

Comment 20 Paolo Carlini 2010-03-09 19:19:12 UTC
Nobody knows the gory details, but Plauger said that people can look into the headers of the last beta of Visual Studio... Knowledgeable people are under the impression they are using a different, doubly linked implementation for the hash table, which has its own problems. Still, at this point it's pretty clear that in C++1x erase will return an iterator, and we have to live with that.
Comment 21 John Zwinck 2010-03-09 19:55:57 UTC
(In reply to comment #18)
> Because would be non-conforming. In case my previous message was not clear
> enough, in C++1x erase will return *iterator*.
The Boost approach is still an option for GCC: let the standards mandate a suboptimal interface if they must, but provide an alternative (the one Boost calls "erase_return_void").  From where I sit at least, it doesn't seem infeasible for GCC to go a little bit beyond the standard in this case (again, as Boost have already elected to do).
Comment 22 Paolo Carlini 2010-03-09 20:15:27 UTC
As a last resort, we can imagine doing that. It would, however, to the best of my knowledge, be the first case in which we provide a different interface controlled by a macro this way. A much better approach, also mentioned today, would be providing two different implementations, both returning iterator and differing only in internal details.
Comment 23 Paolo Carlini 2010-03-10 16:36:15 UTC
At the end of a further discussion today, apparently there is a strong consensus to add additional signatures returning void, "Boost-style".
Comment 24 Richard Biener 2010-04-06 11:20:41 UTC
GCC 4.5.0 is being released.  Deferring to 4.5.1.
Comment 25 Richard Biener 2010-07-31 09:29:39 UTC
GCC 4.5.1 is being released, adjusting target milestone.
Comment 26 Paolo Carlini 2010-08-08 15:03:37 UTC
US 113, ES 2, US 118 / Issue 579 have been closed as NAD, so let's figure out how best to obtain O(1) in our implementation...
Comment 27 Paolo Carlini 2010-09-20 17:23:25 UTC
Unless somebody posts here over the next two or three days with *concrete* ideas of a different sort, I'm going to simply work on a doubly linked list solution, along the lines of the "iterator" section here: http://www.drdobbs.com/184403822 (nothing new, therefore). All operations on iterators will become faster, not just computing the iterator returned by erase; on the other hand, two pointers instead of one will be used for each element.
Comment 28 Joaquín M López Muñoz 2010-09-20 17:34:02 UTC
> US 113, ES 2, US 118 / Issue 579 have been closed as NAD, thus
> let's figure out how best obtain O(1) in our implementation...

Do you have a rationale for the closing of these NB comments?
N3133 shows 579 unchanged. I was told that someone reported the
existence of an O(1) singly-linked list implementation in
Rapperswil, but I don't have additional details.

I'd wait to get the full picture before going to a doubly-linked
list: the committee had full information on the issue, so if they
closed 579 as NAD they should be able to provide a rationale.
Comment 29 Paolo Carlini 2010-09-20 17:41:30 UTC
I'm not aware of any singly linked list implementation, to be honest. I know that Dinkumware already uses a doubly linked one and, if I'm not wrong, Howard just moved to it. I'll send you privately the rationale I have from the minutes, and I'm also asking Matt again whether he has anything else to suggest. Frankly, though, I'm rather fed up with this issue; I mean to implement something that *works* and is *conforming*, and then test it in the field.
Comment 30 Paolo Carlini 2010-09-21 17:55:25 UTC
More precisely (in the meanwhile I went through an exchange from the beginning of this year): Howard stores the hash code, which boils down to a memory requirement similar to that of the traditional doubly linked list scheme per Dinkum in the limit of high load factor; for small load factors it is better, because it can use only one pointer instead of two for each bucket in the bucket list. All in all, if the requirements of a throwing hash and the erase complexity are combined, I don't think anything with memory use similar to that of the singly linked schemes is possible.

Joaquin, I think Matt would be in favor of a motion asking a re-opening of the issue in Batavia, what do you think?
Comment 31 Joaquín M López Muñoz 2010-10-06 11:10:12 UTC
Paolo,

I've read the minutes, and it seems no strong consensus
was reached. I think it'd be useful if the issue could
be reopened, at least for informative purposes, so
that the committee specifies whether it's in the
spirit of the standard to allow low-memory
(singly-linked, one word per node) implementations,
and if so what the likely implementations are (I
understand this is Pablo's third approach). If this
is the case, the FCD has to be changed so as to allow
erase() to throw. Otherwise implementors will know
we have to resort to doubly-linked solutions.
Comment 32 Richard Biener 2010-12-16 13:03:33 UTC
GCC 4.5.2 is being released, adjusting target milestone.
Comment 33 François Dumont 2011-11-23 20:30:25 UTC
Author: fdumont
Date: Wed Nov 23 20:30:18 2011
New Revision: 181677

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=181677
Log:
2011-11-23  François Dumont <fdumont@gcc.gnu.org>

	PR libstdc++/41975
	* include/bits/hashtable.h (_Hashtable<>): Major data model
	modification to limit performance impact of empty buckets in
	erase(iterator) implementation.
	* include/bits/hashtable_policy.h (_Hashtable_iterator,
	_Hashtable_const_iterator): Remove not used anymore.
	* include/bits/hashtable_policy.h (_Prime_rehash_policy): Remove
	_M_grow_factor, just use natural evolution of prime numbers. Add
	_M_prev_size to know when the number of buckets can be reduced.
	* include/bits/unordered_set.h (__unordered_set<>,
	__unordered_multiset<>), unordered_map.h (__unordered_map<>,
	__unordered_multimap<>): Change default value of cache hash code
	template parameter, false for integral types with noexcept hash
	functor, true otherwise.
	* include/debug/unordered_map, unordered_set: Adapt transformation
	from iterator/const_iterator to respectively
	local_iterator/const_local_iterator.
	* testsuite/performance/23_containers/copy_construct/unordered_set.cc:
	New.
	* testsuite/23_containers/unordered_set/instantiation_neg.cc: New.
	* testsuite/23_containers/unordered_set/hash_policy/rehash.cc: New.
	* testsuite/23_containers/unordered_multiset/cons/copy.cc: New.
	* testsuite/23_containers/unordered_multiset/erase/1.cc,
	24061-multiset.cc: Add checks on the number of bucket elements.
	* testsuite/23_containers/unordered_multiset/insert/multiset_range.cc,
	multiset_single.cc, multiset_single_move.cc: Likewise.


Added:
    trunk/libstdc++-v3/testsuite/23_containers/unordered_multiset/cons/copy.cc
    trunk/libstdc++-v3/testsuite/23_containers/unordered_set/hash_policy/rehash.cc
    trunk/libstdc++-v3/testsuite/23_containers/unordered_set/instantiation_neg.cc
    trunk/libstdc++-v3/testsuite/performance/23_containers/copy_construct/unordered_set.cc
Modified:
    trunk/libstdc++-v3/ChangeLog
    trunk/libstdc++-v3/include/bits/hashtable.h
    trunk/libstdc++-v3/include/bits/hashtable_policy.h
    trunk/libstdc++-v3/include/bits/unordered_map.h
    trunk/libstdc++-v3/include/bits/unordered_set.h
    trunk/libstdc++-v3/include/debug/unordered_map
    trunk/libstdc++-v3/include/debug/unordered_set
    trunk/libstdc++-v3/testsuite/23_containers/unordered_multiset/erase/1.cc
    trunk/libstdc++-v3/testsuite/23_containers/unordered_multiset/erase/24061-multiset.cc
    trunk/libstdc++-v3/testsuite/23_containers/unordered_multiset/insert/multiset_range.cc
    trunk/libstdc++-v3/testsuite/23_containers/unordered_multiset/insert/multiset_single.cc
    trunk/libstdc++-v3/testsuite/23_containers/unordered_multiset/insert/multiset_single_move.cc
Comment 34 Paolo Carlini 2011-11-28 22:10:00 UTC
Fixed for 4.7.0.
Comment 35 Andrew Gallagher 2012-07-24 21:45:55 UTC
It appears that http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=181677 has caused a performance regression in gcc-4.7 (or I am missing something).  The following example runs 3 times faster with gcc-4.6 than with gcc-4.7:

#include <unordered_map>

using namespace std;

int main() {
    unordered_map<int, int> m;
    //m.rehash(10000000);
    for (int i = 0; i < 10000000; i++) {
        m[i] = i;
    }
}

Looking at a profile generated by google-perftools, it seems that most of the time is spent in _M_rehash with gcc-4.7: the newly rehashed size is considerably smaller than in gcc-4.6 (perhaps due to the removal of _M_grow_factor?), so it gets called a lot more often.  Is this expected behavior?

Reserve/rehash also appears to be broken.  If the 'rehash' line is uncommented in the example above, a subsequent insert hits the check against _M_prev_resize in _M_need_rehash and rehashes the bucket count back to a small value.
Comment 36 Jonathan Wakely 2012-07-24 21:51:23 UTC
See http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54075