The following piece of code shows a 3x performance regression from 4.6.2 to 4.7.1. The cause appears to be a change in libstdc++. The code is compiled with -O2 -std=c++0x. -O3 doesn't make much difference; neither does the call to reserve. The attachment contains profiling done with google-perftools. It looks like the major cost comes from rehashing. Is anyone aware of this issue, or am I the first to report it?

#include <unordered_map>
using namespace std;

int main() {
  const int N = 10000000;
  unordered_map<int, int> m;
  m.reserve(2 * N);
  for (int i = 0; i < N; i++) {
    m[i] = i;
  }
}

Timing:

[hidden]$ time ./a-4.6.2.out
real    0m1.029s
user    0m0.787s
sys     0m0.239s
[hidden]$ time ./a-4.7.1.out
real    0m3.364s
user    0m2.596s
sys     0m0.761s
Created attachment 27861 [details] Profiling of gcc-4.7.1 using google-perftools
Created attachment 27862 [details] Profiling of gcc-4.6.2 using google-perftools
Francois, can you have a look? Thanks!
I wonder, anyway, whether the apparent slowdown is just an artifact of the different handling of the load factor in the reworked unordered containers we have been delivering since 4.7.0. I would suggest the submitter experiment a bit with max_load_factor and check whether, when 4.6.x seems faster at insertion time, the actual load factor is higher (in which case searching would be slower).
@Paolo Carlini: can you talk more about how to experiment with max_load_factor? As long as I use the same max_load_factor for 4.6 and 4.7, I can still see the performance difference (3x slower is the best result):

max_load_factor   gcc-4.6.2   gcc-4.7.1
0.2               1.600s      7.650s
0.5               1.125s      4.251s
1.0               1.004s      3.378s
2.0               0.914s      20.471s
5.0               0.917s      29.132s
In some cases 4.6.x handled max_load_factor incorrectly. Thus the idea isn't to compare 4.6.x to 4.7.x with the same max_load_factor (I don't think that's useful); instead, actually measure the load factor(s) and see whether, for some values of max_load_factor, the actual load factors differ between 4.6.x and 4.7.x. In any case, measure the actual load factor while the map grows.
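The measurement suggested above - tracking the actual load factor while the map grows - can be sketched as follows. This is a hypothetical helper, not code from the report; the function name is mine:

```cpp
#include <cstddef>
#include <unordered_map>
#include <vector>

// Collect the load factor observed right after each rehash while
// inserting n distinct keys. Comparing the resulting sequences between
// 4.6.x and 4.7.x shows whether the two versions really run at the
// same actual load factor for a given max_load_factor.
std::vector<float> load_factors_after_rehash(int n)
{
    std::unordered_map<int, int> m;
    std::size_t buckets = m.bucket_count();
    std::vector<float> out;
    for (int i = 0; i < n; ++i)
    {
        m[i] = i;
        if (m.bucket_count() != buckets)   // a rehash just happened
        {
            out.push_back(m.load_factor());
            buckets = m.bucket_count();
        }
    }
    return out;
}
```

Right after each rehash the load factor should sit at or below max_load_factor (1.0 by default), so the two library versions can be compared point by point.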
@Paolo Carlini: the problem is, with different max_load_factor values in the range [0.2-5], the *best* result of 4.7.1 is still 2x slower than the *worst* of 4.6.2. I printed out the average load factor during insertion. It looks like 4.7.1's load factor stays very close to max_load_factor, while the average for 4.6.2 is about 1/4 of max_load_factor. But what conclusion can we draw from this result?
I just stumbled over this bug while searching for something related to the max load factor (it seems that since 4.7 the hashtable also shrinks when you set the load factor higher, which is unfortunate, at least for me). I changed the testcase to count the number of rehashes (by checking the bucket count before and after each insert) and found:

gcc 4.6 without reserve: 21
gcc 4.6 with reserve: 1
gcc 4.7 without reserve: 155
gcc 4.7 with reserve: 160

Then I had a look in callgrind, and most of the time seems to be spent rehashing. So I would assume that once the 4.7 version gets the number of rehashes down to where it was, we also get the speed back to where it was. I would say that the reserve behaviour is definitely broken, though, as it even increases the number of rehashes. Personally, I would just never have the hashtable shrink on insert and/or on setting the load factor, only on explicit rehash() calls...
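The counting approach described above (watching bucket_count() across inserts) can be reproduced with a small helper; the function name and the 2 * n reserve factor mirror the original testcase but are otherwise my own:

```cpp
#include <cstddef>
#include <unordered_map>

// Count how many times the table rehashes while inserting n keys,
// optionally calling reserve(2 * n) first, as in the original testcase.
// A rehash is detected as a change in bucket_count() across an insert.
int count_rehashes(int n, bool use_reserve)
{
    std::unordered_map<int, int> m;
    if (use_reserve)
        m.reserve(2 * n);
    int rehashes = 0;
    std::size_t buckets = m.bucket_count();
    for (int i = 0; i < n; ++i)
    {
        m[i] = i;
        if (m.bucket_count() != buckets)   // bucket count changed => rehash
        {
            ++rehashes;
            buckets = m.bucket_count();
        }
    }
    return rehashes;
}
```

With a correctly working reserve, the first variant should report zero rehashes; without reserve, the count depends directly on the growth policy being discussed in this PR.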
I confirm that the reserve method is broken. To correctly handle the size hint that can be given through the hashtable constructor, I set _M_rehash_policy._M_prev_resize to 0, just to avoid the container shrinking until it reaches the targeted size. The same thing should be done in the rehash method, which is called from reserve. I will submit a patch soon. Regarding shrinking on insert calls: it can easily be avoided by calling reserve (normally), so I considered it better to offer a container with symmetric behavior. I will also, I think, reconsider the growth factor. In one of my earlier hashtable redesign proposals the number of empty buckets was a performance problem. With the current implementation it is no longer a problem, so we could grow a little faster.
A patch is available here: http://gcc.gnu.org/ml/libstdc++/2012-07/msg00051.html Submitter and interested people can give it a try before it goes in.
Author: fdumont
Date: Wed Jul 25 19:32:48 2012
New Revision: 189863

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=189863
Log:
2012-07-25  François Dumont  <fdumont@gcc.gnu.org>

	PR libstdc++/54075
	* include/bits/hashtable.h
	(_Hashtable<>::_Hashtable(_InputIterator, _InputIterator,
	size_type, ...): Remove std::max usage to guarantee that hashtable
	state is consistent with hash policy state.
	(_Hashtable<>::rehash): Likewise. Set _M_prev_resize to 0 to avoid
	the hashtable to be shrinking on next insertion.
	* testsuite/23_containers/unordered_set/modifiers/reserve.cc: New.
	* testsuite/23_containers/unordered_multiset/modifiers/reserve.cc: New.
	* testsuite/23_containers/unordered_map/modifiers/reserve.cc: New.
	* testsuite/23_containers/unordered_multimap/modifiers/reserve.cc: New.

Added:
    trunk/libstdc++-v3/testsuite/23_containers/unordered_map/modifiers/reserve.cc
    trunk/libstdc++-v3/testsuite/23_containers/unordered_multimap/modifiers/reserve.cc
    trunk/libstdc++-v3/testsuite/23_containers/unordered_multiset/modifiers/reserve.cc
    trunk/libstdc++-v3/testsuite/23_containers/unordered_set/modifiers/reserve.cc
Modified:
    trunk/libstdc++-v3/ChangeLog
    trunk/libstdc++-v3/include/bits/hashtable.h
I can confirm that reserve now works as I would expect (causing no further rehashes). However, the number of rehashes done in the testcase is still 155 (needing 4.5s), while gcc 4.6 only does 21 (needing 1.5s). Monitoring the bucket counts on resizes, it seems that gcc 4.8 is now much more fine-grained than gcc 4.6. I am unsure whether this is expected, deliberate and/or a good idea.
Author: fdumont
Date: Thu Jul 26 12:31:50 2012
New Revision: 189889

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=189889
Log:
2012-07-26  François Dumont  <fdumont@gcc.gnu.org>

	PR libstdc++/54075
	* include/bits/hashtable.h
	(_Hashtable<>::_Hashtable(_InputIterator, _InputIterator,
	size_type, ...): Remove std::max usage to guarantee that hashtable
	state is consistent with hash policy state.
	(_Hashtable<>::rehash): Likewise. Set _M_prev_resize to 0 to avoid
	the hashtable shrinking on next insertion.
	* testsuite/23_containers/unordered_set/modifiers/reserve.cc: New.
	* testsuite/23_containers/unordered_multiset/modifiers/reserve.cc: New.
	* testsuite/23_containers/unordered_map/modifiers/reserve.cc: New.
	* testsuite/23_containers/unordered_multimap/modifiers/reserve.cc: New.

Added:
    branches/gcc-4_7-branch/libstdc++-v3/testsuite/23_containers/unordered_map/modifiers/reserve.cc
    branches/gcc-4_7-branch/libstdc++-v3/testsuite/23_containers/unordered_multimap/modifiers/reserve.cc
    branches/gcc-4_7-branch/libstdc++-v3/testsuite/23_containers/unordered_multiset/modifiers/reserve.cc
    branches/gcc-4_7-branch/libstdc++-v3/testsuite/23_containers/unordered_set/modifiers/reserve.cc
Modified:
    branches/gcc-4_7-branch/libstdc++-v3/ChangeLog
    branches/gcc-4_7-branch/libstdc++-v3/include/bits/hashtable.h
In any case, please add the testcase showing 4.5s vs 1.5s.
Tried the patch and just as Dennis Lubert pointed out, the number of rehashes is not decreased. Is there any plan to fix this issue?
In my tests, with this patch, 4.7.1 is about 10% slower than 4.6 ... a vast improvement but certainly not parity.

./bench46        1.75s user 0.82s system 99% cpu  2.577 total
./bench47        8.01s user 2.78s system 99% cpu 10.800 total
./bench47+patch  1.95s user 0.80s system 99% cpu  2.764 total
Because of more rehashing, unrelated to reserve, I suppose?
I couldn't say. I don't understand the issue, I'm just reporting results and deploying packages for my fellow devs.
Testcase please.
Are you talking to me? 'cause I was providing results for the patch already committed to svn, using the code in this very bug description.
I haven't touched the growth speed for the moment. I prefer to fix the Standard conformity of reserve first. Now I can restore the 4.6 growth speed, as it seems to be a reasonably correct one. Note that being only 3x slower with so many more rehash operations doesn't seem so bad! Of course, growing faster will also induce higher memory consumption, which is not shown by the simple benchmark proposed here.
I am on vacation so I don't have the testcase at hand, but it is the same as likan posted in the original bug report, minus the reserve. The main difference is that without reserve I see 21 rehashes in gcc 4.6 and 155 in gcc 4.7. I added some code that, after each insert, tests whether the bucket count has changed; I think that should be trivial to add yourselves without me providing it.
Author: fdumont
Date: Sun Jul 29 16:44:18 2012
New Revision: 189938

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=189938
Log:
2012-07-29  François Dumont  <fdumont@gcc.gnu.org>

	PR libstdc++/54075
	* include/bits/hashtable_policy.h
	(_Prime_rehash_policy::_M_next_bkt): Add a growth factor set to 2
	to boost growth in the number of buckets.
	* testsuite/performance/23_containers/insert/unordered_set.cc: New.

Added:
    trunk/libstdc++-v3/testsuite/performance/23_containers/insert/unordered_set.cc
Modified:
    trunk/libstdc++-v3/ChangeLog
    trunk/libstdc++-v3/include/bits/hashtable_policy.h
Author: fdumont
Date: Sun Jul 29 17:06:21 2012
New Revision: 189941

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=189941
Log:
2012-07-29  François Dumont  <fdumont@gcc.gnu.org>

	PR libstdc++/54075
	* include/bits/hashtable_policy.h
	(_Prime_rehash_policy::_M_next_bkt): Add a growth factor set to 2
	to boost growth in the number of buckets.
	* testsuite/performance/23_containers/insert/unordered_set.cc: New.

Added:
    branches/gcc-4_7-branch/libstdc++-v3/testsuite/performance/23_containers/insert/unordered_set.cc
Modified:
    branches/gcc-4_7-branch/libstdc++-v3/ChangeLog
    branches/gcc-4_7-branch/libstdc++-v3/include/bits/hashtable_policy.h
Fixed.
I'm afraid this doesn't quite do it. I still observe a >60% slowdown going from gcc-4.4 to gcc-4.7, with this fix already in, specifically for insert(). It's a 320,000-member table I am dealing with here. I can make 4.7 as fast as 4.4 by preemptively reserving what I know (for this test) to be the maximum size I need, but measured resident memory shoots up (not unexpectedly). And resident memory of the 4.7 build was already higher than 4.4, so I don't think this can be the answer here. Playing with the load factor gave a minor speedup at 0.4 (from 1.0), but did not reach 4.4 performance. Other load factors (lower than 0.4 and higher than 1.0) are even slower. Is there more specific information available about the tradeoff numbers that made GNU pick this new implementation?
I can only recommend profiling (by gprof or other means).
I should clarify that I was pointed to this problem (with insert) by profiling and for us nothing pops up as faster (or smaller for that matter). Hence the question.
Indeed, if we have to do something about that, we need to see those profiles. That's my point. Otherwise, blindly, it could be anything, i.e., not necessarily the rehashes which are the topic of this specific PR.
I am going to take a look at the 4.4 implementation to see if I can explain the difference. But while waiting for that, can you confirm that without the reserve the number of rehashes is similar to what you had with version 4.4? Regarding the resident memory, I can't remember any overhead with the new implementation, but once again, checking the 4.4 implementation will point me to it. Do you see it with the code originally posted, that is to say an unordered_map<int, int>? For information, the major reason to review the implementation was Standard conformity: the erase method had to return the iterator following the erased one.
The case I reported is in a large system. I did an isolated test case which sees the time shoot up from 32.54 seconds in gcc-4.4 to 42.86s in gcc-4.7, and back down to 32.27s when using gcc-4.7 with the tr1 hashtables. As you can see, there also is a reasonably catastrophic increase in minor page faults.

gcc version 4.4.3 (Ubuntu 4.4.3-4ubuntu5.1), extra flags:
meh: 1635550270 4081940 288998848
0:32.54 32.54 real 30.50 user 1.97 sys 99% CPU 0/1394855 faults
   text    data     bss     dec     hex filename
   5456     688 3840032 3846176  3ab020 sdriver

gcc version 4.7.2 (Debian 4.7.2-4), extra flags:
meh: 1635550270 4081940 288998848
0:42.86 42.86 real 40.21 user 2.56 sys 99% CPU 0/2101681 faults
   text    data     bss     dec     hex filename
   6822     760 3840032 3847614  3ab5be sdriver

gcc version 4.7.2 (Debian 4.7.2-4), extra flags: -DCRA_USE_TR1
meh: 1635550270 4081940 288998848
0:32.27 32.27 real 30.29 user 1.92 sys 99% CPU 0/1394853 faults
   text    data     bss     dec     hex filename
   5426     736 3840032 3846194  3ab032 sdriver

Profiles for the three cases: http://www.cons.org/gcc-hash-slowdown/
Source code for the test case: http://www.cons.org/gcc-hash-slowdown/reproduce-slow-hash.tar.gz

Please feel free to copy things if you want to append them to this PR. My original case, with a worse slowdown, is in a complicated system and driven from Lisp. For no other reason than Murphy's law, the profiler in there does not penetrate the layers of C++ frames for this run, although it normally does, so I'm not able to break up insert() into its parts right now. I have no reason to believe it is different from this test case, probably just a slower hash. I have seen gcc-4.7 + tr1 hashtable faster than gcc-4.4 both in the large system and in the test case, so I'm pretty confident we are looking at the same thing here. I'm fixing that profiler; more later. Hope this helps.
Thanks. Francois, can you please investigate this issue further? In fact, if the slowdown goes away with a preliminary reserve, it should be possible to handle the problem rather easily, by tweaking the growth policy or something. (We should be a bit patient and consider that, vs 4.4.x, the implementation is almost completely different.)
I fear that this performance issue is a normal drawback of the major enhancement for PR 41975. Before this change the hashtable data model was like a std::vector<std::forward_list<>>. During an insert operation it was easy to find out whether the inserted element was already in the targeted bucket by simply checking for equality against each element of the bucket's forward_list. The code was something like:

for (auto it = bucket.begin(); it != bucket.end(); ++it)
  if (comp(*it, element))
    // The element already exists
    return false;
// Ok, no equivalent element
return true;

After the change, the data model became a combination of a std::forward_list containing all elements and a std::vector<std::forward_list<>::iterator> representing the buckets. To check whether an element is already in a bucket, we still need to compare for equality with each element of the bucket, but the elements of the bucket are harder to identify. There is no bucket end anymore, so we must also check that the element we are about to compare is still in the bucket. The code became something like:

// Pre: bucket not empty.
for (auto it = buckets[n];;)
  {
    if (comp(*it, element))
      // The element already exists
      return false;
    ++it;
    if (it == end() || bucket(it) != n)
      // We are not in bucket n anymore
      break;
  }
// Ok, no equivalent element
return true;

The new design helps iterating through the container, because even if it is almost empty we simply iterate within the forward_list. In the previous implementation we needed to iterate within a bucket's forward_list and then jump over all the empty buckets, which could be very expensive. Try running testsuite/performance/23_containers/insert_erase/41975.cc; you can easily tweak it to run with the tr1 implementation, and you will notice the difference. This test also shows the insert overhead, even if with a perfect hasher like the one used in this test the impact is very limited.
To make bucket(it) not too expensive, the new implementation caches the hash code most of the time, which explains the additional memory. You can ask for it not to be cached, but you might have to qualify your hasher with noexcept; static assertions will tell you so. So for the moment I can't see a way to restore tr1 insert performance without impacting erase performance. In my opinion, a Standard implementation needs to behave correctly in all use cases, not perform very well in one use case but very badly in another.
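As a sketch of the noexcept qualification mentioned above (the struct and function names here are mine, and whether the hash code is actually cached also depends on other implementation-internal traits):

```cpp
#include <cstddef>
#include <functional>
#include <unordered_set>

// A hasher whose call operator is marked noexcept; this qualification is
// what the static assertions mentioned above check for before libstdc++
// will store elements without a cached hash code.
struct IntHash
{
    std::size_t operator()(int x) const noexcept
    { return std::hash<int>()(x); }
};

// Fill a set using the noexcept-qualified hasher and return its size.
std::size_t fill_with_noexcept_hasher(int n)
{
    std::unordered_set<int, IntHash> s;
    for (int i = 0; i < n; ++i)
        s.insert(i);
    return s.size();
}
```

The container behaves identically either way; the qualification only affects whether each node carries a copy of the hash code, i.e. the memory overhead discussed in this comment.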
Well, I haven't looked into this issue in detail, but it looks like everyone is about the same speed if you put a foo.{reserve or rehash}(n) before the inserts. Unfortunately, it doesn't look as simple as the new implementation calling for a rehash more often than the old, because it doesn't. So I don't know whether the slowness is because rehash is now a lot slower, or because insert is slower, but only when there aren't a huge number of empty buckets. I'll also note that libc++'s implementation (seen here: http://llvm.org/svn/llvm-project/libcxx/trunk/) appears to get the same speed as the old libstdc++ implementation, while appearing to have approximately the same bucket data structure as the new libstdc++ implementation. That says to me that it should be *possible* somehow to make the new version fast.
Ok, let's reopen this with an adjusted Summary.
It seems that this commit doesn't fully fix the issue. If you call rehash multiple times with the same size, the second call to rehash resets _M_prev_resize to a non-zero value in _M_next_bkt(). Here is a sample program that shows this behavior:

#include <stdio.h>
#include <unordered_map>

int main(void)
{
    std::unordered_map<int, int> myMap;
    myMap.rehash(4000000);
    myMap.rehash(4000000);
    unsigned long long buckets = myMap.bucket_count();
    int i = 0;
    while (i < 2000000000) {
        myMap.insert(std::make_pair(i, 0));
        ++i;
        if (buckets != myMap.bucket_count()) {
            printf("buckets %llu -> %llu\n", buckets,
                   (unsigned long long)myMap.bucket_count());
            buckets = myMap.bucket_count();
        }
    }
    return 0;
}

(In reply to comment #13)
> Author: fdumont
> Date: Thu Jul 26 12:31:50 2012
> New Revision: 189889
>
> URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=189889
> Log:
> 2012-07-26  François Dumont  <fdumont@gcc.gnu.org>
>
> 	PR libstdc++/54075
> 	* include/bits/hashtable.h
> 	(_Hashtable<>::_Hashtable(_InputIterator, _InputIterator,
> 	size_type, ...): Remove std::max usage to guarantee that hashtable
> 	state is consistent with hash policy state.
> 	(_Hashtable<>::rehash): Likewise. Set _M_prev_resize to 0 to avoid
> 	the hashtable shrinking on next insertion.
> 	* testsuite/23_containers/unordered_set/modifiers/reserve.cc: New.
> 	* testsuite/23_containers/unordered_multiset/modifiers/reserve.cc: New.
> 	* testsuite/23_containers/unordered_map/modifiers/reserve.cc: New.
> 	* testsuite/23_containers/unordered_multimap/modifiers/reserve.cc: New.
> Added:
> branches/gcc-4_7-branch/libstdc++-v3/testsuite/23_containers/unordered_map/modifiers/reserve.cc
> branches/gcc-4_7-branch/libstdc++-v3/testsuite/23_containers/unordered_multimap/modifiers/reserve.cc
> branches/gcc-4_7-branch/libstdc++-v3/testsuite/23_containers/unordered_multiset/modifiers/reserve.cc
> branches/gcc-4_7-branch/libstdc++-v3/testsuite/23_containers/unordered_set/modifiers/reserve.cc
> Modified:
> branches/gcc-4_7-branch/libstdc++-v3/ChangeLog
> branches/gcc-4_7-branch/libstdc++-v3/include/bits/hashtable.h
Francois, can you please look further into this, possibly basing it on the new testcase? Thanks!
Sure, I will. However, I don't expect this problem to have any relation to the performance subject of this PR.
Ok thanks. I guess depending on the complexity of the fixes we can apply some only to mainline first and reconsider the 4_7 branch later. Please do your best to work on both issues: we just entered Stage 3 thus no new features from now on, we are all concentrated on bug fixes until the release.
Here is the patch to fix the redundant rehash/reserve issue.

2012-11-07  François Dumont  <fdumont@gcc.gnu.org>

	PR libstdc++/54075
	* include/bits/hashtable.h (_Hashtable<>::rehash): Reset hash
	policy state if no rehash.
	* testsuite/23_containers/unordered_set/modifiers/reserve.cc
	(test02): New.

I had prepared and tested it in the 4.7 branch but I can apply the same on trunk.

Ok to commit? If so, where?

François

On 11/06/2012 10:33 PM, paolo.carlini at oracle dot com wrote:
> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54075
>
> --- Comment #39 from Paolo Carlini <paolo.carlini at oracle dot com> 2012-11-06 21:33:57 UTC ---
> Ok thanks. I guess depending on the complexity of the fixes we can apply some
> only to mainline first and reconsider the 4_7 branch later. Please do your best
> to work on both issues: we just entered Stage 3 thus no new features from now
> on, we are all concentrated on bug fixes until the release.
On 7 November 2012 22:02, François Dumont wrote: > > Ok to commit ? If so, where ? That patch is OK for trunk and 4.7, thanks.
On 11/08/2012 01:58 AM, Jonathan Wakely wrote:
> On 7 November 2012 22:02, François Dumont wrote:
>> Ok to commit ? If so, where ?
> That patch is OK for trunk and 4.7, thanks.

... I was having a look at this code - the patch per se is straightforward enough and can in any case go in now - and something is puzzling me a lot: we always compute things, in _M_next_bkt etc., in terms of __grown_n, thus __n * 2, until the final _M_rehash call. On the other hand, the old-old code for rehash didn't use _M_growth_factor in these computations; it just literally enforced the post-conditions of the Standard. Are we sure we aren't, so to speak, rehashing too much? For example, when the load factor is very low and doesn't count, it looks like a current rehash(n) accomplishes the same as an old rehash(n * 2)?!? Something seems wrong; can you double check that? In any case, the comments before _M_next_bkt need fixing.

Thanks,
Paolo.
On 11/08/2012 02:56 AM, Paolo Carlini wrote:
> On the other hand, the old-old code for rehash didn't use
> _M_growth_factor in these computations, it just literally enforced the
> post-conditions of the Standard. Are we sure we aren't so to speak
> rehashing too much? For example, when the load factor is very low and
> doesn't count, it looks like a current rehash(n) accomplishes the same
> of an old rehash(n * 2)?!? Something seems wrong, can you double check
> that? In any case the comments before _M_next_bkt would need fixing.

... in other terms, I really think that the only uses of _S_growth_factor should return to being inside _M_need_rehash, because that's the function called by the inserts, when the container must automatically grow the number of buckets. Elsewhere - the constructors, rehash - there should be no need to know about _S_growth_factor.

Paolo.
Author: fdumont
Date: Thu Nov 8 20:06:00 2012
New Revision: 193335

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=193335
Log:
2012-11-08  François Dumont  <fdumont@gcc.gnu.org>

	PR libstdc++/54075
	* include/bits/hashtable.h (_Hashtable<>::rehash): Reset hash
	policy state if no rehash.
	* testsuite/23_containers/unordered_set/modifiers/reserve.cc
	(test02): New.

Modified:
    branches/gcc-4_7-branch/libstdc++-v3/ChangeLog
    branches/gcc-4_7-branch/libstdc++-v3/include/bits/hashtable.h
    branches/gcc-4_7-branch/libstdc++-v3/testsuite/23_containers/unordered_set/modifiers/reserve.cc
Created attachment 28638 [details] hashtable_rehash.patch

Author: fdumont
Date: Thu Nov 8 20:16:04 2012
New Revision: 193339

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=193339
Log:
2012-11-08  François Dumont  <fdumont@gcc.gnu.org>

	PR libstdc++/54075
	* include/bits/hashtable.h (_Hashtable<>::rehash): Reset hash
	policy state if no rehash.
	* testsuite/23_containers/unordered_set/modifiers/reserve.cc
	(test02): New.

Modified:
    trunk/libstdc++-v3/ChangeLog
    trunk/libstdc++-v3/include/bits/hashtable.h
    trunk/libstdc++-v3/testsuite/23_containers/unordered_set/modifiers/reserve.cc
Created attachment 28640 [details] performance.patch

Attached patch applied to trunk and 4.7 branch.

2012-11-08  François Dumont  <fdumont@gcc.gnu.org>

	PR libstdc++/54075
	* include/bits/hashtable.h (_Hashtable<>::rehash): Reset hash
	policy state if no rehash.
	* testsuite/23_containers/unordered_set/modifiers/reserve.cc
	(test02): New.

François

On 11/08/2012 01:58 AM, Jonathan Wakely wrote:
> On 7 November 2012 22:02, François Dumont wrote:
>> Ok to commit ? If so, where ?
> That patch is OK for trunk and 4.7, thanks.
On 11/08/2012 03:25 AM, Paolo Carlini wrote:
> ... in other terms, I really think that the only uses of
> _S_growth_factor should return to be inside _M_need_rehash, because
> that's the function called by the inserts, when the container must
> automatically grow the number of buckets. Elsewhere, like the
> constructors, rehash, should not need to know about _S_growth_factor.
>
> Paolo.

I haven't yet considered the following emails, but based on those good remarks I have made the attached patch. Surprisingly, it seems to have a good impact on performance, even if before it testsuite/performance/23_containers/insert/unordered_set.cc was showing that the new implementation was better. I have also started to adapt the tests so that it's possible to check unordered performance with the std or std::tr1 implementations. Can I generalize it to the other tests? If so, is it a good approach?

François
It looks like this history is missing an end.
This performance issue is a result of fixing https://gcc.gnu.org/bugzilla/show_bug.cgi?id=41975. That fix resulted in many more modulo operations and thus expensive divisions. I plan to commit an alternative hash policy using a power-of-2 number of buckets, so that the modulo is trivial. Benchmarks show that performance is better, but still not at the level of the tr1 implementation on the operations you are interested in. So it is difficult to close this ticket, because the performance regression is still there, but it might stay this way for a long time.
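The "trivial modulo" idea behind the planned power-of-2 policy can be sketched as follows; this illustrates the arithmetic only and is not the actual libstdc++ policy code (the function names are mine):

```cpp
#include <cstddef>

// General case: mapping a hash code to a bucket index takes an integer
// division/modulo - the expensive operation mentioned above.
std::size_t bucket_index_mod(std::size_t hash, std::size_t n_buckets)
{
    return hash % n_buckets;
}

// With a power-of-2 bucket count the same mapping reduces to a single
// bit mask. Precondition: n_buckets is a power of two.
std::size_t bucket_index_mask(std::size_t hash, std::size_t n_buckets)
{
    return hash & (n_buckets - 1);
}
```

The tradeoff is that a power-of-2 table relies entirely on the hash function to spread the low bits, whereas a prime bucket count mixes in all bits through the division.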