[PATCH] libstdc++/91223 Improve unordered containers == operator
Jonathan Wakely
jwakely@redhat.com
Thu Jan 16 17:24:00 GMT 2020
On 16/01/20 13:25 +0000, Jonathan Wakely wrote:
>On 16/01/20 07:42 +0100, François Dumont wrote:
>>On 1/15/20 10:52 PM, Jonathan Wakely wrote:
>>>On 15/01/20 21:48 +0000, Jonathan Wakely wrote:
>>>>On 14/01/20 22:25 +0100, François Dumont wrote:
>>>>>On 1/13/20 10:53 PM, Jonathan Wakely wrote:
>>>>>>On 13/01/20 22:41 +0100, François Dumont wrote:
>>>>>>>
>>>>>>>For the multi-keys we could still avoid redundant
>>>>>>>comparisons when _Equal is just doing == on the key type.
>>>>>>>On unordered_multiset we could just avoids the call to
>>>>>>>is_permuation and on the unordered_multimap we could check
>>>>>>>the is_permutation only on the associated value rather
>>>>>>>than on the std::pair.
>>>>>>>
>>>>>>I don't think that's necessary, or helpful.
>>>>>>
>>>>>>The idea of https://gcc.gnu.org/ml/libstdc++/2020-01/msg00070.html is
>>>>>>that you shouldn't be using _Equal at all, and therefore it doesn't
>>>>>>matter whether it's std::equal_to or not.
>>>>>>
>>>>>>
>>>>>And it was indeed possible.
>>>>
>>>>Nice!
>>>>
>>>>> PR libstdc++/91223
>>>>> * include/bits/hashtable.h (_Hashtable<>): Make _Equality<> friend.
>>>>> * include/bits/hashtable_policy.h: Include <bits/stl_algo.h>.
>>>>> (_Equality_base): Remove.
>>>>> (_Equality<>::_M_equal): Review implementation. Use
>>>>>std::is_permutation.
>>>>> * testsuite/23_containers/unordered_multiset/operators/1.cc
>>>>> (Hash, Equal, test02, test03): New.
>>>>> * testsuite/23_containers/unordered_set/operators/1.cc
>>>>> (Hash, Equal, test02, test03): New.
>>>>>
>>>>>Tested under Linux x86_64.
>>>>>
>>>>>Ok to commit ?
>>>>
>>>>Yes, OK for trunk (we're in stage4 but your patch was posted in stage3
>>>>and fixes a pretty nasty performance bug, so is OK now).
>>>>
>>>>N.B. GCC has moved to Git instead of Subversion. If you don't have Git
>>>>access set up let me know and I can commit the patch for you.
>>
>>I haven't done the move yet and won't be able to do it before the
>>week-end. So please proceed to the commit for me, thanks.
>
>No problem, I can do that.
Your patch is now committed to trunk. Thanks for the major
improvement.
I had a look at std::is_permutation and I think we can make some
simplifications to the 4-argument overload, and we can share most of
the code between the 3-arg and 4-arg overloads (once they've confirmed
the lengths are the same they do exactly the same thing). See the
attached patch. This should probably wait for stage1 though.
I also wanted to add some comments to the _Equality::_M_equal
specialiation for unordered_multimap/multiset to explain what the code
was doing, and had some more ideas. See patch again.
It looks like this loop can potentially visit every element of
__other, instead of stopping at the end of the bucket:
typename __hashtable::const_iterator __ity(__y_n);
for (auto __ity_end = __ity; __ity_end != __other.end(); ++__ity_end)
if (--__x_count == 0)
break;
Consider a case like this:
unordered_multiset<int> a{1, 2, 3, 4};
for (int i = 0; i <10000; ++i)
a.insert(1);
unordered_multiset<int> b{1, 2, 3, 4};
for (int i = 0; i <10000; ++i)
b.insert(2);
When doing a == b we'll find 10000 elements in a with key '1',
and find one element in b with that key. But then we iterate through
every element in b after that one, even though they have different
keys and are probably in different buckets.
Instead of just iterating from __ity to __other.end(), can we use a
local iterator so we stop at the end of the bucket?
This seems to make the PR91263 example *very* slightly slower, but
makes the example above significantly faster.
What do you think?
-------------- next part --------------
A non-text attachment was scrubbed...
Name: patch.txt
Type: text/x-patch
Size: 5885 bytes
Desc: not available
URL: <http://gcc.gnu.org/pipermail/libstdc++/attachments/20200116/1ca45126/attachment.bin>
More information about the Libstdc++
mailing list