[PATCH] libstdc++/91223 Improve unordered containers == operator

Jonathan Wakely jwakely@redhat.com
Thu Jan 16 17:24:00 GMT 2020


On 16/01/20 13:25 +0000, Jonathan Wakely wrote:
>On 16/01/20 07:42 +0100, François Dumont wrote:
>>On 1/15/20 10:52 PM, Jonathan Wakely wrote:
>>>On 15/01/20 21:48 +0000, Jonathan Wakely wrote:
>>>>On 14/01/20 22:25 +0100, François Dumont wrote:
>>>>>On 1/13/20 10:53 PM, Jonathan Wakely wrote:
>>>>>>On 13/01/20 22:41 +0100, François Dumont wrote:
>>>>>>>
>>>>>>>For the multi-keys we could still avoid redundant 
>>>>>>>comparisons when _Equal is just doing == on the key type. 
>>>>>>>On unordered_multiset we could just avoids the call to 
>>>>>>>is_permuation and on the unordered_multimap we could check 
>>>>>>>the is_permutation only on the associated value rather 
>>>>>>>than on the std::pair.
>>>>>>>
>>>>>>I don't think that's necessary, or helpful.
>>>>>>
>>>>>>The idea of https://gcc.gnu.org/ml/libstdc++/2020-01/msg00070.html is
>>>>>>that you shouldn't be using _Equal at all, and therefore it doesn't
>>>>>>matter whether it's std::equal_to or not.
>>>>>>
>>>>>>
>>>>>And it was indeed possible.
>>>>
>>>>Nice!
>>>>
>>>>>    PR libstdc++/91223
>>>>>    * include/bits/hashtable.h (_Hashtable<>): Make _Equality<> friend.
>>>>>    * include/bits/hashtable_policy.h: Include <bits/stl_algo.h>.
>>>>>    (_Equality_base): Remove.
>>>>>    (_Equality<>::_M_equal): Review implementation. Use 
>>>>>std::is_permutation.
>>>>>    * testsuite/23_containers/unordered_multiset/operators/1.cc
>>>>>    (Hash, Equal, test02, test03): New.
>>>>>    * testsuite/23_containers/unordered_set/operators/1.cc
>>>>>    (Hash, Equal, test02, test03): New.
>>>>>
>>>>>Tested under Linux x86_64.
>>>>>
>>>>>Ok to commit ?
>>>>
>>>>Yes, OK for trunk (we're in stage4 but your patch was posted in stage3
>>>>and fixes a pretty nasty performance bug, so is OK now).
>>>>
>>>>N.B. GCC has moved to Git instead of Subversion. If you don't have Git
>>>>access set up let me know and I can commit the patch for you.
>>
>>I haven't done the move yet and won't be able to do it before the 
>>week-end. So please proceed to the commit for me, thanks.
>
>No problem, I can do that.

Your patch is now committed to trunk. Thanks for the major
improvement.

I had a look at std::is_permutation and I think we can make some
simplifications to the 4-argument overload, and we can share most of
the code between the 3-arg and 4-arg overloads (once they've confirmed
the lengths are the same they do exactly the same thing). See the
attached patch. This should probably wait for stage1 though.

I also wanted to add some comments to the _Equality::_M_equal
specialiation for unordered_multimap/multiset to explain what the code
was doing, and had some more ideas. See patch again.

It looks like this loop can potentially visit every element of
__other, instead of stopping at the end of the bucket:

   typename __hashtable::const_iterator __ity(__y_n);
   for (auto __ity_end = __ity; __ity_end != __other.end(); ++__ity_end)
     if (--__x_count == 0)
       break;

Consider a case like this:

unordered_multiset<int> a{1, 2, 3, 4};
for (int i = 0; i <10000; ++i)
   a.insert(1);
unordered_multiset<int> b{1, 2, 3, 4};
for (int i = 0; i <10000; ++i)
   b.insert(2);

When doing a == b we'll find 10000 elements in a with key '1',
and find one element in b with that key. But then we iterate through
every element in b after that one, even though they have different
keys and are probably in different buckets.

Instead of just iterating from __ity to __other.end(), can we use a
local iterator so we stop at the end of the bucket?

This seems to make the PR91263 example *very* slightly slower, but
makes the example above significantly faster.

What do you think?


-------------- next part --------------
A non-text attachment was scrubbed...
Name: patch.txt
Type: text/x-patch
Size: 5885 bytes
Desc: not available
URL: <http://gcc.gnu.org/pipermail/libstdc++/attachments/20200116/1ca45126/attachment.bin>


More information about the Libstdc++ mailing list