[Bug libstdc++/54075] [4.7.1] unordered_map insert still slower than 4.6.2

François Dumont frs.dumont@gmail.com
Fri Nov 23 22:00:00 GMT 2012


Hi

Here is the patch I have been working on lately to add empty nodes at 
the end of buckets.

Advantages:
- it greatly simplifies the local_iterator implementation: there is no 
longer any need to store the hash functor, the bucket index, or the 
bucket count in it
- buckets always point to an empty node; thanks to this, when removing 
a node from a bucket there is no need to check whether it is the last one. 
We only need to update the next bucket when the current bucket is emptied.
- many uses of _M_bucket_index have been removed, meaning fewer occasions 
to rely on the hash functor.

Drawbacks:
- _M_rehash_aux might need to allocate new empty nodes. As a result, this 
method no longer offers the strong exception guarantee it used to.
- to create a node I need to request the allocator twice, once for the 
node and another time for the stored value. This is quite unusual, so I 
don't know whether it is a real problem.

Now for the performance numbers:

54075.cc  std::tr1::unordered_set without hash code cached 300000 Foo insertions                    9r    8u    2s   13765616mem    0pf
54075.cc  std::tr1::unordered_set without hash code cached 10 times insertion of 300000 Foo        19r   19u    0s          0mem    0pf
54075.cc  std::tr1::unordered_set with hash code cached 300000 Foo insertions                      14r   14u    0s   18561984mem    0pf
54075.cc  std::tr1::unordered_set with hash code cached 10 times insertion of 300000 Foo           21r   21u    0s          0mem    0pf
54075.cc  std::tr1::unordered_multiset without hash code cached 300000 Foo insertions               8r    8u    0s   13761968mem    0pf
54075.cc  std::tr1::unordered_multiset without hash code cached 10 times insertion of 300000 Foo  109r  101u    8s  126686848mem    0pf
54075.cc  std::tr1::unordered_multiset with hash code cached 300000 Foo insertions                 95r   94u    0s   18561984mem    0pf
54075.cc  std::tr1::unordered_multiset with hash code cached 10 times insertion of 300000 Foo     108r  103u    4s  174686864mem    0pf
54075.cc  std::unordered_set without hash code cached 300000 Foo insertions                        16r   14u    2s   30644144mem    0pf
54075.cc  std::unordered_set without hash code cached 10 times insertion of 300000 Foo             37r   37u    0s          0mem    0pf
54075.cc  std::unordered_set with hash code cached 300000 Foo insertions                           14r   14u    0s   30653968mem    0pf
54075.cc  std::unordered_set with hash code cached 10 times insertion of 300000 Foo                34r   34u    0s          0mem    0pf
54075.cc  std::unordered_multiset without hash code cached 300000 Foo insertions                   13r   14u    0s   30656464mem    0pf
54075.cc  std::unordered_multiset without hash code cached 10 times insertion of 300000 Foo        37r   36u    0s          0mem    0pf
54075.cc  std::unordered_multiset with hash code cached 300000 Foo insertions                      14r   13u    0s   30657488mem    0pf
54075.cc  std::unordered_multiset with hash code cached 10 times insertion of 300000 Foo           34r   34u    0s          0mem    0pf

     As you can see, the performance is awful :-) The increase in memory 
consumption is really too large. What I find especially surprising is:

54075.cc  std::unordered_set without hash code cached 10 times insertion of 300000 Foo  37r  37u  0s  0mem  0pf

It is no better than without the empty nodes.

   But I think I won't spend any more time on this approach; instead I 
will prepare a small patch to change the default behavior regarding the 
hash code. I agree with all your remarks, Jonathan, so I will use them 
for the patch.

François


On 11/19/2012 07:27 PM, Jonathan Wakely wrote:
> On 18 November 2012 20:26, François Dumont wrote:
>> On 11/16/2012 02:51 AM, Jonathan Wakely wrote:
>>> On 15 November 2012 21:00, François Dumont wrote:
>>>> On 11/14/2012 11:48 PM, Paolo Carlini wrote:
>>>>> Can somebody remind me why *exactly* we have a condition having to do
>>>>> with
>>>>> the empty-ness of the functor?
>>>>
>>>>       Because when the hash code is not cached we need to embed the
>>>> hash functor in the local iterator implementation. So to minimize the
>>>> local iterator's memory footprint we preferred to cache the hash code
>>>> as soon as the functor is not empty.
>>> What if sizeof(functor) == 4? It will still be smaller than the size_t
>>> hash code, right? (At least for common 64-bit platforms.) And if
>>> sizeof(functor) == sizeof(size_t) it doesn't increase the footprint to
>>> store the functor.
> (I see now my question was based on the mistaken understanding that
> the choice was simply between storing a hash code or a functor, but
> they're not stored in the same place so it's not a straightforward
> comparison.  Thanks for clearing up my misunderstanding, I think I'm
> starting to grok the whole design at last!)
>
>> I meant I wanted to minimize sizeof(local_iterator). Iterators are most of
>> the time passed by value, so it is better to make them small, no?
> If making them small had no other consequence, yes.  But if minimising
> sizeof(local_iterator) means increasing sizeof(_Hash_node) it might be
> a poor tradeoff.  Every user of unordered containers uses _Hash_node,
> not everyone uses local iterators.  A single container might have
> thousands of hash nodes, it's not likely a user will have thousands of
> local iterators in memory at once (possible in some cases, but maybe
> uncommon.)
>
> So minimizing local iterator size at the expense of hash nodes might
> be the wrong choice.
>
>>>     And if sizeof(functor) > sizeof(size_t) then could
>>> we store a single copy of the functor somewhere and store a pointer to
>>> it in each node? That would have the same size as the hash code, so we
>>> never increase the size of a node by more than sizeof(functor*), and
>>> the memory footprint is bounded. Is that possible?
>>>
>>> Do we have any performance numbers showing the impact of different
>>> sizes of functor stored in the nodes?
>> No, because the functor is not stored in hashtable nodes but in the
>> local_iterator when hash codes are not cached.
> Yes, sorry.  So do we have any performance numbers showing the impact
> of different sizes of functor stored in the local iterators?
>
> If it is inefficient to store large functors in the local iterators,
> is it worth considering storing the functor in the hashtable and
> storing a pointer to it in the iterators?  That would mean the local
> iterator size is bounded.  But this might be a distraction from the
> main issue, the size of the _Hash_node might be far more important.
>
>>> But anyway, surely non-empty functors are not the common case, so are
>>> unlikely to be a performance problem for most users.
>> Agree
>>
>>> Do we have any idea why caching the hash code seems to be slower than
>>> recalculating it?
>> Martin Cracauer in comment #31 says: "As you can see there also is a
>> reasonably catastrophic increase in minor page faults". I think that is
>> the source of the performance issue.
> Yes, I confirmed that yesterday, by changing __uset_traits to never
> cache, then adding an unused size_t member to the _Hash_node type in
> the non-caching case. Simply increasing the size of the node object
> reproduces the exact same slowdown.
>
>> So adding noexcept might improve the generated code, but if it also makes
>> the behavior of unordered containers worse, users might complain about not
>> being able to get the best performance in both cases.
> Agreed.  So we don't want 'noexcept' to be the only criterion for
> caching or not.  But the current criteria do seem wrong.
>
> Let's look at those criteria:
>
> We cache if the hash function can throw, that's necessary or we can't
> make erase() noexcept.
>
> We cache if the key is non-integral. This is the wrong choice for
> Martin Cracauer's example: the key is not an integer, but the hash
> function works on integers and is fast. It's likely there are lots of
> user-defined types used as keys where the hash function works on an
> integer member of a class type.  Don't we actually care about whether
> the hash function is fast?  i.e. std::hash<int> is fast,
> std::hash<std::string> is not.
> Is that right?
> We could easily have a user-specializable trait such that
> hashing_is_fast<std::hash<int>> is true and
> hashing_is_fast<std::hash<std::string>> is false, and users can
> specialize it for their own hash functions.
>
> Currently we must cache if the functor is final, because
> _Local_iterator_base cannot derive from a final class.  That
> requirement is unnecessary: we could use composition instead of
> inheritance if the class is final (like the ebo helper classes do.)
> Whether the hash code is cached should be independent of whether the
> hash function is final.
>
> We cache if the functor is not empty.  This reduces the
> _Local_iterator size but increases the _Hash_node size, which is the
> wrong choice for Martin Cracauer's example.
>

-------------- next part --------------
A non-text attachment was scrubbed...
Name: hashtable.patch
Type: text/x-patch
Size: 36322 bytes
Desc: not available
URL: <http://gcc.gnu.org/pipermail/libstdc++/attachments/20121123/c2160a8d/attachment.bin>

