This is the mail archive of the libstdc++@gcc.gnu.org mailing list for the libstdc++ project.



Re: [Bug libstdc++/54075] [4.7.1] unordered_map insert still slower than 4.6.2


On 18 November 2012 20:26, François Dumont wrote:
> On 11/16/2012 02:51 AM, Jonathan Wakely wrote:
>>
>> On 15 November 2012 21:00, François Dumont wrote:
>>>
>>> On 11/14/2012 11:48 PM, Paolo Carlini wrote:
>>>>
>>>> Can somebody remind me why *exactly* we have a condition having to do
>>>> with
>>>> the empty-ness of the functor?
>>>
>>>
>>>      Because when the hash code is not cached we need to embed the hash
>>> functor into the local iterator implementation. So to minimize the local
>>> iterator's memory footprint we preferred to cache the hash code as soon
>>> as the functor is not empty.
>>
>> What if sizeof(functor) == 4? It will still be smaller than the size_t
>> hash code, right? (At least for common 64-bit platforms.) And if
>> sizeof(functor) == sizeof(size_t) it doesn't increase the footprint to
>> store the functor.

(I see now my question was based on the mistaken understanding that
the choice was simply between storing a hash code or a functor, but
they're not stored in the same place so it's not a straightforward
comparison.  Thanks for clearing up my misunderstanding, I think I'm
starting to grok the whole design at last!)

> I meant I wanted to minimize sizeof(local_iterator). Iterators are most of
> the time passed by value, so it is better to make them small, no?

If making them small had no other consequence, yes.  But if minimising
sizeof(local_iterator) means increasing sizeof(_Hash_node), it might be
a poor tradeoff.  Every user of unordered containers uses _Hash_node,
but not everyone uses local iterators.  A single container might have
thousands of hash nodes; it's unlikely a user will have thousands of
local iterators in memory at once (possible in some cases, but probably
uncommon.)

So minimizing local iterator size at the expense of hash nodes might
be the wrong choice.
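
To make the tradeoff concrete, here is a sketch with made-up names (not
the real _Hash_node layout): caching the hash code adds a size_t to every
node in the container.

```cpp
#include <cstddef>

// Simplified sketches, not the real libstdc++ _Hash_node: they only
// illustrate how caching the hash code grows every node in the container.
template <typename T>
struct node_no_cache {
    node_no_cache* next;     // singly-linked list of nodes
    T value;
};

template <typename T>
struct node_cached {
    node_cached* next;
    std::size_t hash_code;   // cached hash code, paid for by every node
    T value;
};

// On a typical 64-bit ABI the cached node is 8 bytes larger per element,
// which shows up as extra cache misses (and page faults) on big tables.
static_assert(sizeof(node_cached<int>) > sizeof(node_no_cache<int>),
              "caching the hash code makes each node bigger");
```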

>>    And if sizeof(functor) > sizeof(size_t) then could
>> we store a single copy of the functor somewhere and store a pointer to
>> it in each node? That would have the same size as the hash code, so we
>> never increase the size of a node by more than sizeof(functor*), and
>> the memory footprint is bounded. Is that possible?
>>
>> Do we have any performance numbers showing the impact of different
>> sizes of functor stored in the nodes?
>
> No, because the functor is not stored in hashtable nodes but in the
> local_iterator when hash codes are not cached.

Yes, sorry.  So do we have any performance numbers showing the impact
of different sizes of functor stored in the local iterators?

If it is inefficient to store large functors in the local iterators,
is it worth considering storing the functor in the hashtable and
storing a pointer to it in the iterators?  That would mean the local
iterator size is bounded.  But this might be a distraction from the
main issue: the size of the _Hash_node might matter far more.
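
A sketch of that idea, with illustrative names (BigHash stands in for a
hypothetical large stateful functor; neither type is the real
_Local_iterator_base): the pointer form keeps the iterator size bounded
no matter how big the functor gets.

```cpp
#include <cstddef>

// Illustration only: compare a local iterator that copies the hash
// functor into itself against one that points back at the container's
// single copy of the functor.
struct BigHash {
    std::size_t seed[8];                       // 64 bytes of state
    std::size_t operator()(int x) const { return std::size_t(x) ^ seed[0]; }
};

struct Node { Node* next; int value; };

template <typename Hash>
struct local_it_by_value {
    Node* cur;
    Hash h;                                    // grows with sizeof(Hash)
    std::size_t bucket;
};

template <typename Hash>
struct local_it_by_pointer {
    Node* cur;
    const Hash* h;                             // bounded: one pointer
    std::size_t bucket;
};

static_assert(sizeof(local_it_by_pointer<BigHash>)
                  < sizeof(local_it_by_value<BigHash>),
              "the pointer form bounds the local iterator size");
```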

>>
>> But anyway, surely non-empty functors are not the common case, so are
>> unlikely to be a performance problem for most users.
>
> Agree
>
>>
>> Do we have any idea why caching the hash code seems to be slower than
>> recalculating it?
>
> Martin Cracauer in comment #31 is saying: "As you can see there also is a
> reasonably catastrophic increase in minor page
> faults". I think that is the source of the performance issue.

Yes, I confirmed that yesterday, by changing __uset_traits to never
cache, then adding an unused size_t member to the _Hash_node type in
the non-caching case. Simply increasing the size of the node object
reproduces the exact same slowdown.

> So while adding noexcept might improve the generated code, it also makes
> the behavior of unordered containers worse, and users might complain about
> not being able to have the best performance in both cases.

Agreed.  So we don't want 'noexcept' to be the only criterion for
caching or not.  But the current criteria do seem wrong.

Let's look at those criteria:

We cache if the hash function can throw; that's necessary, or we can't
make erase() noexcept.

We cache if the key is non-integral. This is the wrong choice for
Martin Cracauer's example: the key is not an integer, but the hash
function works on integers and is fast. There are likely lots of
user-defined types used as keys where the hash function works on an
integer member of a class type.  Don't we actually care about whether
the hash function is fast?  i.e. std::hash<int> is fast,
std::hash<std::string> is not.  Is that right?
We could easily have a user-specializable trait such that
hashing_is_fast<std::hash<int>> is true and
hashing_is_fast<std::hash<std::string>> is false, and users can
specialize it for their own hash functions.
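
A minimal sketch of what that trait could look like (hashing_is_fast is
the name suggested above; the specializations and MyKey/MyKeyHash types
are illustrative, not an actual libstdc++ interface):

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <type_traits>

// Hypothetical trait: true when hashing is cheap enough that recomputing
// the hash beats caching the hash code in each node.  Conservative
// default: assume hashing is not fast.
template <typename Hash>
struct hashing_is_fast : std::false_type {};

// Hashing an integer is trivially fast.
template <>
struct hashing_is_fast<std::hash<int>> : std::true_type {};

// std::hash<std::string> walks the whole string, so it is not "fast"
// (and false is already the default; shown here for emphasis).
template <>
struct hashing_is_fast<std::hash<std::string>> : std::false_type {};

// A user-defined key whose hash function works on an integer member
// can opt in by specializing the trait:
struct MyKey { int id; std::string name; };
struct MyKeyHash {
    std::size_t operator()(const MyKey& k) const noexcept {
        return std::hash<int>()(k.id);
    }
};
template <>
struct hashing_is_fast<MyKeyHash> : std::true_type {};

static_assert(hashing_is_fast<std::hash<int>>::value, "");
static_assert(!hashing_is_fast<std::hash<std::string>>::value, "");
static_assert(hashing_is_fast<MyKeyHash>::value, "");
```

The container could then consult hashing_is_fast (alongside the noexcept
criterion) when deciding whether to cache.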

Currently we must cache if the functor is final, because
_Local_iterator_base cannot derive from a final class.  That
requirement is unnecessary: we could use composition instead of
inheritance if the class is final (like the ebo helper classes do.)
Whether the hash code is cached should be independent of whether the
hash function is final.
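
A sketch of the composition fallback, in the style of the ebo helper
classes (names are made up, and it uses C++14's std::is_final for the
dispatch; the real helpers predate that trait):

```cpp
#include <cstddef>
#include <type_traits>

// Derive from the functor to get the empty-base optimization when we
// can; fall back to a plain data member when the functor is final or
// non-empty.  Either way the functor is usable, so whether the hash
// code is cached no longer needs to depend on the functor being final.
template <typename Hash,
          bool = std::is_empty<Hash>::value && !std::is_final<Hash>::value>
struct hash_holder : private Hash {            // EBO: adds no size
    hash_holder(const Hash& h = Hash()) : Hash(h) {}
    const Hash& hash() const { return *this; }
};

template <typename Hash>
struct hash_holder<Hash, false> {              // composition: works for final types
    Hash h_;
    hash_holder(const Hash& h = Hash()) : h_(h) {}
    const Hash& hash() const { return h_; }
};

struct EmptyHash {
    std::size_t operator()(int x) const { return std::size_t(x); }
};

struct FinalHash final {
    std::size_t operator()(int x) const { return std::size_t(x); }
};

static_assert(std::is_empty<hash_holder<EmptyHash>>::value,
              "empty non-final functor: EBO, holder stays empty");
static_assert(!std::is_empty<hash_holder<FinalHash>>::value,
              "final functor: stored as a member instead");
```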

We cache if the functor is not empty.  This reduces the local iterator
size but increases the _Hash_node size, which is the wrong choice for
Martin Cracauer's example.

