This is the mail archive of the
libstdc++@gcc.gnu.org
mailing list for the libstdc++ project.
[v3] libstdc++/41975
- From: François Dumont <francois dot cppdevs at free dot fr>
- To: "libstdc++ at gcc dot gnu dot org" <libstdc++ at gcc dot gnu dot org>
- Date: Mon, 25 Jul 2011 22:12:39 +0200
- Subject: [v3] libstdc++/41975
Hi
I would finally like to share my thoughts before submitting a
patch regarding the performance issue of DR 41975.
As mentioned before, this unordered container issue is quite similar
to one I dealt with in STLport. Indeed, the data model of those
containers is quite similar in the two implementations, that is to
say a singly linked list of elements plus an array for the buckets.
With this kind of data structure it is mandatory to keep a correct
balance between the number of buckets and the number of elements.
The Standard's max_load_factor already defines the minimum bucket
count for a given number of elements, but the data structure imposes
on its side an implicit maximum limit, above which performance issues
arise because of the holes appearing among the buckets.
This is why I propose 2 modifications:
1. Introduce a rehash when elements are removed. As noted in the DR,
this is a heavy operation, but since we accept it when elements are
inserted there is no reason to refuse it when elements are removed,
and we can do so while still respecting the Standard's complexity
requirements. However, doing so with the current implementation would
invalidate iterators, as noted in the DR thread, which is why we also
need the next point:
2. Store the elements in a real linked list. At the moment libstdc++
stores the elements in small per-bucket lists that are not linked to
each other. In STLport I kept all elements in a single linked list,
with the buckets pointing to different positions in this list. With
that design, iterating over the container was really fast, as fast as
iterating over a forward_list, and returning the next valid element
from the erase method is also straightforward. There is of course a
small drawback: insert/erase operations will potentially have to
update several bucket entries:
Given the following list of elements (E for Element, B for Bucket):
E1 in B2 -> E2 in B3 -> E3 in B4 -> E4 in B5 -> null
We will have buckets containing:
B1 | B2 | B3 | B4 | B5 | B6   | B7   | B8   | B9   | B10
E1 | E1 | E2 | E3 | E4 | null | null | null | null | null
If we need to insert E5 into bucket B9, we will have to update the
entries from B6 through B9 to E5, giving:
B1 | B2 | B3 | B4 | B5 | B6 | B7 | B8 | B9 | B10
E1 | E1 | E2 | E3 | E4 | E5 | E5 | E5 | E5 | null
It might seem costly, but once again, with a limited number of buckets
for a given number of elements, we will not need to update too many
bucket entries.
There is, however, another potential issue with shrinking the
unordered container when elements are erased. Iterators can remain
valid, but the following code will not have the expected behavior:
#include <unordered_set>

template <typename _Pred>
void erase_if(std::unordered_set<int>& c, _Pred pred)
{
  std::unordered_set<int>::iterator it = c.begin();
  while (it != c.end())
    {
      if (pred(*it))
        it = c.erase(it);
      else
        ++it;
    }
}
As soon as an element is erased and the container is rehashed, the
returned (valid) iterator might become the iterator just before the
container's end, skipping a lot of elements that will never be tested
with pred. Is this the purpose of the iterator returned by erase? If
so, it means that the container cannot be rehashed when shrinking,
and I will have no solution for 41975.
Regards