This is the mail archive of the libstdc++@gcc.gnu.org mailing list for the libstdc++ project.



[v3] libstdc++/41975


Hi

I am finally reporting my thoughts before submitting a patch for the performance issue of DR 41975.

As signaled before, this unordered container issue is quite similar to the one I had with STLport. Indeed, the data model of those containers is quite similar in the two implementations, that is to say a singly linked list of elements plus an array for the buckets. With this kind of data structure it is mandatory to keep a correct balance between the number of buckets and the number of elements. The Standard max_load_factor already defines the minimum bucket count for a given number of elements, but the data structure also imposes an implicit maximum limit, above which performance problems arise because of the holes appearing in the buckets.

This is why I propose 2 modifications:
1. Introduce a rehash when elements are removed. As noted in the DR, this is a heavy operation, but we already accept it when elements are inserted, so there is no reason to refuse it when elements are removed; we can do so while still respecting the Standard operation complexity. However, doing so with the current implementation would invalidate iterators, as noted in the DR thread, which is why we also need the next point:
2. Store the elements in a real linked list. At the moment libstdc++ stores the elements in small per-bucket lists that are not linked to each other. In STLport I had all elements in a single linked list, with the buckets pointing to different positions in this list. Iterating over the container was then really fast, as fast as iterating over a forward_list, and returning the next valid element from the erase method also becomes straightforward. There is of course a small drawback: insert/erase operations potentially require updating several bucket entries:


Given the following list of elements (E for Element, B for Bucket):
E1 in B2 -> E2 in B3 -> E3 in B4 -> E4 in B5 -> null

We will have buckets containing:

B1  |  B2  |  B3  |  B4  |  B5  |  B6  |  B7  |  B8  |  B9  |  B10
E1  |  E1  |  E2  |  E3  |  E4  | null | null | null | null | null

If we need to insert E5 in bucket B9, we will have to update the B6 to B9 entries to E5, giving:

B1  |  B2  |  B3  |  B4  |  B5  |  B6  |  B7  |  B8  |  B9  |  B10
E1  |  E1  |  E2  |  E3  |  E4  |  E5  |  E5  |  E5  |  E5  | null

It might seem awful but, once again, with a limited number of buckets for a given number of elements we won't need to update too many bucket entries.

There is however another potential issue with shrinking the unordered container when elements are erased. Iterators can remain valid, but the following code won't have the expected behavior:

template <typename _Pred>
void erase_if(std::unordered_set<int>& c, _Pred pred)
{
    std::unordered_set<int>::iterator it = c.begin();
    while (it != c.end())
    {
        if (pred(*it))
            it = c.erase(it);  // erase returns the iterator following the erased element
        else
            ++it;
    }
}

As soon as an element is erased and the container is rehashed, the returned (still valid) iterator might end up just before the container end, skipping a lot of elements that will never be tested with pred. Is this the purpose of the iterator returned by erase? If so, it means the container can't be rehashed when shrinking, and I won't have any solution for 41975.

Regards

