Bug 67922 - [DR 2550] std::unordered_map::clear should take time linear in the number of elements
Summary: [DR 2550] std::unordered_map::clear should take time linear in the number of elements
Status: NEW
Alias: None
Product: gcc
Classification: Unclassified
Component: libstdc++
Version: 5.2.1
Importance: P3 normal
Target Milestone: ---
Assignee: Not yet assigned to anyone
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2015-10-10 22:33 UTC by Yegor Derevenets
Modified: 2016-03-01 15:46 UTC (History)
2 users

See Also:
Host:
Target:
Build:
Known to work:
Known to fail:
Last reconfirmed: 2015-10-12 00:00:00


Description Yegor Derevenets 2015-10-10 22:33:14 UTC
std::unordered_map::clear internally clears the whole array of buckets using memset. Consequently, the clearing time is proportional to the number of buckets rather than to the number of elements, which is what one would expect and what arguably should be the case according to the C++ Standard. (The Standard specifies that clear's complexity is linear, but unfortunately does not say linear in what.) This leads to terrible performance of the following code:

#include <unordered_map>

int main() {
	std::unordered_map<int, int> m;
	for (int i = 0; i < 1000000; ++i) {
		m[i] = i;
	}
	for (int i = 0; i < 1000; ++i) {
		m.clear();
	}
}

$ g++-5 -O2 test.cpp -std=c++11 && time ./a.out

real    0m4.009s
user    0m3.924s
sys     0m0.028s

$ clang++-3.7 -stdlib=libstdc++ -O2 test.cpp -std=c++11 && time ./a.out

real    0m4.153s
user    0m3.976s
sys     0m0.036s

If you build the same program with libc++, the problem goes away:

$ clang++-3.7 -stdlib=libc++ -O2 test.cpp -std=c++11 && time ./a.out

real    0m0.165s
user    0m0.128s
sys     0m0.036s

You can achieve similar performance with libstdc++ if you implement clear() manually in the most naive way:

#include <unordered_map>

int main() {
        std::unordered_map<int, int> m;
        for (int i = 0; i < 1000000; ++i) {
                m[i] = i;
        }
        for (int i = 0; i < 1000; ++i) {
                while (m.begin() != m.end()) {
                        m.erase(m.begin());
                }
        }
}

$ g++-5 -O2 test.cpp -std=c++11 && time ./a.out

real    0m0.167s
user    0m0.132s
sys     0m0.028s

$ g++ -v
Using built-in specs.
COLLECT_GCC=g++
COLLECT_LTO_WRAPPER=/usr/lib/gcc/x86_64-linux-gnu/5/lto-wrapper
Target: x86_64-linux-gnu
Configured with: ../src/configure -v --with-pkgversion='Debian 5.2.1-21' --with-bugurl=file:///usr/share/doc/gcc-5/README.Bugs --enable-languages=c,ada,c++,java,go,d,fortran,objc,obj-c++ --prefix=/usr --program-suffix=-5 --enable-shared --enable-linker-build-id --libexecdir=/usr/lib --without-included-gettext --enable-threads=posix --libdir=/usr/lib --enable-nls --with-sysroot=/ --enable-clocale=gnu --enable-libstdcxx-debug --enable-libstdcxx-time=yes --with-default-libstdcxx-abi=new --enable-gnu-unique-object --disable-vtable-verify --enable-libmpx --enable-plugin --with-system-zlib --disable-browser-plugin --enable-java-awt=gtk --enable-gtk-cairo --with-java-home=/usr/lib/jvm/java-1.5.0-gcj-5-amd64/jre --enable-java-home --with-jvm-root-dir=/usr/lib/jvm/java-1.5.0-gcj-5-amd64 --with-jvm-jar-dir=/usr/lib/jvm-exports/java-1.5.0-gcj-5-amd64 --with-arch-directory=amd64 --with-ecj-jar=/usr/share/java/eclipse-ecj.jar --enable-objc-gc --enable-multiarch --with-arch-32=i586 --with-abi=m64 --with-multilib-list=m32,m64,mx32 --enable-multilib --with-tune=generic --enable-checking=release --build=x86_64-linux-gnu --host=x86_64-linux-gnu --target=x86_64-linux-gnu
Thread model: posix
gcc version 5.2.1 20151003 (Debian 5.2.1-21)
Comment 1 Richard Biener 2015-10-12 08:05:26 UTC
But then the issue is that clear() doesn't shrink the map.  Otherwise
O(number of slots) == O(number of elements).
Comment 2 Yegor Derevenets 2015-10-12 09:06:49 UTC
> But then the issue is that clear () doesn't shrink the map.
No, the issue is that clear() touches all the buckets, instead of only those that contain elements. libc++'s implementation does not shrink the map (does not decrease the number of buckets) and is still fast. In fact, if in libstdc++ you call m.erase(m.begin(), m.end()) instead of m.clear(), it is fast too.

#include <iostream>
#include <unordered_map>

int main() {
	std::unordered_map<int, int> m;
	for (int i = 0; i < 1000000; ++i) {
		m[i] = i;
	}
	std::cout << "Before clear(): " << m.bucket_count() << std::endl;
	m.clear();
	std::cout << "After clear(): " << m.bucket_count() << std::endl;
}

$ clang++-3.7 -stdlib=libstdc++ -O2 -std=c++11 test.cpp && time ./a.out
Before clear(): 1056323
After clear(): 1056323

real    0m0.074s
user    0m0.076s
sys     0m0.000s

$ clang++-3.7 -stdlib=libc++ -O2 -std=c++11 test.cpp && time ./a.out
Before clear(): 1646237
After clear(): 1646237

real    0m0.102s
user    0m0.080s
sys     0m0.016s
Comment 3 Yegor Derevenets 2015-11-02 16:59:50 UTC
A small correction: a colleague of mine read the source code of libc++ and noticed that its implementation of clear() also generally takes time linear in the number of buckets. This was not visible in the tests presented above, because libc++ special-cases the empty container:

template <class _Tp, class _Hash, class _Equal, class _Alloc>
void
__hash_table<_Tp, _Hash, _Equal, _Alloc>::clear() _NOEXCEPT
{
    if (size() > 0)
    {
        __deallocate(__p1_.first().__next_);
        __p1_.first().__next_ = nullptr;
        size_type __bc = bucket_count();
        for (size_type __i = 0; __i < __bc; ++__i)
            __bucket_list_[__i] = nullptr;
        size() = 0;
    }
}

If we modify the test example slightly, we get:

$ cat test_clear.cpp
#include <unordered_map>

int main() {
        std::unordered_map<int, int> m;
        for (int i = 0; i < 1000000; ++i) {
                m[i] = i;
        }
        for (int i = 0; i < 1000; ++i) {
                m[i] = i;
                m.clear();
        }
}

$ clang++-3.7 -O2 -std=c++11 -stdlib=libstdc++ test_clear.cpp && time ./a.out

real    0m4.054s
user    0m4.000s
sys     0m0.036s

$ clang++-3.7 -O2 -std=c++11 -stdlib=libc++ test_clear.cpp && time ./a.out

real    0m6.114s
user    0m6.000s
sys     0m0.036s

$ cat test_erase.cpp
#include <unordered_map>

int main() {
        std::unordered_map<int, int> m;
        for (int i = 0; i < 1000000; ++i) {
                m[i] = i;
        }
        for (int i = 0; i < 1000; ++i) {
                m[i] = i;
                m.erase(m.begin(), m.end());
        }
}

$ clang++-3.7 -O2 -std=c++11 -stdlib=libstdc++ test_erase.cpp && time ./a.out

real    0m0.151s
user    0m0.116s
sys     0m0.036s

$ clang++-3.7 -O2 -std=c++11 -stdlib=libc++ test_erase.cpp && time ./a.out

real    0m0.187s
user    0m0.156s
sys     0m0.028s

I find it strange that both libraries implement clear() less efficiently than erase(m.begin(), m.end()). Maybe there is a rationale for this that I do not understand?
Comment 4 Jonathan Wakely 2016-02-04 17:39:04 UTC
http://cplusplus.github.io/LWG/lwg-active.html#2550
Comment 5 Jonathan Wakely 2016-03-01 15:46:41 UTC
LWG agrees with the proposed resolution of 2550