std::unordered_map::clear internally clears the whole array of buckets using memset. Consequently, the clearing time is proportional to the number of buckets, instead of being proportional to the number of elements, which one would expect, and what arguably should be the case according to the C++ Standard. (The Standard specifies that clear's complexity is linear, but unfortunately does not say, linear in what.) This leads to terrible performance of the following code: #include <unordered_map> int main() { std::unordered_map<int, int> m; for (int i = 0; i < 1000000; ++i) { m[i] = i; } for (int i = 0; i < 1000; ++i) { m.clear(); } } $ g++-5 -O2 test.cpp -std=c++11 && time ./a.out real 0m4.009s user 0m3.924s sys 0m0.028s $ clang++-3.7 -stdlib=libstdc++ -O2 test.cpp -std=c++11 && time ./a.out real 0m4.153s user 0m3.976s sys 0m0.036s If you build the same program with libc++, the problem goes away: $ clang++-3.7 -stdlib=libc++ -O2 test.cpp -std=c++11 && time ./a.out real 0m0.165s user 0m0.128s sys 0m0.036s You can achieve similar performance with libstdc++ if you implement clear() manually in the most naive way: #include <unordered_map> int main() { std::unordered_map<int, int> m; for (int i = 0; i < 1000000; ++i) { m[i] = i; } for (int i = 0; i < 1000; ++i) { while (m.begin() != m.end()) { m.erase(m.begin()); } } } $ g++-5 -O2 test.cpp -std=c++11 && time ./a.out real 0m0.167s user 0m0.132s sys 0m0.028s $ g++ -v Using built-in specs. COLLECT_GCC=g++ COLLECT_LTO_WRAPPER=/usr/lib/gcc/x86_64-linux-gnu/5/lto-wrapper Target: x86_64-linux-gnu Configured with: ../src/configure -v --with-pkgversion='Debian 5.2.1-21' --with-bugurl=file:///usr/share/doc/gcc-5/README.Bugs --enable-languages=c,ada,c++,java,go,d,fortran,objc,obj-c++ --prefix=/usr --program-suffix=-5 --enable-shared --enable-linker-build-id --libexecdir=/usr/lib --without-included-gettext --enable-threads=posix --libdir=/usr/lib --enable-nls --with-sysroot=/ --enable-clocale=gnu --enable-libstdcxx-debug --enable-libstdcxx-time=yes --with-default-libstdcxx-abi=new --enable-gnu-unique-object --disable-vtable-verify --enable-libmpx --enable-plugin --with-system-zlib --disable-browser-plugin --enable-java-awt=gtk --enable-gtk-cairo --with-java-home=/usr/lib/jvm/java-1.5.0-gcj-5-amd64/jre --enable-java-home --with-jvm-root-dir=/usr/lib/jvm/java-1.5.0-gcj-5-amd64 --with-jvm-jar-dir=/usr/lib/jvm-exports/java-1.5.0-gcj-5-amd64 --with-arch-directory=amd64 --with-ecj-jar=/usr/share/java/eclipse-ecj.jar --enable-objc-gc --enable-multiarch --with-arch-32=i586 --with-abi=m64 --with-multilib-list=m32,m64,mx32 --enable-multilib --with-tune=generic --enable-checking=release --build=x86_64-linux-gnu --host=x86_64-linux-gnu --target=x86_64-linux-gnu Thread model: posix gcc version 5.2.1 20151003 (Debian 5.2.1-21)
But then the issue is that clear () doesn't shrink the map. Otherwise O(number of slots) == O(number of elements)
> But then the issue is that clear () doesn't shrink the map. No, the issue is that clear() touches all the buckets, instead of touching only those containing the elements. libc++'s implementation does not shrink the map (does not decrease the number of buckets), and is still fast. Actually, if in libstdc++ you do m.erase(m.begin(), m.end()) instead of m.clear(), it will be fast too. #include <iostream> #include <unordered_map> int main() { std::unordered_map<int, int> m; for (int i = 0; i < 1000000; ++i) { m[i] = i; } std::cout << "Before clear(): " << m.bucket_count() << std::endl; m.clear(); std::cout << "After clear(): " << m.bucket_count() << std::endl; } $ clang++-3.7 -stdlib=libstdc++ -O2 -std=c++11 test.cpp && time ./a.out Before clear(): 1056323 After clear(): 1056323 real 0m0.074s user 0m0.076s sys 0m0.000s $ clang++-3.7 -stdlib=libc++ -O2 -std=c++11 test.cpp && time ./a.out Before clear(): 1646237 After clear(): 1646237 real 0m0.102s user 0m0.080s sys 0m0.016s
A small correction. A colleague of mine bothered to read the source code of libc++ and noticed that its implementation of clear() method also generally takes time, linear in the number of buckets. This was not visible in the tests I already presented, because libc++ has special handling of the empty container case: template <class _Tp, class _Hash, class _Equal, class _Alloc> void __hash_table<_Tp, _Hash, _Equal, _Alloc>::clear() _NOEXCEPT { if (size() > 0) { __deallocate(__p1_.first().__next_); __p1_.first().__next_ = nullptr; size_type __bc = bucket_count(); for (size_type __i = 0; __i < __bc; ++__i) __bucket_list_[__i] = nullptr; size() = 0; } } If we modify the test example slightly, we get: $ cat test_clear.cpp #include <unordered_map> int main() { std::unordered_map<int, int> m; for (int i = 0; i < 1000000; ++i) { m[i] = i; } for (int i = 0; i < 1000; ++i) { m[i] = i; m.clear(); } } $ clang++-3.7 -O2 -std=c++11 -stdlib=libstdc++ test_clear.cpp && time ./a.out real 0m4.054s user 0m4.000s sys 0m0.036s $ clang++-3.7 -O2 -std=c++11 -stdlib=libc++ test_clear.cpp && time ./a.out real 0m6.114s user 0m6.000s sys 0m0.036s $ cat test_erase.cpp #include <unordered_map> int main() { std::unordered_map<int, int> m; for (int i = 0; i < 1000000; ++i) { m[i] = i; } for (int i = 0; i < 1000; ++i) { m[i] = i; m.erase(m.begin(), m.end()); } } $ clang++-3.7 -O2 -std=c++11 -stdlib=libstdc++ test_erase.cpp && time ./a.out real 0m0.151s user 0m0.116s sys 0m0.036s $ clang++-3.7 -O2 -std=c++11 -stdlib=libc++ test_erase.cpp && time ./a.out real 0m0.187s user 0m0.156s sys 0m0.028s I find it strange that both libraries implement clear() less efficiently than erase(m.begin(), m.end()). Maybe there is a rationale for this which I do not understand?
http://cplusplus.github.io/LWG/lwg-active.html#2550
LWG agrees with the proposed resolution of 2550