This is the mail archive of the
libstdc++@gcc.gnu.org
mailing list for the libstdc++ project.
hash container unordered_map is defective for key='const char*', key='string' and key='const string'
- From: "Karsten Burger" <karsten_burger at gmx dot de>
- To: libstdc++ at gcc dot gnu dot org
- Date: Thu, 30 Oct 2008 13:54:45 +0100
- Subject: hash container unordered_map is defective for key='const char*', key='string' and key='const string'
Hello,
I tested the unordered_map using an existing application which currently keeps several thousand elements in an hash_map container:
Currently this looks like:
#include <ext/hash_map>
using __gnu_cxx::hash_map;
using __gnu_cxx::hash;
typedef struct
{ ..... // value and add. information
} tOption;
struct eqstr
{
bool operator()(const char* s1, const char* s2) const
{
return strcmp( s1, s2 ) == 0;
}
};
// type of container: key=const char*, value=tOption
typedef hash_map<const char*, tOption, hash<const char*>, eqstr> HashMap;
I could then use it like:
HashMap optionhash;
...
char key[128] = ......
HashMap::iterator it = optionhash.find(acKey);
Which was quite fast.
However, I tried to change the code to use the new unordered_map class like this:
#include <unordered_map>
using std::unordered_map;
using std::hash;
typedef unordered_map<string, tOption> HashMap;
This compiled fine but gave unexpected results:
find'ing elements in the container did not work any more.
Reason found:
There is no specialization for key=const char* or char* in the hash class any more.
So the pointer address value but not the string value is used for the hash.
I then migrated the source code to use key='const string', but this resulted in a link problem:
/usr/local/gcc-4.3.1/lib/gcc/i686-pc-linux-gnu/4.3.1/../../../../include/c++/4.3.1/tr1_impl/hashtable_policy.h:697:
undefined reference to `std::hash <std::basic_string<char, std::char_traits<char>, std::allocator<char> > const>
::operator()( std::basic_string<char, std::char_traits<char>, std::allocator<char> > ) const'
I then changed the key to 'string' and this worked, but was slow, see below.
I also noticed, that the specialization for string does pass the argument string by valuei, and that no inline code is supplied:
libstdc++-v3/include/tr1_impl/functional_hash.h:
/// Explicit specialization of member operator for non-builtin types.
template<>
size_t
hash<string>::operator()(string) const; <<<< ERROR HERE & MISSING
template<>
size_t
hash<const string&>::operator()(const string&) const;
I then used key='const char*' again but supplied it a specialization:
namespace std {
template<> inline size_t hash<const char*>::operator()(const char* val) const
{
unsigned long h = 0;
for ( ; *val; ++val)
h = 5 * h + *val;
return size_t(h);
};
}
typedef unordered_map<const char*, tOption, hash<const char*>, eqstr> HashMap;
This was much quicker, but is still slightly slower than the old class hash_map
Performance results overview: (optimization level -O3)
original performance, using gcc 4.3.1 and hash_map< const char*, >:
find existing keys: 0.83 sec, find nonexistent keys: 0.27 sec
... using gcc 4.3.1 and unordered_map<string, >:
find existing keys: 2.2 sec, find nonexistent keys: 1.5 sec
... using gcc 4.3.1 and unordered_map<const char*, > and manual specialization:
find existing keys: 0.89 sec, find nonexistent keys: 0.32 sec
Conclusion:
* in my opinion we should add the specialization for 'char*' and 'const char*' to avoid unexpected results. also migration to the new container is much easier and performance much better.
* specialization of hash<string> should not pass a string value but a reference instead.
* if possible hash<string&> and hash<const string&> should be provided as inline code.
* even with the additional specialization of hash<const char*> the new container is still a bit slower then the old one. (The performance is the same with compile option -DNDEBUG.)
Regards, Karsten