This is the mail archive of the mailing list for the libstdc++ project.

Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

hash container unordered_map is defective for key='const char*', key='string' and key='const string'

I tested the unordered_map using an existing application which currently keeps several thousand elements in an hash_map container:

Currently this looks like:

  #include <ext/hash_map>
  using __gnu_cxx::hash_map;
  using __gnu_cxx::hash;

  typedef struct
  { .....    // value and add. information
  } tOption;

  struct eqstr
    bool operator()(const char* s1, const char* s2) const
      return strcmp( s1, s2 ) == 0;

  // type of container: key=const char*,  value=tOption
  typedef hash_map<const char*, tOption, hash<const char*>, eqstr> HashMap;

I could then use it like:

  HashMap optionhash;

  char key[128] = ......

  HashMap::iterator it = optionhash.find(acKey);

Which was quite fast.

However, I tried to change the code to use the new unordered_map class like this:

  #include <unordered_map>
  using std::unordered_map;
  using std::hash;

  typedef unordered_map<string, tOption> HashMap;

This compiled fine but gave unexpected results:
  find'ing elements in the container did not work any more.

Reason found:
  There is no specialization for key=const char* or char* in the hash class any more.
  So the pointer address value but not the string value is used for the hash.

I then migrated the source code to use key='const string', but this resulted in a link problem:

  undefined reference to `std::hash  <std::basic_string<char, std::char_traits<char>, std::allocator<char> > const>
    ::operator()( std::basic_string<char, std::char_traits<char>, std::allocator<char> > ) const'

I then changed the key to 'string' and this worked, but was slow, see below.

I also noticed, that the specialization for string does pass the argument string by valuei, and that no inline code is supplied:

  /// Explicit specialization of member operator for non-builtin types.
    hash<string>::operator()(string) const;   <<<< ERROR HERE & MISSING

    hash<const string&>::operator()(const string&) const;

I then used key='const char*' again but supplied it a specialization:
  namespace std {
    template<> inline size_t hash<const char*>::operator()(const char* val) const
      unsigned long h = 0;
      for ( ; *val; ++val)
        h = 5 * h + *val;
      return size_t(h);
  typedef unordered_map<const char*, tOption, hash<const char*>, eqstr> HashMap;

This was much quicker, but is still slightly slower than the old class hash_map

Performance results overview: (optimization level -O3)

original performance, using gcc 4.3.1 and hash_map< const char*, >:
     find existing keys: 0.83 sec,   find nonexistent keys: 0.27 sec

... using gcc 4.3.1 and unordered_map<string, >:
     find existing keys: 2.2  sec,   find nonexistent keys: 1.5  sec

... using gcc 4.3.1 and unordered_map<const char*, > and manual specialization:
     find existing keys: 0.89 sec,   find nonexistent keys: 0.32  sec

* in my opinion we should add the specialization for 'char*' and 'const char*' to avoid unexpected results. also migration to the new container is much easier and performance much better.
* specialization of hash<string> should not pass a string value but a reference instead.
* if possible hash<string&> and hash<const string&> should be provided as inline code.
* even with the additional specialization of hash<const char*> the new container is still a bit slower then the old one. (The performance is the same with compile option -DNDEBUG.)

Regards, Karsten

Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]