Bug 91263 - unordered_map and unordered_set operator== double key comparison causes exponential behavior
Summary: unordered_map and unordered_set operator== double key comparison causes expon...
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: libstdc++ (show other bugs)
Version: 10.0
: P3 normal
Target Milestone: ---
Assignee: François Dumont
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2019-07-25 20:07 UTC by Nick Terrell
Modified: 2023-08-30 04:23 UTC (History)
3 users (show)

See Also:
Host:
Target:
Build:
Known to work:
Known to fail:
Last reconfirmed: 2020-01-06 00:00:00


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Nick Terrell 2019-07-25 20:07:22 UTC
When unordered_map or unordered_set has a nested unordered_{map,set} as a key, operator== becomes exponential in the depth of the nesting. This is because unordered_{map,set}::operator== does two key comparisons, one in find() using key_eq() and one when comparing the key-value pairs using the key’s operator==. These two comparisons can theoretically be different but are almost always the same in practice. unordered_{map,set} should only do one key comparison in operator== to avoid this. It is also generally more efficient even in the non-nesting cases.

Nathan Bronson helped create a minimal repro below that demonstrates this exponential behavior.

#include <unordered_set>
#include <cassert>
#include <memory>

struct Nested;

struct NestedHash {
  std::size_t operator()(Nested const& n) const;
};

struct Nested {
  std::unique_ptr<std::unordered_set<Nested, NestedHash>> set;

  explicit Nested(int depth)
      : set(std::make_unique<std::unordered_set<Nested, NestedHash>>()) {
    if (depth > 0) {
      set->emplace(depth - 1);
    }
  }
};

std::size_t NestedHash::operator()(Nested const& n) const {
  std::size_t rv = 1;
  for (auto& k : *n.set) {
    rv += operator()(k);
  }
  return rv;
}

bool operator==(Nested const& lhs, Nested const& rhs) {
  return *lhs.set == *rhs.set;
}

bool operator!=(Nested const& lhs, Nested const& rhs) {
  return !(lhs == rhs);
}

void testNestedSetEquality() {
  auto n1 = Nested(100);
  auto n2 = Nested(100);
  auto n3 = Nested(99);
  assert(n1 == n1);
  assert(n1 == n2);
  assert(n1 != n3);
}

This is a contrived example, but this shows up in practice in folly::dynamic::operator== [1] as a worst case exponential algorithm. No one will create objects like this intentionally, but they can be produced with folly::parseJson() [2] when non-string keys are allowed. If an attacker controlled the input to parseJson() they could easily DDOS the machine with a payload like "{{{{{0:0}:0}:0}:0}:0}" but nested 100 layers deep instead of 5.

unordered_map and unordered_set should only do one key comparison per item in operator==. This is possible [3] because it is undefined behavior to call the containers operator== if the key’s operator== isn’t a refinement of key_eq(). Fixing this bug in unordered_map and unordered_set avoids the exponential behavior with nested keys, and it is generally more efficient even in the non-nesting cases. An implementation of set and map equality with only one key comparison is included below, thanks to Nathan Bronson.

template <typename S>
bool eqSet(S const& lhs, S const& rhs) {
  if (lhs.size() != rhs.size()) {
    return false;
  }
  for (auto& v : lhs) {
    auto slot = rhs.bucket(v);
    auto b = rhs.begin(slot);
    auto e = rhs.end(slot);
    while (b != e && !(*b == v)) {
      ++b;
    }
    if (b == e) {
      return false;
    }
  }
  return true;
}

template <typename M>
bool eqMap(M const& lhs, M const& rhs) {
  if (lhs.size() != rhs.size()) {
    return false;
  }
  for (auto& v : lhs) {
    auto slot = rhs.bucket(v.first);
    auto b = rhs.begin(slot);
    auto e = rhs.end(slot);
    while (b != e && !(b->first == v.first)) {
      ++b;
    }
    if (b == e || !(b->second == v.second)) {
      return false;
    }
  }
  return true;
}

[1] https://github.com/facebook/folly/blob/master/folly/dynamic.cpp#L113
[2] https://github.com/facebook/folly/blob/master/folly/json.h#L194
[3] https://github.com/facebook/folly/commit/11855c21b21fb46fe1004de8c0412dd94eb5c45f
Comment 1 François Dumont 2020-01-06 21:09:22 UTC
I confirm. I even wonder if there is not a Standard conformity issue here as keys are supported to be compared using the container provided predicate, the key type do not necessarily have a == operator.
Comment 2 Jonathan Wakely 2020-01-06 22:03:42 UTC
No, a==b is equivalent to std::equal(a.begin(), a.end(), b.begin(), b.end()) which doesn't have access to the container's equality predicate.
Comment 3 Jonathan Wakely 2020-01-10 21:21:29 UTC
(In reply to Jonathan Wakely from comment #2)
> No, a==b is equivalent to std::equal(a.begin(), a.end(), b.begin(), b.end())
> which doesn't have access to the container's equality predicate.

Sorry, I wasn't paying attention when I wrote that! For unordered containers of course it's not just std::equal on the whole container, but it's still defined in terms of permutations of sub-ranges such that std::equal would be true.
Comment 4 GCC Commits 2020-01-16 14:39:48 UTC
The master branch has been updated by Jonathan Wakely <redi@gcc.gnu.org>:

https://gcc.gnu.org/g:d916538965ea260c6bcdb1d46581f6d572017ce8

commit r10-6005-gd916538965ea260c6bcdb1d46581f6d572017ce8
Author: Fran�ois Dumont <fdumont@gcc.gnu.org>
Date:   Thu Jan 16 08:34:21 2020 +0000

    libstdc++: Improve unordered containers == operator (PR 91263)
    
    Avoid comparing elements with operator== multiple times by replacing
    uses of find and equal_range with equivalent inlined code that uses
    operator== instead of the container's equality comparison predicate.
    This is valid because the standard requires that operator== is a
    refinement of the equality predicate.
    
    Also replace the _S_is_permutation function with std::is_permutation,
    which wasn't yet implemented when this code was first written.
    
    	PR libstdc++/91263
    	* include/bits/hashtable.h (_Hashtable<>): Make _Equality<> friend.
    	* include/bits/hashtable_policy.h: Include <bits/stl_algo.h>.
    	(_Equality_base): Remove.
    	(_Equality<>::_M_equal): Review implementation. Use
    	std::is_permutation.
    	* testsuite/23_containers/unordered_multiset/operators/1.cc
    	(Hash, Equal, test02, test03): New.
    	* testsuite/23_containers/unordered_set/operators/1.cc
    	(Hash, Equal, test02, test03): New.
Comment 5 Jonathan Wakely 2020-01-16 16:23:15 UTC
Fixed on trunk. Let's keep this open and decide whether to backport it to the release branches.
Comment 6 François Dumont 2023-08-30 04:23:56 UTC
Can be close now.