[forwarded from https://bugs.debian.org/977638] same behavior with GCC 8, 9 and 10. On s390x, std::hash returns identical values for large classes of std::bitset and std::vector<bool>: $ cat bug.cc #include <bitset> #include <functional> #include <iostream> #include <vector> int main() { std::bitset<2> a("00"), b("01"); std::vector<bool> c = {false, true, false, true}; std::vector<bool> d = {true, false, true, false}; std::bitset<9> e("000000000"), f("010101010"); std::vector<bool> g = {true, true, true, true, true, true, true, true, true}; std::vector<bool> h = {false, false, false, true, true, false, false, false, false}; std::hash<std::bitset<2>> h1; std::hash<std::bitset<9>> h2; std::hash<std::vector<bool>> h3; std::cout << h1(a) << '\n' << h1(b) << '\n' << h3(c) << '\n' << h3(d) << "\n\n" << h2(e) << '\n' << h2(f) << '\n' << h3(g) << '\n' << h3(h) << '\n'; } $ g++ -o bug bug.cc $ ./bug 7857072875483051545 7857072875483051545 7857072875483051545 7857072875483051545 4158372090644325695 4158372090644325695 4158372090644325695 4158372090644325695 It appears that the hash value is completely dependent on the size of the object in bytes. 1–8-bit values all hash to 7857072875483051545; 9–16-bit values all hash to 4158372090644325695; and though bug.cc doesn’t demonstrate it, 17-bit values hash to 14756137038141193723.
I bet it's the default specialization.
I think we just hash the bytes of the storage, but assume the first bytes are populated. For a bitset<2> we have an unsigned long long where only the two least significant bits are used. For LE they're in the first byte, but for BE they're in the last byte.
Yes the problem is that a bitset<2> uses the two least significant bits of an unsigned long, so we want to hash a single byte. But we take the address of the unsigned long and then hash the first byte. For BE we need to hash the last byte. With a bitset of more than 64 bits we have a layout like this: [76543210][...43210] We assume the value bytes are contiguous and just hash N bytes from the address of the first long, and then the next 5 bytes. We should be hashing the last 5 bytes of the last word instead. I haven't looked at hash<vector<bool>> but it's probably the same bug.
(In reply to Matthias Klose from comment #0) > It appears that the hash value is completely dependent on the size of > the object in bytes. It's not *completely* dependent on the size. Only the last x.size()%64 bits are hashed incorrectly. For x.size() < 32 it's completely dependent, because we never look at the right bits. For larger numbers of bits we look at *some* of them. Fixing it is an ABI break though, because it would mean that the same value produces a different hash when compiled with an old GCC or a fixed GCC. Inserting an element into an unordered_map in one TU and then looking it up in another TU could fail.
*** Bug 102531 has been marked as a duplicate of this bug. ***