Created attachment 31390 [details] preprocessed program As far as I know, the FNV and FNV-1a algorithms process octects, i.e. unsigned chars. (see e.g. http://tools.ietf.org/html/draft-eastlake-fnv-03#section-2 ) The implementations in tr1/functional_hash.h work on chars, instead, and their output is wrong (in architectures where char is signed) when the input byte sequence includes bytes with the MSB set. For example std::tr1::_Fnv_hash_base<4>::hash("\x80", 1) returns 83079839, instead of the correct value 2232128415. The problem resides in the type cast: template<> struct _Fnv_hash_base<4> { template<typename _Tp> static size_t hash(const _Tp* __ptr, size_t __clength) { size_t __result = static_cast<size_t>(2166136261UL); >>> const char* __cptr = reinterpret_cast<const char*>(__ptr); <<< for (; __clength; --__clength) { __result ^= static_cast<size_t>(*__cptr++); __result *= static_cast<size_t>(16777619UL); } return __result; } }; The highlighed line should read: const unsigned char* __cptr = reinterpret_cast<const unsigned char*>(__ptr); Since the FNV primes were carefully chosen in order to minimize collision rates and maximize dispersion, there might be consequences in the performance of data structures that build on tr1::hash template, perhaps tr1::unordered_map and tr1::unordered_set. Here is a short program exposing the bug: #include <iostream> #include <tr1/unordered_map> int main() { std::cout << std::tr1::_Fnv_hash_base<4>::hash("\x80", 1) << " computed, 2232128415 expected\n"; } and here is the output of g++ -v ...: Using built-in specs. COLLECT_GCC=g++ COLLECT_LTO_WRAPPER=/usr/lib/gcc/i486-linux-gnu/4.7/lto-wrapper Target: i486-linux-gnu Configured with: ../src/configure -v --with-pkgversion='Debian 4.7.2-5' --with-bugurl=file:///usr/share/doc/gcc-4.7/README.Bugs --enable-languages=c,c++,go,fortran,objc,obj-c++ --prefix=/usr --program-suffix=-4.7 --enable-shared --enable-linker-build-id --with-system-zlib --libexecdir=/usr/lib --without-included-gettext --enable-threads=posix --with-gxx-include-dir=/usr/include/c++/4.7 --libdir=/usr/lib --enable-nls --with-sysroot=/ --enable-clocale=gnu --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-gnu-unique-object --enable-plugin --enable-objc-gc --enable-targets=all --with-arch-32=i586 --with-tune=generic --enable-checking=release --build=i486-linux-gnu --host=i486-linux-gnu --target=i486-linux-gnu Thread model: posix gcc version 4.7.2 (Debian 4.7.2-5) COLLECT_GCC_OPTIONS='-O' '-Wextra' '-Wall' '-ansi' '-pedantic' '-v' '-save-temps' '-fno-strict-aliasing' '-fwrapv' '-shared-libgcc' '-mtune=generic' '-march=i586' /usr/lib/gcc/i486-linux-gnu/4.7/cc1plus -E -quiet -v -imultiarch i386-linux-gnu -D_GNU_SOURCE fnv-bad.cc -mtune=generic -march=i586 -ansi -Wextra -Wall -pedantic -fno-strict-aliasing -fwrapv -O -fpch-preprocess -o fnv-bad.ii ignoring nonexistent directory "/usr/local/include/i386-linux-gnu" ignoring nonexistent directory "/usr/lib/gcc/i486-linux-gnu/4.7/../../../../i486-linux-gnu/include" #include "..." search starts here: #include <...> search starts here: /usr/include/c++/4.7 /usr/include/c++/4.7/i486-linux-gnu /usr/include/c++/4.7/backward /usr/lib/gcc/i486-linux-gnu/4.7/include /usr/local/include /usr/lib/gcc/i486-linux-gnu/4.7/include-fixed /usr/include/i386-linux-gnu /usr/include End of search list. COLLECT_GCC_OPTIONS='-O' '-Wextra' '-Wall' '-ansi' '-pedantic' '-v' '-save-temps' '-fno-strict-aliasing' '-fwrapv' '-shared-libgcc' '-mtune=generic' '-march=i586' /usr/lib/gcc/i486-linux-gnu/4.7/cc1plus -fpreprocessed fnv-bad.ii -quiet -dumpbase fnv-bad.cc -mtune=generic -march=i586 -auxbase fnv-bad -O -Wextra -Wall -pedantic -ansi -version -fno-strict-aliasing -fwrapv -o fnv-bad.s GNU C++ (Debian 4.7.2-5) version 4.7.2 (i486-linux-gnu) compiled by GNU C version 4.7.2, GMP version 5.0.5, MPFR version 3.1.0-p10, MPC version 0.9 GGC heuristics: --param ggc-min-expand=63 --param ggc-min-heapsize=62264 GNU C++ (Debian 4.7.2-5) version 4.7.2 (i486-linux-gnu) compiled by GNU C version 4.7.2, GMP version 5.0.5, MPFR version 3.1.0-p10, MPC version 0.9 GGC heuristics: --param ggc-min-expand=63 --param ggc-min-heapsize=62264 Compiler executable checksum: 62bfd556e00a93e3d7f66f6876d73826 COLLECT_GCC_OPTIONS='-O' '-Wextra' '-Wall' '-ansi' '-pedantic' '-v' '-save-temps' '-fno-strict-aliasing' '-fwrapv' '-shared-libgcc' '-mtune=generic' '-march=i586' as -v --32 -o fnv-bad.o fnv-bad.s GNU assembler version 2.22 (i486-linux-gnu) using BFD version (GNU Binutils for Debian) 2.22 COMPILER_PATH=/usr/lib/gcc/i486-linux-gnu/4.7/:/usr/lib/gcc/i486-linux-gnu/4.7/:/usr/lib/gcc/i486-linux-gnu/:/usr/lib/gcc/i486-linux-gnu/4.7/:/usr/lib/gcc/i486-linux-gnu/ LIBRARY_PATH=/usr/lib/gcc/i486-linux-gnu/4.7/:/usr/lib/gcc/i486-linux-gnu/4.7/../../../i386-linux-gnu/:/usr/lib/gcc/i486-linux-gnu/4.7/../../../../lib/:/lib/i386-linux-gnu/:/lib/../lib/:/usr/lib/i386-linux-gnu/:/usr/lib/../lib/:/usr/lib/gcc/i486-linux-gnu/4.7/../../../:/lib/:/usr/lib/ COLLECT_GCC_OPTIONS='-O' '-Wextra' '-Wall' '-ansi' '-pedantic' '-v' '-save-temps' '-fno-strict-aliasing' '-fwrapv' '-shared-libgcc' '-mtune=generic' '-march=i586' /usr/lib/gcc/i486-linux-gnu/4.7/collect2 --sysroot=/ --build-id --no-add-needed --eh-frame-hdr -m elf_i386 --hash-style=both -dynamic-linker /lib/ld-linux.so.2 /usr/lib/gcc/i486-linux-gnu/4.7/../../../i386-linux-gnu/crt1.o /usr/lib/gcc/i486-linux-gnu/4.7/../../../i386-linux-gnu/crti.o /usr/lib/gcc/i486-linux-gnu/4.7/crtbegin.o -L/usr/lib/gcc/i486-linux-gnu/4.7 -L/usr/lib/gcc/i486-linux-gnu/4.7/../../../i386-linux-gnu -L/usr/lib/gcc/i486-linux-gnu/4.7/../../../../lib -L/lib/i386-linux-gnu -L/lib/../lib -L/usr/lib/i386-linux-gnu -L/usr/lib/../lib -L/usr/lib/gcc/i486-linux-gnu/4.7/../../.. fnv-bad.o -lstdc++ -lm -lgcc_s -lgcc -lc -lgcc_s -lgcc /usr/lib/gcc/i486-linux-gnu/4.7/crtend.o /usr/lib/gcc/i486-linux-gnu/4.7/../../../i386-linux-gnu/crtn.o In attachment, the preprocessed program (.ii). This problem was also reported to Debian (http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=731330). Best regards, g
We're not really making any non-critical changes to TR1 code now.
(In reply to Jonathan Wakely from comment #1) > We're not really making any non-critical changes to TR1 code now. Yet std::_Fnv_hash_bytes in non-TR1 code has the same problem (lines 116 and 161 of svn://gcc.gnu.org/svn/gcc/trunk/libstdc++-v3/libsupc++/hash_bytes.cc at revision 205748). The following program emits 83079839 instead of the expected FNV-1a result. #include <iostream> #include <unordered_map> int main() { std::cout << std::_Fnv_hash_bytes("\x80", 1, 2166136261UL) << '\n'; } Don't know if those functions are used somewhere or are leftovers. The comments in the file suggest that Murmur hash is used in the std::hash template instead of FNV-1a.
(In reply to g1 from comment #2) > Don't know if those functions are used somewhere or are leftovers. The > comments in the file suggest that Murmur hash is used in the std::hash > template instead of FNV-1a. They're just leftovers retained for backwards compatibility in applications that need the old hashing algorithm, so changing them is probably not a good idea as it would not be backwards compatible. I'll add a comment to libsupsc++/hash_bytes.cc saying it's not really FNV-1a but I don't think we should change it now. Thanks for pointing it out though.
Author: redi Date: Tue Nov 15 20:17:39 2016 New Revision: 242454 URL: https://gcc.gnu.org/viewcvs?rev=242454&root=gcc&view=rev Log: PR 59406 note that FNV hash functions are incorrect PR libstdc++/59406 * include/bits/functional_hash.h: Add comment noting difference from FNV-1a. * include/tr1/functional_hash.h: Likewise. * libsupc++/hash_bytes.cc: Likewise. Modified: trunk/libstdc++-v3/ChangeLog trunk/libstdc++-v3/include/bits/functional_hash.h trunk/libstdc++-v3/include/tr1/functional_hash.h trunk/libstdc++-v3/libsupc++/hash_bytes.cc
Comments added to the code on trunk. Closing as WONTFIX.