Bug 59406 - functions labelled FNV-1a in tr1/functional_hash.h are not FNV-1a
Summary: functions labelled FNV-1a in tr1/functional_hash.h are not FNV-1a
Status: RESOLVED WONTFIX
Alias: None
Product: gcc
Classification: Unclassified
Component: libstdc++ (show other bugs)
Version: 4.7.2
: P3 minor
Target Milestone: ---
Assignee: Not yet assigned to anyone
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2013-12-06 11:57 UTC by g1
Modified: 2016-11-15 20:18 UTC (History)
1 user (show)

See Also:
Host:
Target:
Build:
Known to work:
Known to fail:
Last reconfirmed: 2015-03-31 00:00:00


Attachments
preprocessed program (65.84 KB, text/plain)
2013-12-06 11:57 UTC, g1
Details

Note You need to log in before you can comment on or make changes to this bug.
Description g1 2013-12-06 11:57:43 UTC
Created attachment 31390 [details]
preprocessed program

As far as I know, the FNV and FNV-1a algorithms process octects, i.e. unsigned
chars.  (see e.g. http://tools.ietf.org/html/draft-eastlake-fnv-03#section-2 )

The implementations in tr1/functional_hash.h work
on chars, instead, and their output is wrong (in architectures where char is
signed) when the input byte sequence includes bytes with the MSB set.

For example
    std::tr1::_Fnv_hash_base<4>::hash("\x80", 1)
returns 83079839, instead of the correct value 2232128415.

The problem resides in the type cast:

    template<>
    struct _Fnv_hash_base<4>
    {
      template<typename _Tp>
        static size_t
        hash(const _Tp* __ptr, size_t __clength)
        {
          size_t __result = static_cast<size_t>(2166136261UL);
>>>       const char* __cptr = reinterpret_cast<const char*>(__ptr);    <<<
          for (; __clength; --__clength)
            {
              __result ^= static_cast<size_t>(*__cptr++);
              __result *= static_cast<size_t>(16777619UL);
            }
          return __result;
        }
    };

The highlighed line should read:
const unsigned char* __cptr = reinterpret_cast<const unsigned char*>(__ptr);

Since the FNV primes were carefully chosen in order to minimize
collision rates and maximize dispersion, there might be consequences in
the performance of data structures that build on tr1::hash template,
perhaps tr1::unordered_map and tr1::unordered_set.

Here is a short program exposing the bug:

#include <iostream>
#include <tr1/unordered_map>

int main()
{
    std::cout << std::tr1::_Fnv_hash_base<4>::hash("\x80", 1)
        << " computed, 2232128415 expected\n";

}

and here is the output of g++ -v ...:

Using built-in specs.
COLLECT_GCC=g++
COLLECT_LTO_WRAPPER=/usr/lib/gcc/i486-linux-gnu/4.7/lto-wrapper
Target: i486-linux-gnu
Configured with: ../src/configure -v --with-pkgversion='Debian 4.7.2-5' --with-bugurl=file:///usr/share/doc/gcc-4.7/README.Bugs --enable-languages=c,c++,go,fortran,objc,obj-c++ --prefix=/usr --program-suffix=-4.7 --enable-shared --enable-linker-build-id --with-system-zlib --libexecdir=/usr/lib --without-included-gettext --enable-threads=posix --with-gxx-include-dir=/usr/include/c++/4.7 --libdir=/usr/lib --enable-nls --with-sysroot=/ --enable-clocale=gnu --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-gnu-unique-object --enable-plugin --enable-objc-gc --enable-targets=all --with-arch-32=i586 --with-tune=generic --enable-checking=release --build=i486-linux-gnu --host=i486-linux-gnu --target=i486-linux-gnu
Thread model: posix
gcc version 4.7.2 (Debian 4.7.2-5) 
COLLECT_GCC_OPTIONS='-O' '-Wextra' '-Wall' '-ansi' '-pedantic' '-v' '-save-temps' '-fno-strict-aliasing' '-fwrapv' '-shared-libgcc' '-mtune=generic' '-march=i586'
 /usr/lib/gcc/i486-linux-gnu/4.7/cc1plus -E -quiet -v -imultiarch i386-linux-gnu -D_GNU_SOURCE fnv-bad.cc -mtune=generic -march=i586 -ansi -Wextra -Wall -pedantic -fno-strict-aliasing -fwrapv -O -fpch-preprocess -o fnv-bad.ii
ignoring nonexistent directory "/usr/local/include/i386-linux-gnu"
ignoring nonexistent directory "/usr/lib/gcc/i486-linux-gnu/4.7/../../../../i486-linux-gnu/include"
#include "..." search starts here:
#include <...> search starts here:
 /usr/include/c++/4.7
 /usr/include/c++/4.7/i486-linux-gnu
 /usr/include/c++/4.7/backward
 /usr/lib/gcc/i486-linux-gnu/4.7/include
 /usr/local/include
 /usr/lib/gcc/i486-linux-gnu/4.7/include-fixed
 /usr/include/i386-linux-gnu
 /usr/include
End of search list.
COLLECT_GCC_OPTIONS='-O' '-Wextra' '-Wall' '-ansi' '-pedantic' '-v' '-save-temps' '-fno-strict-aliasing' '-fwrapv' '-shared-libgcc' '-mtune=generic' '-march=i586'
 /usr/lib/gcc/i486-linux-gnu/4.7/cc1plus -fpreprocessed fnv-bad.ii -quiet -dumpbase fnv-bad.cc -mtune=generic -march=i586 -auxbase fnv-bad -O -Wextra -Wall -pedantic -ansi -version -fno-strict-aliasing -fwrapv -o fnv-bad.s
GNU C++ (Debian 4.7.2-5) version 4.7.2 (i486-linux-gnu)
        compiled by GNU C version 4.7.2, GMP version 5.0.5, MPFR version 3.1.0-p10, MPC version 0.9
GGC heuristics: --param ggc-min-expand=63 --param ggc-min-heapsize=62264
GNU C++ (Debian 4.7.2-5) version 4.7.2 (i486-linux-gnu)
        compiled by GNU C version 4.7.2, GMP version 5.0.5, MPFR version 3.1.0-p10, MPC version 0.9
GGC heuristics: --param ggc-min-expand=63 --param ggc-min-heapsize=62264
Compiler executable checksum: 62bfd556e00a93e3d7f66f6876d73826
COLLECT_GCC_OPTIONS='-O' '-Wextra' '-Wall' '-ansi' '-pedantic' '-v' '-save-temps' '-fno-strict-aliasing' '-fwrapv' '-shared-libgcc' '-mtune=generic' '-march=i586'
 as -v --32 -o fnv-bad.o fnv-bad.s
GNU assembler version 2.22 (i486-linux-gnu) using BFD version (GNU Binutils for Debian) 2.22
COMPILER_PATH=/usr/lib/gcc/i486-linux-gnu/4.7/:/usr/lib/gcc/i486-linux-gnu/4.7/:/usr/lib/gcc/i486-linux-gnu/:/usr/lib/gcc/i486-linux-gnu/4.7/:/usr/lib/gcc/i486-linux-gnu/
LIBRARY_PATH=/usr/lib/gcc/i486-linux-gnu/4.7/:/usr/lib/gcc/i486-linux-gnu/4.7/../../../i386-linux-gnu/:/usr/lib/gcc/i486-linux-gnu/4.7/../../../../lib/:/lib/i386-linux-gnu/:/lib/../lib/:/usr/lib/i386-linux-gnu/:/usr/lib/../lib/:/usr/lib/gcc/i486-linux-gnu/4.7/../../../:/lib/:/usr/lib/
COLLECT_GCC_OPTIONS='-O' '-Wextra' '-Wall' '-ansi' '-pedantic' '-v' '-save-temps' '-fno-strict-aliasing' '-fwrapv' '-shared-libgcc' '-mtune=generic' '-march=i586'
 /usr/lib/gcc/i486-linux-gnu/4.7/collect2 --sysroot=/ --build-id --no-add-needed --eh-frame-hdr -m elf_i386 --hash-style=both -dynamic-linker /lib/ld-linux.so.2 /usr/lib/gcc/i486-linux-gnu/4.7/../../../i386-linux-gnu/crt1.o /usr/lib/gcc/i486-linux-gnu/4.7/../../../i386-linux-gnu/crti.o /usr/lib/gcc/i486-linux-gnu/4.7/crtbegin.o -L/usr/lib/gcc/i486-linux-gnu/4.7 -L/usr/lib/gcc/i486-linux-gnu/4.7/../../../i386-linux-gnu -L/usr/lib/gcc/i486-linux-gnu/4.7/../../../../lib -L/lib/i386-linux-gnu -L/lib/../lib -L/usr/lib/i386-linux-gnu -L/usr/lib/../lib -L/usr/lib/gcc/i486-linux-gnu/4.7/../../.. fnv-bad.o -lstdc++ -lm -lgcc_s -lgcc -lc -lgcc_s -lgcc /usr/lib/gcc/i486-linux-gnu/4.7/crtend.o /usr/lib/gcc/i486-linux-gnu/4.7/../../../i386-linux-gnu/crtn.o

In attachment, the preprocessed program (.ii).  This problem was also reported to Debian (http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=731330).

Best regards,
        g
Comment 1 Jonathan Wakely 2013-12-06 13:02:35 UTC
We're not really making any non-critical changes to TR1 code now.
Comment 2 g1 2013-12-06 15:38:32 UTC
(In reply to Jonathan Wakely from comment #1)
> We're not really making any non-critical changes to TR1 code now.

Yet std::_Fnv_hash_bytes in non-TR1 code has the same problem (lines 116 and 161 of svn://gcc.gnu.org/svn/gcc/trunk/libstdc++-v3/libsupc++/hash_bytes.cc at revision 205748).  The following program emits 83079839 instead of the expected FNV-1a result.

#include <iostream>
#include <unordered_map>
int main() { 
    std::cout << std::_Fnv_hash_bytes("\x80", 1, 2166136261UL) << '\n';
}

Don't know if those functions are used somewhere or are leftovers.  The comments in the file suggest that Murmur hash is used in the std::hash template instead of FNV-1a.
Comment 3 Jonathan Wakely 2015-03-31 14:34:02 UTC
(In reply to g1 from comment #2)
> Don't know if those functions are used somewhere or are leftovers.  The
> comments in the file suggest that Murmur hash is used in the std::hash
> template instead of FNV-1a.

They're just leftovers retained for backwards compatibility in applications that need the old hashing algorithm, so changing them is probably not a good idea as it would not be backwards compatible.

I'll add a comment to libsupsc++/hash_bytes.cc saying it's not really FNV-1a but I don't think we should change it now.

Thanks for pointing it out though.
Comment 4 Jonathan Wakely 2016-11-15 20:18:11 UTC
Author: redi
Date: Tue Nov 15 20:17:39 2016
New Revision: 242454

URL: https://gcc.gnu.org/viewcvs?rev=242454&root=gcc&view=rev
Log:
PR 59406 note that FNV hash functions are incorrect

	PR libstdc++/59406
	* include/bits/functional_hash.h: Add comment noting difference from
	FNV-1a.
	* include/tr1/functional_hash.h: Likewise.
	* libsupc++/hash_bytes.cc: Likewise.

Modified:
    trunk/libstdc++-v3/ChangeLog
    trunk/libstdc++-v3/include/bits/functional_hash.h
    trunk/libstdc++-v3/include/tr1/functional_hash.h
    trunk/libstdc++-v3/libsupc++/hash_bytes.cc
Comment 5 Jonathan Wakely 2016-11-15 20:18:25 UTC
Comments added to the code on trunk. Closing as WONTFIX.