This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.

Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]
Other format:	[Raw text]

Re: tr1::unordered_set<double> bizarre rounding behavior (x86)

From: Gabriel Dos Reis <gdr at integrable-solutions dot net>
To: Michael Veksler <VEKSLER at il dot ibm dot com>
Cc: Joe Buck <Joe dot Buck at synopsys dot COM>, gcc at gcc dot gnu dot org
Date: 05 Jul 2005 21:23:06 +0200
Subject: Re: tr1::unordered_set<double> bizarre rounding behavior (x86)
References: <OFC50D7347.1C4339A9-ONC2257035.0066952B-C2257035.0068790B@il.ibm.com>

Michael Veksler <VEKSLER@il.ibm.com> writes:

| Joe Buck <Joe.Buck@synopsys.COM> wrote on 05/07/2005 21:10:25:
| 
| > On Tue, Jul 05, 2005 at 08:05:39PM +0200, Gabriel Dos Reis wrote:
| > > It is definitely a good thing to use the full bits of value
| > > representation if we ever want to make all "interesting" bits part of
| > > the hash value.  For reasonable or sane representations it suffices to
| > > get your hand on the object representation, e.g.:
| > >
| > >    const int objsize = sizeof (double);
| > >    typedef unsigned char objrep_t[objsize];
| > >    double x = ....;
| > >    objrep_t& p = reintepret_cast<objrep_t&>(x);
| > >    // ...
| > >
| > > and let frexp and friends only for less obvious value representation.
| >
| > I disagree; on an ILP32 machine, we pull out only 32 bits for the hash
| > value, and if you aren't careful, your approach will wind up using the
| > least significant bits of the mantissa.  This will cause all values that
| > are exactly representable as floats to collide.
| 
| For that you can do something like (or templated equivalent):
| namespace Impl
| {
|  template <class T>
|  size_t floating_point_hash(T in)
|  {
|    if (sizeof(in) <= sizeof(size_t))
|     Use Gaby's solution, with zero padding;
|    else
|     frexp and friends using Joe Buck's ideas;
|   }
| }
| 
| Gaby's solution should be done with care - to avoid any
| aliasing issues (never go directly from double& to size_t&).

The standard explicilty permit that you can regard any object as an
array of unsigned char.  Given that, and given no padding bits
(e.g. the "sane" representation assumption), hashing any object larger
than a size_t is no different from hashing a character string.

Now, the question is how to make sure we do not have padding bits.  For
most targets, that assumption is OK; only the one subject of this
discussion seems to pose problems ;-) 

| Both Gaby's and Joe Buck's solutions do not take
| the strangeness of IEEE (NNN?) into account.
| As I remember it (I don't have the reference at home),
| IEEE FP has many bit-representations for NaN, each
| containing some bit-encoding of errors.

My proposal explicilty takes that into account in the sense that it
looks at all bits of the value representation, therefore the encoding
bits of the NaNs too.


| "There *should* be a specialization for equal_to<double> that
| provides a strict weak ordering for NaNs as well as other
| values." [quoted forwarded mail from P.J. Plauger]
| Doing bit-wise conversions will not address this requirement.

I do not understand what you mean by that.

-- Gaby

References:
- Re: tr1::unordered_set<double> bizarre rounding behavior (x86)
  - From: Michael Veksler

Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]