This is the mail archive of the gcc-bugs@gcc.gnu.org mailing list for the GCC project.



Re: libstdc++/3720: Problems with num_get


bkoz@gcc.gnu.org writes:

>     I've been thinking about this, and its related bug libstdc++/4402.
>     
>     I agree this should not core.
>     
>     Looking at <limits> it looks like the largest number of digits for any numeric input is around 33 characters. 

33? I don't see why, but I couldn't work out why the current code uses
32 either. Please describe the longest number if you are going to
build a constant into the code.

Are there any platforms with 128-bit integers? Octal representation
could reach 43 non-zero octal digits, plus a leading zero, a null byte
and an optional sign.
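
A back-of-the-envelope check, in code (my arithmetic only, not anything
taken from the library headers):

   // Octal digit budget for a hypothetical 128-bit unsigned type:
   // ceil(128 / 3) digits, plus the leading '0', an optional sign
   // and a terminating null byte.
   #include <iostream>

   int main()
   {
       const int bits = 128;
       const int octal_digits = (bits + 2) / 3;      // ceil(bits / 3) = 43
       const int buffer_chars = octal_digits + 3;    // + leading zero, sign, '\0'
       std::cout << octal_digits << " octal digits, "
                 << buffer_chars << " characters of buffer\n";
       return 0;
   }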

Floating-point input doesn't have to use exponential notation. A
sequence of digits of length numeric_limits<long double>::max_exponent10
(which is 4932 on x86) is valid input as far as I am aware; it certainly
works on 2.95. It is possible that, after something like digits10+1
digits, one could simply increment a decimal scaling factor for each
further digit and ignore its precise value.
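
To see it concretely, here is the sort of input I mean (just an
illustration; a conforming num_get should extract it cleanly, which is
exactly what a fixed 32- or 33-character buffer cannot do):

   // A run of max_exponent10 digits with no exponent is still a valid
   // long double, so any fixed-size digit buffer will truncate it.
   #include <iostream>
   #include <limits>
   #include <sstream>
   #include <string>

   int main()
   {
       const int n = std::numeric_limits<long double>::max_exponent10; // 4932 on x86
       std::string big(n, '9');        // n decimal digits, no 'e' anywhere
       std::istringstream in(big);
       long double d;
       in >> d;                        // should parse without setting failbit
       std::cout << (in ? "ok: " : "failed: ") << d << '\n';
       return 0;
   }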

"Leading" zeros after a decimal point are also a problem

   0.00000000000000000000000000000000000000000000000000000000000000000000000001

I want more than the first 33 digits here! As many as -min_exponent10+2 perhaps?

>     
>     Looking at the relevant part of the standard, I see
>     
>     22.2.2.1.2 - num_get virtual functions [lib.facet.num.get.virtuals]
>     
>     -9- If discard is true then the position of the character is remembered, but the character is otherwise ignored. If it is not discarded, then a check is made to determine if c is allowed as the next character of an input field of the conversion specifier returned by stage 1. If so it is accumulated.
>     
>     
>     This is sufficiently vague to allow the following plan, which I consider sensible:
>     
>     numbers are extracted while beg != end and while the number of extracted digits <= numeric_limits<_Type>::digits10(). 

digits10 is the number of decimal digits that can be represented
without change, i.e. the binary may represent more. Thus in general
digits10+1 digits are needed to guarantee full binary precision.
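
Concretely, for a 32-bit int (again just an illustration of the bound,
not library code):

   // digits10 counts the decimal digits guaranteed to survive a round
   // trip; the type can hold some, but not all, values one digit longer.
   #include <iostream>
   #include <limits>

   int main()
   {
       std::cout << "int digits   = " << std::numeric_limits<int>::digits   << '\n'  // 31
                 << "int digits10 = " << std::numeric_limits<int>::digits10 << '\n'  // should be 9
                 << "INT_MAX      = " << std::numeric_limits<int>::max()    << '\n'; // 2147483647
       // INT_MAX has digits10 + 1 = 10 digits, so an extractor that stops
       // after digits10 digits cannot even read the maximum value.
       return 0;
   }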

However in this case I think octal is the limiting factor. The number
of decimal digits is roughly

   digits10 = digits * log10(2) = digits * 0.30103

while the number of octal digits (let's call it digits8) is larger:

   digits8  = digits / 3        = digits * 0.33333

At 64 bits digits8 and digits10 differ by two.

For octal you need to extract the leading zero plus up to digits/3+1
subsequent octal characters (although if digits%3==0, for 48 or 96
bits say, then digits/3 would suffice).
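
Tabulating both counts for a few widths (floor for digits10, ceiling
for the octal digits a buffer would actually need; my arithmetic only):

   #include <iostream>

   int main()
   {
       const int widths[] = { 32, 48, 64, 96, 128 };
       for (int i = 0; i < 5; ++i)
       {
           const int bits = widths[i];
           const int d10  = static_cast<int>(bits * 0.30103); // decimal digits (floor)
           const int d8   = (bits + 2) / 3;                   // ceil(bits / 3) octal digits
           std::cout << bits << " bits: digits10 ~ " << d10
                     << ", octal digits needed = " << d8 << '\n';
       }
       return 0;
   }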

Hmm, looking at std_limits.h, the values appear to be off by one. For
instance __glibcpp_s32_digits10 = 10, but not all 10-digit numbers can
be represented using 32 bits. I think this is a bug in the limits
file.


>     
>     At that point, we either continue on without setting failbit, or extract one more and set failbit.
>     
>     The bit of the standard again:
>     
>     If it is not discarded, then a check is made to determine if c is allowed as the next character of an input field of the conversion specifier returned by stage 1. If so it is accumulated.
>     
>     To me, this means that one-past-the type-determined size should be extracted, then not accumulated, and failbit should be set. 
>     
>     Does this sound reasonable?

It needs to be significantly more complicated for exponential notation:

   -00..........000123456......2345678............xxx.xxxxxxxxxxxE00000000001234
    <-------------><-----------------><-------------><----------> <--------><-->
     discard these  read these         count these    discard      discard  read

and 

   -0.0000..........000123456......2345678............xxxxxxxxxxxE00000000001234
      <---------------><-----------------><---------------------> <--------><-->
       count these      read these         discard                 discard  read


What do you do here, given that one can't pass scaling factors around
without changing the ABI?

Extract and store the "significant" part of the mantissa. Calculate a
mantissa scaling factor from the additional digits before the decimal
point or the leading zeros after the decimal point. Extract the
exponent if present. Now it gets ridiculous: convert the exponent to a
number, add the mantissa scaling factor to the exponent, convert the
exponent back to a string!  Then pass this whole new string off to the
C library, where it gets parsed again.
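
In code, the plan is something like the sketch below. None of these
names exist in libstdc++, the stage-2 bookkeeping that produces the
scaling factor is glossed over, and I'm assuming the C library's
strtold for the final conversion (glibc has it):

   #include <cstdlib>
   #include <sstream>
   #include <string>

   // Hypothetical helper: rebuild a string from the kept mantissa digits
   // and a pre-folded exponent, then hand it back to the C library,
   // where it gets parsed a second time.
   long double rescale_and_convert(const std::string& kept_digits, // significant mantissa digits
                                   bool negative,
                                   long scale,      // decimal scaling factor from the
                                                    // discarded/counted digits above
                                   long exponent)   // explicit exponent, or 0 if none
   {
       std::ostringstream s;
       if (negative)
           s << '-';
       s << (kept_digits.empty() ? "0" : kept_digits.c_str());
       s << 'e' << (exponent + scale);              // fold the scaling factor back in
       return strtold(s.str().c_str(), 0);          // assumes strtold is available
   }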

That assumes I haven't missed some subtle numerical point regarding
truncating the mantissa and adjusting the exponent. Will that always
produce the right representation? Reading and parsing floating-point
numbers is non-trivial; look at the amount of code that makes up
strtod in glibc.

To get floating-point input up to the level available in 2.95 without
changing the ABI is *hard*. At present, for serious, robust floating-
point input with gcc3 I use the C library :-(
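
What I mean by that, roughly (illustration only; note that reading a
whitespace-delimited token is not exactly the field parsing num_get is
supposed to do):

   #include <cstdlib>
   #include <iostream>
   #include <string>

   int main()
   {
       std::string token;
       if (std::cin >> token)                        // read the characters with the stream
       {
           char* end = 0;
           const double value = strtod(token.c_str(), &end); // convert with the C library
           if (end == token.c_str())
               std::cerr << "not a number\n";
           else
               std::cout << "read " << value << '\n';
       }
       return 0;
   }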

> http://gcc.gnu.org/cgi-bin/gnatsweb.pl?cmd=view%20audit-trail&pr=3720&database=gcc

Philip

