This is the mail archive of the gcc-prs@gcc.gnu.org mailing list for the GCC project.



Re: libstdc++/3720: Problems with num_get


The following reply was made to PR libstdc++/3720; it has been noted by GNATS.

From: Philip Martin <pmartin@uklinux.net>
To: bkoz@gcc.gnu.org
Cc: gcc-bugs@gcc.gnu.org,  gcc-gnats@gcc.gnu.org,  gcc-prs@gcc.gnu.org,
	  schmid@snake.iap.physik.tu-darmstadt.de
Subject: Re: libstdc++/3720: Problems with num_get
Date: 02 Dec 2001 03:07:25 +0000

 bkoz@gcc.gnu.org writes:
 
 >     I've been thinking about this, and its related bug libstdc++/4402.
 >     
 >     I agree this should not core.
 >     
 >     Looking at <limits> it looks like the largest number of digits for any numeric input is around 33 characters. 
 
 33? I don't see why, but I couldn't work out why the current code uses
 32 either. Please describe the longest number if you are going to
 build a constant into the code.
 
 Are there any platforms with 128-bit integers? An octal representation
 could reach 43 non-zero octal digits, plus a leading zero, a
 terminating null and an optional sign.
 
 Floating point input doesn't have to use exponential notation. A
 sequence of digits of length numeric_limits<long double>::max_exponent10
 (which is 4932 on x86) is valid input as far as I am aware; it certainly
 works on 2.95. It is possible that after something like digits10+1 digits
 one could simply increment a decimal scaling factor and ignore the
 precise digit values.
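 
 A minimal sketch of the sort of input I mean (assuming an x86 long
 double, where max_exponent10 is 4932; any platform's value works):
 
     #include <iostream>
     #include <limits>
     #include <sstream>
     #include <string>
 
     int main()
     {
         // A 1 followed by max_exponent10-1 zeros: a representable
         // long double written without any exponent marker.
         const int n = std::numeric_limits<long double>::max_exponent10;
         std::string digits = "1" + std::string(n - 1, '0');
 
         std::istringstream in(digits);
         long double x;
         in >> x;
         std::cout << (in ? "accepted" : "rejected") << ": " << x << '\n';
         return 0;
     }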
 
 "Leading" zeros after a decimal point are also a problem
 
    0.00000000000000000000000000000000000000000000000000000000000000000000000001
 
 I want more than the first 33 digits here! As many as -min_exponent10+2 perhaps?
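 
 A rough check of that bound (my arithmetic, assuming IEEE double where
 min_exponent10 is -307, and going through the C library to parse):
 
     #include <cstdio>
     #include <cstdlib>
     #include <limits>
     #include <string>
 
     int main()
     {
         // "0." then -min_exponent10-1 zeros then a 1: still a normal
         // double, so every one of those zeros is significant.
         const int k = -std::numeric_limits<double>::min_exponent10 - 1;
         std::string s = "0." + std::string(k, '0') + "1";
 
         double x = std::strtod(s.c_str(), 0);
         std::printf("%d zeros after the point -> %g\n", k, x);
         return 0;
     }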
 
 >     
 >     Looking at the relevant part of the standard, I see
 >     
 >     22.2.2.1.2 - num_get virtual functions [lib.facet.num.get.virtuals]
 >     
 >     -9- If discard is true then the position of the character is remembered, but the character is otherwise ignored. If it is not discarded, then a check is made to determine if c is allowed as the next character of an input field of the conversion specifier returned by stage 1. If so it is accumulated.
 >     
 >     
 >     This is sufficiently vague to allow the following plan, which I consider sensible:
 >     
 >     numbers are extracted while beg != end and while the number of extracted digits <= numeric_limits<_Type>::digits10(). 
 
 digits10 is the number of decimal digits that can be represented
 without change, i.e. the binary may represent more. Thus in general
 digits10+1 digits are needed to guarantee full binary precision.
 
 However in this case I think octal is the limiting factor. The number
 of decimal digits is
 
    digits10 = floor(digits * log10(2)) = floor(digits * 0.30103)
 
 while the number of octal digits (let's call it digits8) is larger:
 
    digits8  = floor(digits / 3)        = floor(digits * 0.33333)
 
 At 64 bits digits8 and digits10 differ by two.
 
 For octal you need to extract the leading zero plus up to digits/3+1
 subsequent octal characters (although if digits%3==0, for 48 or 96
 bits say, then digits/3 would suffice).
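 
 The arithmetic as a sketch (the widths and helper names are mine):
 
     #include <cstdio>
 
     // floor(bits * log10(2)): decimal digits representable unchanged.
     static int digits10_for(int bits) { return (int)(bits * 0.30103); }
 
     // floor(bits / 3): the digits8 above; extraction needs one more
     // character unless bits is a multiple of three.
     static int digits8_for(int bits) { return bits / 3; }
 
     int main()
     {
         const int widths[] = { 32, 48, 64, 96, 128 };
         for (int i = 0; i < 5; ++i)
             std::printf("%3d bits: digits10 %2d, digits8 %2d\n",
                         widths[i], digits10_for(widths[i]),
                         digits8_for(widths[i]));
         return 0;
     }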
 
 Hmm, looking at std_limits.h the values appear to be off by one. For
 instance __glibcpp_s32_digits10 = 10, but not all 10-digit numbers can
 be represented using 32 bits. I think this is a bug in the limits
 file.
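 
 A quick check of that claim (assuming a 32-bit int):
 
     #include <cstdio>
     #include <limits>
 
     int main()
     {
         // INT_MAX is 2147483647: ten digits, but 9999999999 does not
         // fit in 32 bits, so digits10 should be 9, not 10.
         std::printf("digits10 = %d (expect 9)\n",
                     std::numeric_limits<int>::digits10);
         return 0;
     }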
 
 
 >     
 >     At that point, we either continue on without setting failbit, or extract one more and set failbit.
 >     
 >     The bit of the standard again:
 >     
 >     If it is not discarded, then a check is made to determine if c is allowed as the next character of an input field of the conversion specifier returned by stage 1. If so it is accumulated.
 >     
 >     To me, this means that one-past-the type-determined size should be extracted, then not accumulated, and failbit should be set. 
 >     
 >     Does this sound reasonable?
 
 It needs to be significantly more complicated for exponential notation
 
    -00..........000123456......2345678............xxx.xxxxxxxxxxxE00000000001234
     <-------------><-----------------><-------------><----------> <--------><-->
      discard these  read these         count these    discard      discard  read
 
 and 
 
    -0.0000..........000123456......2345678............xxxxxxxxxxxE00000000001234
       <---------------><-----------------><---------------------> <--------><-->
        count these      read these         discard                 discard  read
 
 
 What do you do here, given that one can't pass scaling factors around
 without changing the ABI?
 
 Extract and store the "significant" part of the mantissa. Calculate a
 mantissa scaling factor from the additional digits before the decimal
 point or the leading zeros after the decimal point. Extract the
 exponent if present. Now it gets ridiculous: convert the exponent to a
 number, add the mantissa scaling factor to the exponent, convert the
 exponent back to a string!  Then pass this whole new string off to the
 C library, where it gets parsed again.
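 
 In outline, something like this (a rough sketch only: the names, the
 significant-digit cutoff and the truncation are mine, and it ignores
 locale grouping, hex floats and error checking):
 
     #include <cctype>
     #include <cstdio>
     #include <cstdlib>
     #include <string>
 
     // Keep a bounded number of significant mantissa digits, fold the
     // rest into a decimal scaling factor, rebuild a short string and
     // let the C library parse that.
     double parse_scaled(const char* p)
     {
         std::string out;
         long scale = 0;          // decimal scaling from skipped digits
         int kept = 0;
         const int max_kept = 36; // a guess at "enough" precision
         bool point = false, seen = false;
 
         if (*p == '+' || *p == '-')
             out += *p++;
 
         for (; *p; ++p) {
             if (*p == '.' && !point) { point = true; continue; }
             if (!std::isdigit((unsigned char)*p)) break;
             if (*p == '0' && !seen) {
                 if (point) --scale;  // leading zero after the point
                 continue;            // leading zeros never accumulate
             }
             seen = true;
             if (kept < max_kept) {
                 out += *p;
                 if (point) --scale;  // kept digit after the point
                 ++kept;
             } else if (!point) {
                 ++scale;             // dropped digit before the point
             }                        // dropped digit after it: ignore
         }
         if (!seen)
             out += '0';
 
         long exp = 0;
         if (*p == 'e' || *p == 'E')
             exp = std::strtol(p + 1, 0, 10);
 
         char tail[32];
         std::sprintf(tail, "e%ld", exp + scale);  // adjusted exponent
         out += tail;
 
         return std::strtod(out.c_str(), 0);
     }
 
 Note it truncates rather than rounds the first dropped digit, which
 is exactly the sort of subtlety I worry about.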
 
 That assumes I haven't missed some subtle numerical point regarding
 truncating the mantissa and adjusting the exponent. Will that always
 produce the right representation? Reading and parsing floating-point
 numbers is non-trivial; look at the amount of code that makes up
 strtod in glibc.
 
 To get floating-point input up to the level available in 2.95 without
 changing the ABI is *hard*. At present, for serious, robust floating-
 point input with gcc3 I use the C library :-(
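 
 That is, something along these lines instead of operator>>:
 
     #include <cerrno>
     #include <cstdio>
     #include <cstdlib>
 
     int main()
     {
         const char* s = "1.23456789e-300";
         char* end = 0;
         errno = 0;
         double x = std::strtod(s, &end);  // glibc does the hard work
         if (end == s || errno == ERANGE)
             std::printf("parse failed\n");
         else
             std::printf("x = %g\n", x);
         return 0;
     }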
 
 > http://gcc.gnu.org/cgi-bin/gnatsweb.pl?cmd=view%20audit-trail&pr=3720&database=gcc
 
 Philip

