This is the mail archive of the libstdc++@gcc.gnu.org mailing list for the libstdc++ project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

[v3] Return of the son of fp printing


Hi all.  I finally got some time, got off my tush, and dusted off the floating
point printing patch I had been working on quite some time ago.  I see that my
timing is a bit off as far 3.4 goes, but this version I think is ready to go.

This patch is built against mainline and passes with no regressions.

Originally, I had extracted the floatconv code from libio in gcc 2.95.  This
turned out to need a bunch of configury work that I wasn't up to speed on
doing and the patch stalled.  Also, there were speed improvements, but not
uniform.  On 686, the libio code is 2-3X faster than printf for reasonable
numbers - if the precision is low enough that the computations can be done in
fp hardware, or if we're dealing with integers.  However, for printing that
required big integers, it's 50% slower.  Glibc uses a version of GMP under the
covers for its printing and it seemed like too much work and licensing hassle
to bring it over.

So, recently, I thought that if I stripped down the libio code to only the
fast path portions and fall back to printf for other floats, I could get all
the speed benefits, simplify the code a lot, and avoid the configury issues.

It seems to have been successful.  The fp printing performance test shows:

before:
ofstream_insert_float.cc 	precision 6              	6964r 6534u   79s        0mem    0pf 
ofstream_insert_float.cc 	precision 12             	6836r 6470u   68s        0mem    0pf 

after:
ofstream_insert_float.cc 	precision 6              	2914r 2703u   80s        0mem    0pf 
ofstream_insert_float.cc 	precision 12             	3710r 3281u   58s        0mem    0pf 

Here I'm getting 2-2.5x performance improvement.  The worst case will be the
same as the printf code.

So, the only things this patch now relies on non-standard are:

__builtin_isinf
__builtin_isnan
__builtin_signbit

Everything else is standard C++.  Another bonus is that this code should work
for s370, VAX, and c4x fp, as long as the underlying functions are supported.

Another improvement here is that the patch safely uses sprintf, and only
requires one call, since we know ahead of time how large the result can be.

In addition, I avoid setting the global locale to C when using sprintf.  The
only thing that can be different in fp output from sprintf is the decimal
point.  So I use localeconv()->decimal_point to scan and convert the character
rather than mess with the global locale.  Faster solutions are welcome, but
even as is, this will help us eventually eliminate this wart in the library.

Once this patch is in place, the only remaining calls to convert_from_v are in
money_put.  I've already got plans to replace this with inline printing for
small enough integers, and the remainder can be handled with similar
restricted use of sprintf.

The biggest issue with this patch is that I don't know if adding the exported
symbols to 3.4.2 is the right thing to do.  Wiser heads, let me know.

Jerry


2004-08-29  Jerry Quinn  jlquinn@optonline.net

	* config/linker-map.gnu (GLIBCXX_3.4.2): Add dec_exponent,
        float_to_char_sprintf, dtoa.
	* include/bits/locale_facets.tcc: Include cmath.
	(__float_to_char_fixed, __float_to_char_scientific, __float_to_char):
        New.
	(__dec_exponent, __dtoa, __float_to_char_sprintf): Declare.
	(num_put::_M_insert_float): Use __float_to_char instead of
        __convert_from_v.
	* Makefile.am (sources):  Add floatconv.cc.
	* Makefile.in: Regenerate.
	* src/floatconv.cc.


Index: config/linker-map.gnu
===================================================================
RCS file: /cvs/gcc/gcc/libstdc++-v3/config/linker-map.gnu,v
retrieving revision 1.67
diff -u -r1.67 linker-map.gnu
--- config/linker-map.gnu	8 Jul 2004 05:24:33 -0000	1.67
+++ config/linker-map.gnu	29 Aug 2004 22:22:26 -0000
@@ -259,6 +259,10 @@
     _ZN9__gnu_cxx11__pool_base16_M_get_free_listE[jm];
     _ZN9__gnu_cxx11__pool_base12_M_get_mutexEv;
 
+    _ZSt14__dec_exponentI[de]EiT_RiRb;
+    _ZSt23__float_to_char_sprintfI[de]EiPcT_iiSt13_Ios_Fmtflags;
+    _ZSt6__dtoaI[de]EPcS0_T_iiibPi;
+
 } GLIBCXX_3.4.1;
 
 # Symbols in the support library (libsupc++) have their own tag.
Index: include/bits/locale_facets.tcc
===================================================================
RCS file: /cvs/gcc/gcc/libstdc++-v3/include/bits/locale_facets.tcc,v
retrieving revision 1.202
diff -u -r1.202 locale_facets.tcc
--- include/bits/locale_facets.tcc	25 Aug 2004 23:38:28 -0000	1.202
+++ include/bits/locale_facets.tcc	29 Aug 2004 22:22:27 -0000
@@ -36,6 +36,7 @@
 #pragma GCC system_header
 
 #include <limits>		// For numeric_limits
+#include <cmath>		// For log10
 #include <typeinfo>		// For bad_cast.
 #include <bits/streambuf_iterator.h>
 
@@ -1037,16 +1038,254 @@
       __len = __newlen;
     }
 
-  // The following code uses snprintf (or sprintf(), when
-  // _GLIBCXX_USE_C99 is not defined) to convert floating point values
-  // for insertion into a stream.  An optimization would be to replace
-  // them with code that works directly on a wide buffer and then use
-  // __pad to do the padding.  It would be good to replace them anyway
-  // to gain back the efficiency that C++ provides by knowing up front
-  // the type of the values to insert.  Also, sprintf is dangerous
-  // since may lead to accidental buffer overruns.  This
-  // implementation follows the C++ standard fairly directly as
-  // outlined in 22.2.2.2 [lib.locale.num.put]
+  // Prints fixed point number to sbuf
+  //
+  // atoms contains numeric and exponent wide chars to use for the result.
+  // dpchar is the decimal point char to use.
+  //
+  // The number consists of buf and decp.  buf contains the sigificant digits
+  // of the number with no trailing 0's.  decp is the position of the decimal
+  // point.  It is the number of digits in buf to the left of the decimal
+  // point.
+  //
+  // prec specifies the maximum number of digits to print after the decimal
+  // point.  flags is provided for showpoint.  showpoint prints a point even
+  // if there are no more digits to print after the point.
+  //
+  // pad is true if exactly prec digits are to be printed after the point.  If
+  // there are not enough, 0's are added to the end.
+  template<typename _CharT>
+    int
+    __float_to_char_fixed(_CharT* __sbuf, const _CharT* __atoms,
+			  _CharT __dpchar, const char* __buf, int __decp, 
+			  int __prec, ios_base::fmtflags __flags, bool __pad)
+    {
+      _CharT* __sb = __sbuf;
+      const _CharT* __digits = __atoms + __num_base::_S_odigits;
+      bool __showpoint = __flags & ios_base::showpoint;
+
+      // Print integer portion
+      if (__decp > 0)
+	{
+	  while (*__buf && __decp) {
+	    *__sbuf++ = __digits[*__buf++ - '0'];
+	    __decp--;
+	  }
+	  // Fill with 0's if we ran out of digits.  If there were enough, decp
+	  // will be 0.
+	  for (; __decp; __decp--)
+	    *__sbuf++ = __digits[0];
+	}
+      else
+	// Leading 0 for fractional only numbers and zero.
+	*__sbuf++ = __digits[0];
+
+      // Print decimal point
+      // Showpoint always prints point.  Otherwise when prec > 0, there must be
+      // digits to print or forced padding.
+      if ((__prec && (__pad || *__buf))	|| __showpoint)
+	*__sbuf++ = __dpchar;
+
+
+      // Print 0's between decimal point and first significant digit
+      if (__decp < 0) {
+	while (__decp < 0 && __prec > 0) {
+	  *__sbuf++ = __digits[0];
+	  __decp++;
+	  __prec--;
+	}
+      }
+
+      // Print significant fractional digits
+      // At this point decp <= 0
+      for (; __prec > 0 && *__buf; __prec--)
+	*__sbuf++ = __digits[*__buf++ - '0'];
+      if (__pad) {
+	while (__prec-- > 0)
+	  *__sbuf++ = __digits[0];
+      }
+
+      return __sbuf - __sb;
+    }
+
+  // Prints scientific format number to sbuf
+  //
+  // Prints a number in scientific format i.e. +-d.ddddE+ee.
+  //
+  // atoms contains numeric and exponent wide chars to use for the result.
+  // dpchar is the decimal point char to use.
+  //
+  // The number consists of buf and decp.
+  // buf contains the sigificant digits of the number with no trailing 0's.
+  // decp is the position of the decimal point.  It is the number of digits
+  // to the left of the decimal point.
+  //
+  // prec specifies the maximum number of digits to print after the decimal
+  // point in scientific form.  showpoint prints a point even if there are no
+  // more digits to print after the point.
+  //
+  // pad is true if exactly prec digits are to be printed after the point.  If
+  // there are not enough, 0's are added to the end.
+  template<typename _CharT>
+    int
+    __float_to_char_scientific(_CharT* __sbuf, const _CharT* __atoms,
+			       _CharT __dpchar, const char* __buf, int __decp, 
+			       int __prec, ios_base::fmtflags __flags, 
+			       bool __pad)
+    {
+      const _CharT* __digits = __atoms + __num_base::_S_odigits;
+      _CharT* __sb = __sbuf;
+      const char* __bp = __buf;
+      bool __showpoint = __flags & ios_base::showpoint;
+
+      // Insert first digit.
+      if (!__buf[0])
+	*__sbuf++ = __digits[0];	// Insert implicit 0.
+      else
+	*__sbuf++= __digits[(*__bp++) - '0'];
+
+      // Insert decimal point.
+      // Showpoint always prints point.  Otherwise when prec > 0, there must be
+      // digits to print or forced padding.
+      if ((__prec && (*__bp || __pad)) || __showpoint)
+	*__sbuf++= __dpchar;
+
+
+      // Insert digits after decimal point
+      for (;__prec && *__bp; __prec--)
+	*__sbuf++ = __digits[*__bp++ - '0'];
+      if (__pad)
+	while (__prec--)
+	  *__sbuf++ = __digits[0];
+
+      // Insert exponent symbol and sign
+      *__sbuf++ = (__flags & ios_base::uppercase) ? __atoms[__num_base::_S_oE]
+	: __atoms[__num_base::_S_oe];
+
+      __decp--;
+      if (__decp < 0) 
+	{
+	  *__sbuf++ = __atoms[__num_base::_S_ominus];
+	  __decp = -__decp;
+	}
+      else 
+	*__sbuf++ = __atoms[__num_base::_S_oplus];
+
+
+      // Write the exponent.  Large enough for ieee quad (15 bit exponent)
+      char __tbuf[6];
+      char *__tb = __tbuf + 5;
+      *__tb-- = 0;
+      while (__decp) 
+	{
+	  *__tb-- = '0' + (__decp % 10);
+	  __decp /= 10;
+	}
+
+      // We need at least 2 digits in exponent.  Pad if it's missing
+      while (__tb - __tbuf > 2) 
+	*__tb-- = '0';
+      while (*++__tb)
+	*__sbuf++ = __digits[*__tb - '0'];
+      return __sbuf - __sb;
+    }
+
+  template<typename _Tp>
+    char* __dtoa(char*, _Tp, int, int, int, bool, int*);
+
+  template<typename _Tp>
+    int __float_to_char_sprintf(char*, _Tp, int, int, ios_base::fmtflags);
+
+  // Convert floating point value to string.
+  //
+  // Takes a floating point value and converts it to a string using dtoa,
+  // falling back to sprintf if needed.
+  //
+  // buf is the wide char buffer to print into, bufsize is the length of buf.
+  //
+  // atoms contains numeric and exponent wide chars to use for the result.
+  // dpchar is the decimal point char to use.
+  //
+  // mode is 0 for %f style, 1 for %e style, and 2 for %g style printing.
+  // v is the value to print, and exp is the exponent estimate from
+  // dec_exponent.
+  //
+  // prec is the precision to use for printing, and is interpreted according
+  // to the mode.
+  //
+  // flags contains the relevant printing flags: showpoint, showpos, and
+  // uppercase.
+  template<typename _CharT, typename _ValueT>
+    inline int
+    __float_to_char(_CharT* __buf, int __bufsize, const _CharT* __atoms,
+		    _CharT __dpchar, int __mode, _ValueT __v, int __exp,
+		    int __prec, ios_base::fmtflags __flags)
+    {
+      char __lbuf[__bufsize];
+      int __decp;
+
+      // Insert sign.
+      int __len = 0;
+      if (__v < 0)
+	{
+	  *__buf++ = __atoms[__num_base::_S_ominus];
+	  __len = 1;
+	}
+      else if (__flags & ios_base::showpos)
+	{
+	  __len = 1;
+	  *__buf++ = __atoms[__num_base::_S_oplus];
+	}
+      
+      // Dispatch number conversion to the correct function.
+      if (__mode == 0)
+	{
+	  if (!__dtoa(__lbuf, __v, __exp, 3, __prec, false, &__decp))
+	    __len += __float_to_char_sprintf(__lbuf, __v, __mode, __prec,
+					     __flags);
+	  else
+	    __len +=__float_to_char_fixed(__buf, __atoms, __dpchar,
+					  &__lbuf[0], __decp, __prec, __flags,
+					  true);
+	}
+      else if (__mode == 1)
+	{
+	  if (!__dtoa(__lbuf, __v, __exp, 2, __prec + 1, false, &__decp))
+	    __len += __float_to_char_sprintf(__lbuf, __v, __mode, __prec,
+					     __flags);
+	  else
+	    __len += __float_to_char_scientific(__buf, __atoms, __dpchar,
+						&__lbuf[0], __decp, __prec,
+					      __flags, true);
+	}
+      else
+	{
+	  if (!__dtoa(__lbuf, __v, __exp, 2, __prec, true, &__decp))
+	    __len += __float_to_char_sprintf(__lbuf, __v, __mode, __prec,
+					     __flags);
+	  else
+	    {
+	      if (__decp <= -4 || __decp > __prec)
+		__len += __float_to_char_scientific(__buf, __atoms, __dpchar,
+						    &__lbuf[0], __decp,
+						    __prec - 1, __flags,
+						    false);
+	      else
+		__len += __float_to_char_fixed(__buf, __atoms, __dpchar, 
+					       &__lbuf[0], __decp,
+					       __prec - __decp, __flags,
+					       false);
+	    }
+	}
+      return __len;
+    }
+
+
+  template<typename _Tp>
+    int
+    __dec_exponent(_Tp d, int& special, bool& sign);
+
+
   template<typename _CharT, typename _OutIter>
     template<typename _ValueT>
       _OutIter
@@ -1071,58 +1310,79 @@
 	// Long enough for the max format spec.
 	char __fbuf[16];
 
-#ifdef _GLIBCXX_USE_C99
-	// First try a buffer perhaps big enough (most probably sufficient
-	// for non-ios_base::fixed outputs)
-	int __cs_size = __max_digits * 3;
-	char* __cs = static_cast<char*>(__builtin_alloca(__cs_size));
-
-	__num_base::_S_format_float(__io, __fbuf, __mod);
-	__len = std::__convert_from_v(__cs, __cs_size, __fbuf, __v,
-				      _S_get_c_locale(), __prec);
-
-	// If the buffer was not large enough, try again with the correct size.
-	if (__len >= __cs_size)
-	  {
-	    __cs_size = __len + 1;
-	    __cs = static_cast<char*>(__builtin_alloca(__cs_size));
-	    __len = std::__convert_from_v(__cs, __cs_size, __fbuf, __v,
-					  _S_get_c_locale(), __prec);
+	int __special;
+	bool __sign;
+	_CharT* __ws;
+	const int __exponent = __dec_exponent(__v, __special, __sign);
+	if (__special)
+	  {
+	    const ctype<_CharT>& __ctype = use_facet<ctype<_CharT> >(__loc);
+	    const char* __cs;
+	    __ws = static_cast<_CharT*>(__builtin_alloca(sizeof(_CharT) * 4));
+	    if (__special == 1)
+	      {
+		if (__sign)
+		  __cs = __io.flags() & ios_base::uppercase ? "-INF" : "-inf";
+		else if (__io.flags() & ios_base::showpos)
+		  __cs = __io.flags() & ios_base::uppercase ? "+INF" : "+inf";
+		else
+		  __cs = __io.flags() & ios_base::uppercase ? "INF" : "inf";
+	      }
+	    else
+	      // special == 2
+	      __cs = __io.flags() & ios_base::uppercase ? "NAN" : "nan";
+	    __len = strlen(__cs);
+	    __ctype.widen(__cs, __cs + __len, __ws);
+      	  }
+	else
+	  {
+	    // Fixed needs space to hold all digits to left of decimal
+	    // point, plus up to __prec digits to the right.  The
+	    // other formats need __prec digits plus sign, decimal
+	    // point, exponent char and sign, and exponent itself.
+	    int __decp;
+	    int __cs_size;
+	    int __mode;
+
+	    if (__io.flags() & ios_base::fixed)
+	      {
+		// %f style
+		// sign + (exp + 1) digits + decpoint + prec digits
+		__cs_size = (__exponent > 0 ? __exponent : 0) + 3 + __prec;
+		__mode = 0;
+	      }
+	    else if (__io.flags() & ios_base::scientific)
+	      {
+		// %e style
+		// sign + digit + decpt + precision + e + sign + explen
+		const int __max_exp_len =
+		  int(log10(numeric_limits<_ValueT>::max_exponent10)) + 1;
+		__cs_size = __prec + 5 + __max_exp_len;
+		__mode = 1;
+	      }
+	    else
+	      {
+		// %g style 
+		// sign + digit + decpt + precision + e + sign + exp (max for neg quad)
+		// When exp >-4 || < prec, use %f style.  We could need 2*prec
+		// digits.
+		const int __max_exp_len =
+		  int(log10(numeric_limits<_ValueT>::max_exponent10)) + 1;
+		if (!__prec) 
+		  __prec = 1;
+		__cs_size = __prec * 2 + 5 + __max_exp_len;
+		__mode = 2;
+	      }
+      
+	    __ws = static_cast<_CharT*>(__builtin_alloca(sizeof(_CharT) * __cs_size));
+	    __len = __float_to_char(__ws, __cs_size, &__lc->_M_atoms_out[0],
+				    __lc->_M_decimal_point, __mode, __v,
+				    __exponent, __prec, __io.flags());
 	  }
-#else
-	// Consider the possibility of long ios_base::fixed outputs
-	const bool __fixed = __io.flags() & ios_base::fixed;
-	const int __max_exp = numeric_limits<_ValueT>::max_exponent10;
-
-	// The size of the output string is computed as follows.
-	// ios_base::fixed outputs may need up to __max_exp + 1 chars
-	// for the integer part + __prec chars for the fractional part
-	// + 3 chars for sign, decimal point, '\0'. On the other hand,
-	// for non-fixed outputs __max_digits * 2 + __prec chars are
-	// largely sufficient.
-	const int __cs_size = __fixed ? __max_exp + __prec + 4
-	                              : __max_digits * 2 + __prec;
-	char* __cs = static_cast<char*>(__builtin_alloca(__cs_size));
-
-	__num_base::_S_format_float(__io, __fbuf, __mod);
-	__len = std::__convert_from_v(__cs, 0, __fbuf, __v,
-				      _S_get_c_locale(), __prec);
-#endif
 
       // [22.2.2.2.2] Stage 2, convert to char_type, using correct
       // numpunct.decimal_point() values for '.' and adding grouping.
-      const ctype<_CharT>& __ctype = use_facet<ctype<_CharT> >(__loc);
-
-      _CharT* __ws = static_cast<_CharT*>(__builtin_alloca(sizeof(_CharT)
-							   * __len));
-      __ctype.widen(__cs, __cs + __len, __ws);
-
-      // Replace decimal point.
-      const _CharT __cdec = __ctype.widen('.');
-      const _CharT __dec = __lc->_M_decimal_point;
-      const _CharT* __p = char_traits<_CharT>::find(__ws, __len, __cdec);
-      if (__p)
-	__ws[__p - __ws] = __dec;
+      // (The conversion happened in __float_to_char(), above.)
 
       // Add grouping, if necessary.
       if (__lc->_M_use_grouping)
@@ -1131,6 +1391,8 @@
 	  // number of digits, but no more.
 	  _CharT* __ws2 = static_cast<_CharT*>(__builtin_alloca(sizeof(_CharT)
 								* __len * 2));
+	  const _CharT* __p = 
+	    char_traits<_CharT>::find(__ws, __len, __lc->_M_decimal_point);
 	  _M_group_float(__lc->_M_grouping, __lc->_M_grouping_size,
 			 __lc->_M_thousands_sep, __p, __ws2, __ws, __len);
 	  __ws = __ws2;
Index: src/Makefile.am
===================================================================
RCS file: /cvs/gcc/gcc/libstdc++-v3/src/Makefile.am,v
retrieving revision 1.144
diff -u -r1.144 Makefile.am
--- src/Makefile.am	16 Apr 2004 19:04:05 -0000	1.144
+++ src/Makefile.am	29 Aug 2004 22:22:28 -0000
@@ -102,6 +102,7 @@
 	ctype.cc \
 	debug.cc \
 	debug_list.cc \
+	floatconv.cc \
 	functexcept.cc \
 	globals_locale.cc \
 	globals_io.cc \
Index: src/Makefile.in
===================================================================
RCS file: /cvs/gcc/gcc/libstdc++-v3/src/Makefile.in,v
retrieving revision 1.208
diff -u -r1.208 Makefile.in
--- src/Makefile.in	23 Aug 2004 10:18:30 -0000	1.208
+++ src/Makefile.in	29 Aug 2004 22:22:28 -0000
@@ -65,7 +65,7 @@
 	numeric_members.lo time_members.lo
 am__objects_2 = basic_file.lo c++locale.lo
 am__objects_3 = allocator.lo codecvt.lo complex_io.lo ctype.lo \
-	debug.lo debug_list.lo functexcept.lo globals_locale.lo \
+	debug.lo debug_list.lo floatconv.lo functexcept.lo globals_locale.lo \
 	globals_io.lo ios.lo ios_failure.lo ios_init.lo ios_locale.lo \
 	limits.lo list.lo locale.lo locale_init.lo locale_facets.lo \
 	localename.lo stdexcept.lo strstream.lo tree.lo \
@@ -309,6 +309,7 @@
 	ctype.cc \
 	debug.cc \
 	debug_list.cc \
+	floatconv.cc \
 	functexcept.cc \
 	globals_locale.cc \
 	globals_io.cc \
Index: src/locale-inst.cc
===================================================================
RCS file: /cvs/gcc/gcc/libstdc++-v3/src/locale-inst.cc,v
retrieving revision 1.50
diff -u -r1.50 locale-inst.cc
--- src/locale-inst.cc	27 Feb 2004 10:12:01 -0000	1.50
+++ src/locale-inst.cc	29 Aug 2004 22:22:28 -0000
@@ -32,6 +32,12 @@
 // ISO C++ 14882: 22.1  Locales
 //
 
+// This file instantiates locale-related classes and functions for char and
+// wchar_t.  Use C in place of the template char type.  wlocale-inst.cc
+// changes the definition of C to reinstantiate everything with wchar_t.  For
+// locale-related functions that don't depend on the char type, see
+// locale-misc-inst.cc.
+
 #include <locale>
 
 // Instantiation configuration.
@@ -313,5 +319,17 @@
     __int_to_char(C*, unsigned long long, const C*, 
 		  ios_base::fmtflags, bool);
 #endif
+
+
+  template
+    int
+    __float_to_char_fixed(C*, const C*, C, const char*, int, int,
+			  ios_base::fmtflags, bool);
+
+  template
+    int
+    __float_to_char_scientific(C*, const C*, C, const char*, int, int,
+			       ios_base::fmtflags, bool);
+
 } // namespace std
--- /dev/null	2004-08-19 18:00:15.000000000 -0400
+++ src/floatconv.cc	2004-08-27 23:38:41.000000000 -0400
@@ -0,0 +1,691 @@
+// Numeric printing support -*- C++ -*-
+
+// Copyright (C) 1993, 1994, 2004
+// Free Software Foundation, Inc.
+//
+// This file is part of the GNU ISO C++ Library.  This library is free
+// software; you can redistribute it and/or modify it under the
+// terms of the GNU General Public License as published by the
+// Free Software Foundation; either version 2, or (at your option)
+// any later version.
+
+// This library is distributed in the hope that it will be useful,
+// but WITHOUT ANY WARRANTY; without even the implied warranty of
+// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+// GNU General Public License for more details.
+
+// You should have received a copy of the GNU General Public License along
+// with this library; see the file COPYING.  If not, write to the Free
+// Software Foundation, 59 Temple Place - Suite 330, Boston, MA 02111-1307,
+// USA.
+
+// As a special exception, you may use this file as part of a free software
+// library without restriction.  Specifically, if other files instantiate
+// templates or use macros or inline functions from this file, or you compile
+// this file and link it with other files to produce an executable, this
+// file does not by itself cause the resulting executable to be covered by
+// the GNU General Public License.  This exception does not however
+// invalidate any other reasons why the executable file might be covered by
+// the GNU General Public License.
+
+// Original copyright below
+
+/* 
+Copyright (C) 1993, 1994 Free Software Foundation
+
+This file is part of the GNU IO Library.  This library is free
+software; you can redistribute it and/or modify it under the
+terms of the GNU General Public License as published by the
+Free Software Foundation; either version 2, or (at your option)
+any later version.
+
+This library is distributed in the hope that it will be useful,
+but WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+GNU General Public License for more details.
+
+You should have received a copy of the GNU General Public License
+along with this library; see the file COPYING.  If not, write to the Free
+Software Foundation, 59 Temple Place - Suite 330, Boston, MA  02111-1307, USA.
+
+As a special exception, if you link this library with files
+compiled with a GNU compiler to produce an executable, this does not cause
+the resulting executable to be covered by the GNU General Public License.
+This exception does not however invalidate any other reasons why
+the executable file might be covered by the GNU General Public License. */
+
+// #include <libioP.h>
+/****************************************************************
+ *
+ * The author of this software is David M. Gay.
+ *
+ * Copyright (c) 1991 by AT&T.
+ *
+ * Permission to use, copy, modify, and distribute this software for any
+ * purpose without fee is hereby granted, provided that this entire notice
+ * is included in all copies of any software which is or includes a copy
+ * or modification of this software and in all copies of the supporting
+ * documentation for such software.
+ *
+ * THIS SOFTWARE IS BEING PROVIDED "AS IS", WITHOUT ANY EXPRESS OR IMPLIED
+ * WARRANTY.  IN PARTICULAR, NEITHER THE AUTHOR NOR AT&T MAKES ANY
+ * REPRESENTATION OR WARRANTY OF ANY KIND CONCERNING THE MERCHANTABILITY
+ * OF THIS SOFTWARE OR ITS FITNESS FOR ANY PARTICULAR PURPOSE.
+ *
+ ***************************************************************/
+
+/* Some cleaning up by Per Bothner, bothner@cygnus.com, 1992, 1993.
+   Re-written to not need static variables
+   (except result, result_k, HIWORD, LOWORD). */
+
+/* Note that the checking of _DOUBLE_IS_32BITS is for use with the
+   cross targets that employ the newlib ieeefp.h header.  -- brendan */
+
+/* Please send bug reports to
+        David M. Gay
+        AT&T Bell Laboratories, Room 2C-463
+        600 Mountain Avenue
+        Murray Hill, NJ 07974-2070
+        U.S.A.
+        dmg@research.att.com or research!dmg
+ */
+
+/* strtod for IEEE-, VAX-, and IBM-arithmetic machines.
+ *
+ * This strtod returns a nearest machine number to the input decimal
+ * string (or sets errno to ERANGE).  With IEEE arithmetic, ties are
+ * broken by the IEEE round-even rule.  Otherwise ties are broken by
+ * biased rounding (add half and chop).
+ *
+ * Inspired loosely by William D. Clinger's paper "How to Read Floating
+ * Point Numbers Accurately" [Proc. ACM SIGPLAN '90, pp. 92-101].
+ *
+ * Modifications:
+ *
+ *      1. We only require IEEE, IBM, or VAX double-precision
+ *              arithmetic (not IEEE double-extended).
+ *      2. We get by with floating-point arithmetic in a case that
+ *              Clinger missed -- when we're computing d * 10^n
+ *              for a small integer d and the integer n is not too
+ *              much larger than 22 (the maximum integer k for which
+ *              we can represent 10^k exactly), we may be able to
+ *              compute (d*10^k) * 10^(e-k) with just one roundoff.
+ *      3. Rather than a bit-at-a-time adjustment of the binary
+ *              result in the hard case, we use floating-point
+ *              arithmetic to determine the adjustment to within
+ *              one bit; only in really hard cases do we need to
+ *              compute a second residual.
+ *      4. Because of 3., we don't need a large table of powers of 10
+ *              for ten-to-e (just some small tables, e.g. of 10^k
+ *              for 0 <= k <= 22).
+ */
+
+/*
+ * #define IEEE for IEEE floating-point arithmetic
+ * #define IBM for IBM mainframe-style floating-point arithmetic.
+ * #define VAX for VAX-style floating-point arithmetic.
+ * #define Unsigned_Shifts if >> does treats its left operand as unsigned.
+ * #define Check_FLT_ROUNDS if FLT_ROUNDS can assume the values 2 or 3.
+ */
+
+#include <clocale>
+#include <cstring>
+#include <cmath>		// log, floor, frexp
+#include <limits>
+#include <ios>
+
+namespace std
+{
+
+// Bletch = (highest power of 2 < DBL_MAX_10_EXP) / 16
+// Quick_max = floor((mant_size-1)*log(FLT_RADIX)/log(10) - 1)
+// Int_max = floor(mant_size*log(FLT_RADIX)/log(10) - 1)
+template<typename _T>
+struct fptype
+{
+  typedef std::numeric_limits<_T> nld;
+
+  static const int mant_size = nld::digits;
+  static const int Bletch;
+  static const int Quick_max;
+  static const int Int_max;
+};
+
+template<typename _T>
+const int fptype<_T>::Bletch = (1 << __builtin_ffs(fptype::nld::max_exponent10)) / 16;
+
+template<typename _T>
+const int fptype<_T>::Quick_max
+  = int(floor((fptype::nld::digits-1) * log(double(fptype::nld::radix))/log(10) - 1));
+
+// For secondary fast codepath
+template<typename _T>
+const int fptype<_T>::Int_max
+  = int(floor(fptype::nld::digits * log(fptype::nld::radix)/log(10) - 1));
+
+template
+struct fptype<double>;
+
+template
+struct fptype<long double>;
+
+
+// Ten_pmax = floor(mant_size*log(2)/log(5))
+// IEEE & IBM double is 22
+// VAX double is 24
+// IEEE quad is 48
+// This is the number of elements in tens[] to use.
+static const int Ten_pmax = int(fptype<double>::mant_size * log(2)/log(5));
+
+// Enough entries to handle long double with 112-bit mantissa
+static const double
+tens[] = {
+                1e0, 1e1, 1e2, 1e3, 1e4, 1e5, 1e6, 1e7, 1e8, 1e9,
+                1e10, 1e11, 1e12, 1e13, 1e14, 1e15, 1e16, 1e17, 1e18, 1e19,
+                1e20, 1e21, 1e22, 1e23, 1e24, 1e25, 1e26, 1e27, 1e28, 1e29,
+                1e30, 1e31, 1e32, 1e33, 1e34, 1e35, 1e36, 1e37, 1e38, 1e39,
+                1e40, 1e41, 1e42, 1e43, 1e44, 1e45, 1e46, 1e47, 1e48, 1e49
+};
+static const double bigtens[] = { 1e16, 1e32, 1e64, 1e128, 1e256, 1e512,
+				  1e1024, 1e2048, 1e4096 };
+// 5 for ieee double
+// 3 for s370 double
+// 2 for vax double
+static const int n_bigtens =
+  int(log(std::numeric_limits<double>::max_exponent10)/log(2.0)) - 3;
+
+static const long double bigtensl[] = { 1e16L, 1e32L, 1e64L, 1e128L, 1e256L, 1e512L,
+					1e1024L, 1e2048L, 1e4096L, 1e8192L, 1e16384L };
+static const int n_bigtensl =
+  int(log(std::numeric_limits<long double>::max_exponent10)/log(2.0)) - 3;
+
+
+template<typename _T>
+_T bigten(int i);
+
+template<>
+static inline double bigten<double>(int i) { return bigtens[i]; }
+
+template<>
+static inline long double bigten<long double>(int i) { return bigtensl[i]; }
+
+template<typename _T>
+_T maxbigten();
+
+template<>
+static inline double maxbigten<double>()
+{ return bigtens[n_bigtens-1]; }
+
+template<>
+static inline long double maxbigten<long double>()
+{ return bigtensl[n_bigtensl-1]; }
+
+
+
+// dec_exponent performs the first portion of the conversion to a string.  It
+// computes an upper bound on the decimal exponent that can be used to size a
+// buffer to print into.  It must be passed back into dtoa.
+//
+// This will only be used for double and long double
+template<typename _T>
+int
+__dec_exponent(_T d, int &special, bool &sign)
+{
+  /*     Arguments ndigits, decpt, sign are similar to those
+	 of ecvt and fcvt; trailing zeros are suppressed from
+	 the returned string.  If not null, *rve is set to point
+	 to the end of the return value.  If d is +-Infinity or NaN,
+	 then *decpt is set to 9999.
+
+	 special returns true if the number is nan, infinity, or 0.
+	 type contains -1 for nan, 0 for 0, and 1 for infinity.  This allows
+	 the calling routine to handle the special cases.
+  */
+
+  int i;
+  int k;			// Holds base 10 exponent
+
+  _T d2, ds;
+
+  // Handle NAN.
+  if (__builtin_isnan(d))
+    {
+      special = 2;
+      return 3;
+    }
+
+  // Save sign value into user var
+  if (__builtin_signbit(d)) {
+    // set sign for everything, including 0's and NaNs
+    sign = true;
+    d = -d;		// Clear sign bit.
+  }
+  else
+    sign = false;
+
+  // Handle IEEE and VAX infinity and nan.  Apparently IBM has none.
+  if (__builtin_isinf(d))
+    {
+      special = 1;
+      return 8;
+    }
+
+  special = 0;
+
+  if (d == 0.0) return 0;
+
+  // Extract base-2 exponent and normalized fraction.
+  d2 = frexp(d, &i);
+  
+  // Compute integer base 10 exponent into k
+
+  /* log(x)       ~=~ log(1.5) + (x-1.5)/1.5
+   * log10(x)      =  log(x) / log(10)
+   *              ~=~ log(1.5)/log(10) + (x-1.5)/(1.5*log(10))
+   * log10(d) = i*log(2)/log(10) + log10(d2)
+   *
+   * This suggests computing an approximation k to log10(d) by
+   *
+   * k = i*0.301029995663981
+   *      + ( (d2-1.5)*0.289529654602168 + 0.176091259055681 );
+   *
+   * We want k to be too large rather than too small.
+   * The error in the first-order Taylor series approximation
+   * is in our favor, so we just round up the constant enough
+   * to compensate for any error in the multiplication of
+   * (i) by 0.301029995663981; since |i| <= 1077,
+   * and 1077 * 0.30103 * 2^-52 ~=~ 7.2e-14,
+   * adding 1e-13 to the constant term more than suffices.
+   * Hence we adjust the constant term to 0.1760912590558.
+   * (We could get a more accurate k by invoking log10,
+   *  but this is probably not worthwhile.)
+   *
+   * (From Richard Henderson)
+   * The comment should be updated for long double.  I guess the
+   * overestimate is still ok, since while |i| <= 16881 is much
+   * larger, 2^-64 and 2^-113 are much smaller.
+   */
+  ds = (d2-1.5)*0.289529654602168 + 0.1760912590558 + i*0.301029995663981;
+  k = (int)ds;
+  if (ds < 0. && ds != k)
+    k--;    // want k = floor(ds)
+  return k;
+}
+
+template
+int
+__dec_exponent<double>(double d, int &special, bool &sign);
+
+template
+int
+__dec_exponent<long double>(long double d, int &special, bool &sign);
+
+/* dtoa for IEEE arithmetic (dmg): convert double to ASCII string.
+ *
+ * Inspired by "How to Print Floating-Point Numbers Accurately" by
+ * Guy L. Steele, Jr. and Jon L. White [Proc. ACM SIGPLAN '90, pp. 92-101].
+ *
+ * Modifications:
+ *      1. Rather than iterating, we use a simple numeric overestimate
+ *         to determine k = floor(log10(d)).  We scale relevant
+ *         quantities using O(log2(k)) rather than O(k) multiplications.
+ *      2. For some modes > 2 (corresponding to ecvt and fcvt), we don't
+ *         try to generate digits strictly left to right.  Instead, we
+ *         compute with fewer bits and propagate the carry if necessary
+ *         when rounding the final digit up.  This is often faster.
+ *      3. Under the assumption that input will be rounded nearest,
+ *         mode 0 renders 1e23 as 1e23 rather than 9.999999999999999e22.
+ *         That is, we allow equality in stopping tests when the
+ *         round-nearest rule will give the same floating-point value
+ *         as would satisfaction of the stopping test with strict
+ *         inequality.
+ *      4. We remove common factors of powers of 2 from relevant
+ *         quantities.
+ *      5. When converting floating-point integers less than 1e16,
+ *         we use floating-point arithmetic rather than resorting
+ *         to multiple-precision integers.
+ *      6. When asked to produce fewer than 15 digits, we first try
+ *         to get by with floating-point arithmetic; we resort to
+ *         multiple-precision integer arithmetic only if we cannot
+ *         guarantee that the floating-point calculation has given
+ *         the correctly rounded result.  For k requested digits and
+ *         "uniformly" distributed input, the probability is
+ *         something like 10^(k-15) that we must resort to the long
+ *         calculation.
+ */
+
+// Slow method requiring big int math has been removed, returning 0.  This is
+// because printf was about 30% faster by comparison.
+//
+// buf is a buffer whose size was computed using dec_exponent(), so as to be
+// large enough to hold the number if needed.
+//
+// exponent is the value computed by dec_exponent
+//
+// The mode is as described below, but only 2 and 3 are still used.
+//
+// Mode 2 ndigits gives the total significant digits to print.
+// Mode 3 ndigits gives the total digits after the decimal point to print.
+//
+// If trim is true, trailing zeros will be trimmed off.
+//
+// decpt returns the actual decimal point location.  It will often be
+// exponent, but may be adjusted in some cases.
+//
+// The return value is 0 if the routine gave up, otherwise it is buf.
+
+template<typename _T>
+char *
+__dtoa(char* buf, _T d, int exponent, int mode, int ndigits, bool trim, int *decpt)
+{
+  /*     Arguments ndigits, decpt, sign are similar to those of ecvt and fcvt;
+	 trailing zeros are suppressed from the returned string.  If d is
+	 +-Infinity or NaN, then *decpt is set to 9999.
+
+	 mode:
+	 0 ==> shortest string that yields d when read in
+	       and rounded to nearest.
+	 1 ==> like 0, but with Steele & White stopping rule;
+	       e.g. with IEEE P754 arithmetic , mode 0 gives
+	       1e23 whereas mode 1 gives 9.999999999999999e22.
+	 2 ==> max(1,ndigits) significant digits.  This gives a
+	       return value similar to that of ecvt, except
+	       that trailing zeros are suppressed.
+	 3 ==> through ndigits past the decimal point.  This
+	       gives a return value similar to that from fcvt,
+	       except that trailing zeros are suppressed, and
+	       ndigits can be negative.
+
+	 4-9 should give the same return values as 2-3, i.e.,
+	       4 <= mode <= 9 ==> same return as mode
+	       2 + (mode & 1).  These modes are mainly for
+	       debugging; often they run slower but sometimes
+	       faster than modes 2-3.
+	 4,5,8,9 ==> left-to-right digit generation.
+	 6-9 ==> don't try fast floating-point estimate
+	 	(if applicable).
+
+	 Values of mode other than 0-9 are treated as mode 0.
+
+	 Sufficient space is allocated to the return value
+	 to hold the suppressed trailing zeros.
+  */
+
+  int i, ilim=0, ilim1=0;
+  int k;			// Holds base 10 exponent
+  int k_check;
+  int L;
+  _T ds, eps;
+  char *s, *s0, *stmp;
+
+  // Handle 0 for all cases.
+  if (d == 0.0) {
+    *decpt = 1;
+    buf[0] = '0';
+    buf[1] = 0;
+    return buf;
+  }
+
+  // Clear the sign.
+  d = fabs(d);
+
+  // Extract base-2 exponent.
+  frexp(d, &i);
+
+  // k is an upper bound on the integer base 10 exponent of d.  We may need to
+  // tweak it a little.
+  k = exponent;
+  k_check = 1;
+  if (k >= 0 && k < Ten_pmax) {
+    if (d < tens[k])
+      k--;
+    k_check = 0;
+  }
+
+  if (mode == 2) {
+    if (ndigits <= 0)
+      ndigits = 1;
+    ilim = ilim1 = i = ndigits;
+  }
+  else {
+    i = ndigits + k + 1;
+    ilim = i;
+    ilim1 = i - 1;
+    if (i <= 0)
+      i = 1;
+  }
+  // i is now an upper bound of the number of digits to generate.
+
+  s = s0 = buf;
+
+  if (ilim >= 0 && ilim <= fptype<_T>::Quick_max) {
+
+    // Try to get by with floating-point arithmetic.
+
+    _T d2 = d;
+    int i = 0;
+    int k0 = k;
+    int ilim0 = ilim;
+    int ieps = 2; // conservative
+    int j1;
+    int j;
+
+    if (k > 0) {
+      ds = tens[k&0xf];		// 1e0 -> 1e15
+      j = k >> 4;
+      if (j & fptype<_T>::Bletch) {
+	// I've never seen this code hit in .5 million random numbers
+	// prevent overflows
+	// n_bigtens
+	j &= fptype<_T>::Bletch - 1;
+	d /= maxbigten<_T>();
+	ieps++;
+      }
+      for(; j; j >>= 1, i++)
+	if (j & 1) {
+	  ieps++;
+	  ds *= bigten<_T>(i);
+	}
+      d /= ds;
+    }
+    else if ((j1 = -k)) {
+      d *= tens[j1 & 0xf];
+      for(j = j1 >> 4; j; j >>= 1, i++)
+	if (j & 1) {
+	  ieps++;
+	  d *= bigten<_T>(i);
+	}
+    }
+    if (k_check && d < 1. && ilim > 0) {
+      if (ilim1 <= 0)
+	goto fast_failed;
+      ilim = ilim1;
+      k--;
+      d *= 10.;
+      ieps++;
+    }
+    eps = ieps*d + 7.;
+    static const _T e2m = ldexp(2, -(fptype<_T>::mant_size-1));
+    eps *= e2m;
+    if (ilim == 0) {
+      d -= 5.;
+      if (d > eps)
+	goto one_digit;
+      if (d < -eps)
+	goto no_digits;
+      goto fast_failed;
+    }
+
+    // Generate ilim digits, then fix them up.
+    eps *= tens[ilim-1];
+    for(i = 1;; i++, d *= 10.) {
+      L = (int)d;
+      d -= L;
+      *s++ = '0' + (int)L;
+      if (i == ilim) {
+	if (d > 0.5 + eps)
+	  goto bump_up;
+	else if (d < 0.5 - eps) {
+	  // Strips off trailing 0's
+	  if (trim) {
+	    while(*--s == '0');
+	    s++;
+	  }
+	  goto ret;
+	}
+	// else fast approach failed
+	break;
+      }
+    }
+  fast_failed:
+    s = s0;
+    d = d2;
+    k = k0;
+    ilim = ilim0;
+  }
+
+  // Do we have a "small" integer?
+
+  if (k <= fptype<_T>::Int_max && floor(d) == d) {
+    // Yes.
+    ds = tens[k];
+    if (ndigits < 0 && ilim <= 0) {
+      if (ilim < 0 || d <= 5*ds)
+	goto no_digits;
+      goto one_digit;
+    }
+    for(i = 1;; i++) {
+      L = (int)(d / ds);
+      d -= L*ds;
+
+      if (std::numeric_limits<_T>::round_style == std::round_toward_infinity ||
+	  std::numeric_limits<_T>::round_style == std::round_toward_neg_infinity) {
+	// If FLT_ROUNDS == 2, L will usually be high by 1
+	if (d < 0) {
+	  L--;
+	  d += ds;
+	}
+      }
+
+      *s++ = '0' + (int)L;
+      if (i == ilim) {
+	d += d;
+	if (d > ds || (d == ds && L & 1)) {
+	bump_up:
+	  stmp = s;
+	  while( *--s == '9' ) {
+	    *s='0';
+	    if (s == s0) {
+	      k++;
+	      *s = '0';
+	      break;
+	    }
+	  }
+	  ++*s++;
+	  if (!trim) s = stmp;
+	}
+	break;
+      }
+      if (!(d *= 10.))
+	break;
+    }
+    goto ret;
+  }
+
+  return 0;			// XXX bail out to avoid mp int calcs
+
+  /* no digits, fcvt style */
+ no_digits:
+  k = -1 - ndigits;
+  goto ret;
+
+ one_digit:
+  *s++ = '1';
+  k++;
+  goto ret;
+
+ ret:
+  *s = 0;
+  *decpt = k + 1;
+  return s0;
+}
+
+template
+char *
+__dtoa<double>(char* buf, double d, int exponent, int mode, int ndigits, bool trim, int *decpt);
+
+template
+char *
+__dtoa<long double>(char* buf, long double d, int exponent, int mode, int ndigits, bool trim, int *decpt);
+
+
+// Call to detect if we're working on double or long double.
+template<typename _T>
+static inline bool __is_long();
+
+template<>
+static inline bool __is_long<double>() { return false; }
+
+template<>
+static inline bool __is_long<long double>() { return true; }
+
+// The last three are in flags.
+template<typename _T>
+int
+__float_to_char_sprintf(char* buf, _T d, int mode, int prec, ios_base::fmtflags flags)
+{
+  static const char lmodes[] = "feg";
+  static const char umodes[] = "FEG";
+
+  bool point = flags & ios_base::showpoint;
+  bool pos = flags & ios_base::showpos;
+  bool up = flags & ios_base::uppercase;
+
+  // Build the format string
+  char fmt[8] = "%";
+  char* fp = &fmt[1];
+  if (pos) *fp++ = '+';
+  if (point) *fp++ = '#';
+  *fp++ = '.';
+  *fp++ = '*';
+
+  if (__is_long<_T>())
+    *fp++ = 'L';
+
+  if (up)
+    *fp++ = umodes[mode];
+  else
+    *fp++ = lmodes[mode];
+
+  *fp=0;
+
+  // This is safe since the buffer will be big enough.
+  sprintf(&buf[0], fmt, prec, d);
+
+  // Standard says we need to print as if in the C locale.  The only
+  // difference that may show up is that the point char can be different.
+  // Change it to '.'.
+  char dp = localeconv()->decimal_point[0];
+  char* bp = buf;
+  for (; *bp; bp++)
+    if (*bp == dp) {
+      *bp = '.';
+      break;
+    }
+  // Return the full length of the string
+  int len = bp - buf;
+  len += strlen(bp);
+  return len;
+}
+
+template
+int
+__float_to_char_sprintf<double>(char* buf, double d, int mode, int prec,
+				ios_base::fmtflags flags);
+
+template
+int
+__float_to_char_sprintf<long double>(char* buf, long double d, int mode,
+				     int prec, ios_base::fmtflags flags);
+
+}


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]