This is the mail archive of the libstdc++@gcc.gnu.org mailing list for the libstdc++ project.


Re: Four times faster printf %e for num_put

From: Povilas Kanapickas <povilas at radix dot lt>
To: libstdc++ at gcc dot gnu dot org
Date: Sun, 12 Jan 2014 04:47:15 +0200
Subject: Re: Four times faster printf %e for num_put
Authentication-results: sourceware.org; auth=none
References: <52CB3224 dot 6030008 at radix dot lt>

Is no one really interested? This would make floating-point iostream
inserters in std::ios_base::scientific mode 4 to 6 times faster on average,
provided the requested number of significant digits is not much larger*
than the number the floating-point type can provide. This condition is
very likely to be satisfied in the majority of real-world use cases.

*: for double the limit is 21 digits compared to (usual) DBL_DECIMAL_DIG
of 17. Note that only IEEE-754 format is supported.
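
For readers who want to see the correspondence between the scientific
stream inserter and printf %e, here is a minimal sketch. The helper names
format_stream and format_printf and the kMaxDigits bound are hypothetical
and not part of the proposal. The sketch assumes that "digits" means
significant digits, so N digits map to a precision of N - 1 in both APIs,
and it uses std::numeric_limits<double>::max_digits10 as the C++
counterpart of DBL_DECIMAL_DIG.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <iomanip>
#include <limits>
#include <locale>
#include <sstream>
#include <string>

// Hypothetical upper bound so the fixed buffer below is always large enough.
const int kMaxDigits = 40;

// Format `v` with the standard stream inserter in scientific mode.
// `digits` counts significant digits: one before the point, digits - 1 after.
std::string format_stream(double v, int digits) {
    assert(digits >= 1 && digits <= kMaxDigits);
    std::ostringstream os;
    os.imbue(std::locale::classic());  // keep '.' as the decimal point
    os << std::scientific << std::setprecision(digits - 1) << v;
    return os.str();
}

// Format `v` with printf-style %e using the same number of significant digits.
std::string format_printf(double v, int digits) {
    assert(digits >= 1 && digits <= kMaxDigits);
    char buf[64];
    int n = std::snprintf(buf, sizeof buf, "%.*e", digits - 1, v);
    assert(n > 0 && static_cast<std::size_t>(n) < sizeof buf);
    return std::string(buf, static_cast<std::size_t>(n));
}
```

Calling both helpers with digits set to
std::numeric_limits<double>::max_digits10 (17 for IEEE-754 double) should
give identical strings in the default "C" locale, which is the case the
benchmark below exercises.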

The current libstdc++ implementation adds roughly 20% overhead over
printf (as can be seen from the test results below). This becomes 50% if my
ideas are implemented in libc only, so I think it is worth adding
optimized versions of the inserters to libstdc++ too.

Test description:

20M doubles are formatted to stdout, with stdout redirected to /dev/null.
The precision is 17 digits and the values are exponentially distributed
over the exponent range noted below; both negative and positive values are
tested. GCC is 4.8.1, libc is eglibc 2.17, and all tests are compiled with
-O3 -fno-lto.
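
The setup described above could be sketched roughly as follows. This is
not the original benchmark source, which is not included in this message.
The names make_values, write_printf and write_stream are hypothetical, and
two assumptions are made: "exponentially distributed" is read as a decimal
exponent drawn uniformly from the stated range, and "17 digits" is read as
17 significant digits (precision 16 in %e terms).

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdio>
#include <iomanip>
#include <ostream>
#include <random>
#include <vector>

// Generate `n` test values of both signs whose magnitude is 10^u,
// with u uniform in [-max_exp, max_exp) (assumed distribution).
std::vector<double> make_values(std::size_t n, int max_exp, unsigned seed) {
    assert(max_exp >= 0);
    std::mt19937_64 rng(seed);
    std::uniform_real_distribution<double> exponent(-max_exp, max_exp);
    std::bernoulli_distribution negative(0.5);
    std::vector<double> out;
    out.reserve(n);
    for (std::size_t i = 0; i < n; ++i) {
        double magnitude = std::pow(10.0, exponent(rng));
        out.push_back(negative(rng) ? -magnitude : magnitude);
    }
    return out;
}

// The 'libc' case: printf-style formatting, 17 significant digits.
void write_printf(std::FILE* f, const std::vector<double>& values) {
    for (double v : values) std::fprintf(f, "%.16e\n", v);
}

// The 'libcxx' case: standard stream inserter in scientific mode.
void write_stream(std::ostream& os, const std::vector<double>& values) {
    os << std::scientific << std::setprecision(16);
    for (double v : values) os << v << '\n';
}
```

A driver would call make_values(20000000, 300, seed) once, pass the result
to write_printf(stdout, ...) or write_stream(std::cout, ...), and be run
under time with stdout redirected to /dev/null.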

Legend:

'cf' is improved stream inserter
'libc' is printf
'libcxx' is standard stream inserter
'null_cf' is standard streams with dummy data (IOW I/O overhead)
'null_libc' is C streams with dummy data
'time' is user time spent by the program

All pairwise comparisons are done without subtracting the null_*
numbers. Doing so would result in even larger relative differences.

Test results:

Intel Core 2 T9300 2.5GHz (Penryn), 64-bit mode,
exponent range is e-300 .. e300 (10^-300 .. 10^300)

 cf        time: 4.28
 libc      time: 26.43 (6.2 times slower than cf)
 libcxx    time: 31.27 (7.3 times slower than cf)
 null_cf   time: 0.59
 null_libc time: 1.06
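
To illustrate the remark that subtracting the null_* numbers would widen
the gap, here is a small sketch of that arithmetic. The function net_ratio
is hypothetical, and pairing libc with null_libc and both stream variants
with null_cf is an assumption based on the legend.

```cpp
#include <cassert>

// Ratio of two timings after subtracting each side's I/O-only baseline.
double net_ratio(double slow, double slow_null, double fast, double fast_null) {
    assert(fast > fast_null);  // the fast side must do measurable work
    return (slow - slow_null) / (fast - fast_null);
}
```

With the 64-bit e-300 .. e300 numbers, net_ratio(26.43, 1.06, 4.28, 0.59)
is about 6.9 for libc (6.2 without subtraction) and
net_ratio(31.27, 0.59, 4.28, 0.59) is about 8.3 for libcxx (7.3 without
subtraction).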

exponent range is e-30 .. e30

 cf        time: 4.12
 libc      time: 16.98 (4.1 times slower than cf)
 libcxx    time: 21.84 (5.3 times slower than cf)
 null_cf   time: 0.64
 null_libc time: 0.99

Intel Core 2 T9300 2.5GHz (Penryn), 32-bit mode,
exponent range is e-300 .. e300

 cf        time: 7.09
 libc      time: 43.30 (6.1 times slower than cf)
 libcxx    time: 49.67 (7.0 times slower than cf)
 null_cf   time: 0.94
 null_libc time: 1.56

exponent range is e-30 .. e30

 cf        time: 6.93
 libc      time: 22.12 (3.2 times slower than cf)
 libcxx    time: 28.40 (4.1 times slower than cf)
 null_cf   time: 0.92
 null_libc time: 1.56

Samsung Exynos 4412 Prime 1.6GHz (ARM Cortex A9):
exponent range is e-300 .. e300

 cf        time: 18.31
 libc      time: 126.53 (6.9 times slower than cf)
 libcxx    time: 144.14 (7.9 times slower than cf)
 null_cf   time: 2.58
 null_libc time: 4.92

exponent range is e-30 .. e30

 cf        time: 17.28
 libc      time: 65.27 (3.8 times slower than cf)
 libcxx    time: 82.64 (4.8 times slower than cf)
 null_cf   time: 2.58
 null_libc time: 4.92

Compiling with -O2 or -O1 does not drastically reduce performance; only
-O0 slows cf down noticeably:

Intel Core 2 T9300 2.5GHz (Penryn), 64-bit mode,
exponent range is e-300 .. e300:

-O2:
 cf        time: 4.94
 libc      time: 26.45
 libcxx    time: 31.20
 null_cf   time: 0.57
 null_libc time: 0.98

-O1:
 cf        time: 5.09
 libc      time: 26.74
 libcxx    time: 31.81
 null_cf   time: 0.58
 null_libc time: 1.00

-O0:
 cf        time: 19.55
 libc      time: 26.47
 libcxx    time: 31.41
 null_cf   time: 0.62
 null_libc time: 0.93


Regards,
Povilas

Follow-Ups:
- Re: Four times faster printf %e for num_put
  - From: Paolo Carlini

References:
- Four times faster printf %e for num_put
  - From: Povilas Kanapickas
