This is the mail archive of the libstdc++@gcc.gnu.org mailing list for the libstdc++ project.
Re: Four times faster printf %e for num_put
- From: Povilas Kanapickas <povilas at radix dot lt>
- To: libstdc++ at gcc dot gnu dot org
- Date: Sun, 12 Jan 2014 04:47:15 +0200
- Subject: Re: Four times faster printf %e for num_put
- References: <52CB3224 dot 6030008 at radix dot lt>
Is no one really interested? This would make std::ios_base::scientific
floating-point iostream inserters 4 to 6 times faster on average,
provided the requested number of significant digits is not much larger*
than the number of digits provided by the floating-point type. This
condition is very likely to be satisfied in the majority of real-world
use cases.
*: For double the limit is 21 digits, compared to the usual DBL_DECIMAL_DIG
of 17. Only the IEEE 754 format is supported.
The current libstdc++ implementation adds roughly 20% overhead over
printf (see the test results below). This overhead would grow to 50% if
my ideas were implemented only in libc, so I think it's worth adding
optimized versions of the inserters to libstdc++ too.
Test description:
20M doubles are formatted to stdout, with stdout redirected to /dev/null.
The precision is 17 digits, values are exponentially distributed over the
exponent range noted below, and both negative and positive values are
tested. GCC is 4.8.1, libc is eglibc 2.17, and all tests are compiled with
-O3 -fno-lto.
Legend:
'cf' is the improved stream inserter
'libc' is printf
'libcxx' is the standard stream inserter
'null_cf' is standard streams with dummy data (i.e. the I/O overhead)
'null_libc' is C streams with dummy data
'time' is the user time spent by the program
All pairwise comparisons are done without subtracting the null_*
numbers. Doing so would result in even larger relative differences.
Test results (user time; the libc/cf and libcxx/cf columns give how many
times slower than cf each one is; an exponent range of e-300 .. e300 means
10^-300 .. 10^300):
Intel Core 2 T9300 2.5GHz (Penryn), 64-bit mode:
exponent range  cf     libc    libcxx  null_cf  null_libc  libc/cf  libcxx/cf
e-300 .. e300   4.28   26.43   31.27   0.59     1.06       6.2      7.3
e-30 .. e30     4.12   16.98   21.84   0.64     0.99       4.1      5.3
Intel Core 2 T9300 2.5GHz (Penryn), 32-bit mode:
exponent range  cf     libc    libcxx  null_cf  null_libc  libc/cf  libcxx/cf
e-300 .. e300   7.09   43.30   49.67   0.94     1.56       6.1      7.0
e-30 .. e30     6.93   22.12   28.40   0.92     1.56       3.2      4.1
Samsung Exynos 4412 Prime 1.6GHz (ARM Cortex-A9):
exponent range  cf     libc    libcxx  null_cf  null_libc  libc/cf  libcxx/cf
e-300 .. e300   18.31  126.53  144.14  2.58     4.92       6.9      7.9
e-30 .. e30     17.28  65.27   82.64   2.58     4.92       3.8      4.8
Compiling with -O2 or -O1 does not drastically reduce performance. Only
-O0 makes cf markedly slower (19.55 vs 4.28 at -O3), while libc and
libcxx are barely affected:
Intel Core 2 T9300 2.5GHz (Penryn), 64-bit mode, exponent range e-300 .. e300:
flags  cf     libc    libcxx  null_cf  null_libc
-O2    4.94   26.45   31.20   0.57     0.98
-O1    5.09   26.74   31.81   0.58     1.00
-O0    19.55  26.47   31.41   0.62     0.93
Regards,
Povilas