This is the mail archive of the libstdc++@gcc.gnu.org mailing list for the libstdc++ project.



Re: Four times faster printf %e for num_put


Is no one really interested? This would make std::ios_base::scientific
floating-point iostream inserters 4 to 6 times faster on average,
provided the requested number of significant digits is not much larger*
than the number the floating-point type provides. This condition is
very likely to be satisfied in the majority of real-world use cases.

*: for double the limit is 21 digits compared to (usual) DBL_DECIMAL_DIG
of 17. Note that only IEEE-754 format is supported.
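
To make the footnote concrete, here is a minimal sketch of how such a digit limit could gate a fast path. The names kFastPathMaxDigits, uses_fast_path and format_scientific are hypothetical and not taken from the patch; the formatting itself just calls snprintf as a stand-in, so this only illustrates the 17 vs. 21 digit relationship, not the optimized algorithm.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <limits>
#include <string>

// IEEE-754 double needs 17 significant digits to round-trip (DBL_DECIMAL_DIG).
static_assert(std::numeric_limits<double>::max_digits10 == 17,
              "this sketch assumes IEEE-754 binary64 double");

// Hypothetical constant: the 21-digit limit quoted in the footnote.
constexpr int kFastPathMaxDigits = 21;

// True when a request would stay within the (assumed) fast-path limit;
// anything larger would have to fall back to the generic code.
constexpr bool uses_fast_path(int significant_digits) {
    return significant_digits >= 1 && significant_digits <= kFastPathMaxDigits;
}

// Format `value` in %e style with the given number of significant digits.
// printf's precision counts digits after the decimal point, hence the -1.
std::string format_scientific(double value, int significant_digits) {
    assert(significant_digits >= 1);
    char buf[512];
    int n = std::snprintf(buf, sizeof buf, "%.*e", significant_digits - 1, value);
    if (n < 0) return std::string();
    std::size_t len = static_cast<std::size_t>(n);
    if (len >= sizeof buf) len = sizeof buf - 1;  // output was truncated
    return std::string(buf, len);
}
```

With 17 significant digits a double survives a text round trip, and a request for 22 or more digits would be outside the limit assumed here.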

The current libstdc++ implementation adds roughly 20% overhead over
printf (as can be seen from the test results below). That overhead would
become 50% if my ideas were implemented in libc only, so I think it is
worth adding optimized versions of the inserters to libstdc++ too.

Test description:

20M doubles are formatted to stdout, which is redirected to /dev/null.
The precision is 17 digits. Values are exponentially distributed over
the exponent range noted below, and both negative and positive values
are tested. GCC is 4.8.1, libc is eglibc 2.17, and all tests are
compiled with -O3 -fno-lto.
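
For readers who want to reproduce something similar, a rough sketch of the 'libc' and 'libcxx' cases follows. It is not the original test program. It assumes that "exponentially distributed" means a uniformly random decimal exponent with a random sign, and that "precision 17" means %.17e, which is what std::scientific with setprecision(17) corresponds to. All function names are made up for the example.

```cpp
#include <cmath>
#include <cstddef>
#include <cstdio>
#include <iomanip>
#include <ostream>
#include <random>
#include <sstream>
#include <string>
#include <vector>

// Assumption: values are 10^e with e uniform in [-max_exp, max_exp),
// and the sign is chosen at random.
std::vector<double> make_values(std::size_t count, double max_exp, unsigned seed) {
    std::mt19937_64 rng(seed);
    std::uniform_real_distribution<double> exponent(-max_exp, max_exp);
    std::bernoulli_distribution negative(0.5);
    std::vector<double> values;
    values.reserve(count);
    for (std::size_t i = 0; i < count; ++i) {
        double v = std::pow(10.0, exponent(rng));
        values.push_back(negative(rng) ? -v : v);
    }
    return values;
}

// 'libc' case: one printf-family call per value.
void write_with_printf(const std::vector<double>& values, std::FILE* out) {
    for (double v : values) std::fprintf(out, "%.17e\n", v);
}

// 'libcxx' case: the standard stream inserter in scientific mode.
void write_with_iostream(const std::vector<double>& values, std::ostream& out) {
    out << std::scientific << std::setprecision(17);
    for (double v : values) out << v << '\n';
}

// Sanity check that both paths are asked for the same text, so that
// timing them against each other is a like-for-like comparison.
bool outputs_agree(double v) {
    char buf[64];
    std::snprintf(buf, sizeof buf, "%.17e", v);
    std::ostringstream os;
    os << std::scientific << std::setprecision(17) << v;
    return os.str() == buf;
}
```

A driver would call make_values(20000000, 300.0, seed), pass the result to one of the two writers with stdout redirected to /dev/null, and measure user time externally, for example with the shell's time command.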

Legend:

'cf' is improved stream inserter
'libc' is printf
'libcxx' is standard stream inserter
'null_cf' is standard streams with dummy data (IOW I/O overhead)
'null_libc' is C streams with dummy data
'time' is user time spent by the program

All pairwise comparisons are done without subtracting the null_*
numbers. Doing so would result in even larger relative differences.
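
As a small worked example of that remark, the helper below computes the ratio with and without the baseline subtracted. The function name slowdown is invented for this sketch, and pairing 'libc' with null_libc and 'cf' with null_cf is my assumption about which baseline belongs to which run.

```cpp
#include <cassert>

// Ratio of a run time to the 'cf' run time. The optional null_* arguments
// subtract the measured I/O-only baseline from each side first.
double slowdown(double t_other, double t_cf,
                double null_other = 0.0, double null_cf = 0.0) {
    assert(t_cf > null_cf);  // otherwise the corrected ratio is meaningless
    return (t_other - null_other) / (t_cf - null_cf);
}
```

With the 64-bit e-300 .. e300 numbers from the results, slowdown(26.43, 4.28) is about 6.2, while slowdown(26.43, 4.28, 1.06, 0.59) is about 6.9.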

Test results:

Intel Core 2 T9300 2.5GHz (Penryn), 64-bit mode,
exponent range is e-300 .. e300 (10^-300 .. 10^300)

 cf        time: 4.28
 libc      time: 26.43 (6.2 times slower than cf)
 libcxx    time: 31.27 (7.3 times slower than cf)
 null_cf   time: 0.59
 null_libc time: 1.06

exponent range is e-30 .. e30

 cf        time: 4.12
 libc      time: 16.98 (4.1 times slower than cf)
 libcxx    time: 21.84 (5.3 times slower than cf)
 null_cf   time: 0.64
 null_libc time: 0.99

Intel Core 2 T9300 2.5GHz (Penryn), 32-bit mode,
exponent range is e-300 .. e300

 cf        time: 7.09
 libc      time: 43.30 (6.1 times slower than cf)
 libcxx    time: 49.67 (7.0 times slower than cf)
 null_cf   time: 0.94
 null_libc time: 1.56

exponent range is e-30 .. e30

 cf        time: 6.93
 libc      time: 22.12 (3.2 times slower than cf)
 libcxx    time: 28.40 (4.1 times slower than cf)
 null_cf   time: 0.92
 null_libc time: 1.56

Samsung Exynos 4412 Prime 1.6GHz (ARM Cortex A9):
exponent range is e-300 .. e300

 cf        time: 18.31
 libc      time: 126.53 (6.9 times slower than cf)
 libcxx    time: 144.14 (7.9 times slower than cf)
 null_cf   time: 2.58
 null_libc time: 4.92

exponent range is e-30 .. e30

 cf        time: 17.28
 libc      time: 65.27 (3.8 times slower than cf)
 libcxx    time: 82.64 (4.8 times slower than cf)
 null_cf   time: 2.58
 null_libc time: 4.92

Compiling with -O2 or -O1 does not drastically reduce performance; at
-O0, cf is much slower but still faster than libc:

Intel Core 2 T9300 2.5GHz (Penryn), 64-bit mode,
exponent range is e-300 .. e300:

-O2:
 cf        time: 4.94
 libc      time: 26.45
 libcxx    time: 31.20
 null_cf   time: 0.57
 null_libc time: 0.98

-O1:
 cf        time: 5.09
 libc      time: 26.74
 libcxx    time: 31.81
 null_cf   time: 0.58
 null_libc time: 1.00

-O0:
 cf        time: 19.55
 libc      time: 26.47
 libcxx    time: 31.41
 null_cf   time: 0.62
 null_libc time: 0.93


Regards,
Povilas

