[Bug fortran/38199] missed optimization: I/O performance
jb at gcc dot gnu.org
gcc-bugzilla@gcc.gnu.org
Tue May 10 09:55:00 GMT 2011
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38199
--- Comment #13 from Janne Blomqvist <jb at gcc dot gnu.org> 2011-05-10 09:41:08 UTC ---
Here's something for formatted writes. Consider write-many.f (from some
other PR; I'm too lazy to check which one now):
      program main
      open(10, status='SCRATCH')
      a = 0.3858204
      do i = 1, 1000000
         a = a + 0.4761748164
         write(10, '(G12.5)') a
      end do
      end program main
Profiling this with 'perf' shows the top offenders as:
# Overhead  Command     Shared Object                                                      Symbol
# ........  ..........  .................................................................  ......
#
    21.56%  write-many  /lib/libc-2.11.1.so                                                [.] __mpn_divrem
    14.72%  write-many  /lib/libc-2.11.1.so                                                [.] ___printf_fp
    13.42%  write-many  /lib/libc-2.11.1.so                                                [.] hack_digit.15661
     7.75%  write-many  /lib/libc-2.11.1.so                                                [.] __GI_vfprintf
     3.81%  write-many  /home/janne/src/gfortran/trunk/install/lib64/libgfortran.so.3.0.0  [.] output_float.isra.7.constprop.16
     2.81%  write-many  /home/janne/src/gfortran/trunk/install/lib64/libgfortran.so.3.0.0  [.] write_float
     2.38%  write-many  /lib/libc-2.11.1.so                                                [.] _IO_default_xsputn_internal
     2.10%  write-many  /home/janne/src/gfortran/trunk/install/lib64/libgfortran.so.3.0.0  [.] data_transfer_init
     1.96%  write-many  /home/janne/src/gfortran/trunk/install/lib64/libgfortran.so.3.0.0  [.] formatted_transfer
     1.37%  write-many  /home/janne/src/gfortran/trunk/install/lib64/libgfortran.so.3.0.0  [.] next_format0
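That is, most of the time is spent in libc's floating-point formatting code:
the top four symbols (__mpn_divrem, ___printf_fp, hack_digit and
__GI_vfprintf) alone account for about 57% of the samples. This is because
libgfortran uses snprintf() to convert real numbers to ASCII.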
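Next, consider this small C benchmark, snprintfbench, which formats 1/3 ten
million times with a given number of digits after the decimal point: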
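#include <stdio.h>
#include <stdlib.h>

int main(int argc, char *argv[])
{
    if (argc < 2) {
        fprintf(stderr, "usage: %s ndigits\n", argv[0]);
        return 1;
    }
    int ndigits = atoi(argv[1]);
    printf("Doing test with %d digits\n", ndigits);
    /* "3." + ndigits digits + "e-01" + NUL needs ndigits + 7 bytes. */
    size_t bufsz = ndigits + 9;
    char *buf = malloc(bufsz);
    if (buf == NULL)
        return 1;
    /* '#' keeps the decimal point even for 0 digits ("3.e-01"). */
    for (int i = 0; i < 10000000; i++)
        snprintf(buf, bufsz, "%#-.*e", ndigits, 1./3);
    printf("%s\n", buf);
    free(buf);
    return 0;
}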
$ time ./snprintfbench 0
Doing test with 0 digits
3.e-01
real 0m2.608s
user 0m2.610s
sys 0m0.000s
$ time ./snprintfbench 20
Doing test with 20 digits
3.33333333333333314830e-01
real 0m4.746s
user 0m4.740s
sys 0m0.010s
$ time ./snprintfbench 40
Doing test with 40 digits
3.3333333333333331482961625624739099293947e-01
real 0m6.362s
user 0m6.360s
sys 0m0.000s
$ time ./snprintfbench 60
Doing test with 60 digits
3.333333333333333148296162562473909929394721984863281250000000e-01
real 0m8.155s
user 0m8.160s
sys 0m0.000s
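That is, snprintf() has a constant per-call cost of roughly 260 ns (the
0-digit run), and on top of that the run time grows approximately linearly
with the number of digits generated, by roughly 8 to 11 ns per additional
digit per call.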
Now, in io/write_float.def we always convert with a constant 41 digits (when
REAL(16) is available). Instead, we could first figure out how many digits we
need and only then call snprintf(), generating only as many digits as needed,
or as many as requested + 1 if the user has chosen a non-default rounding
mode, since in that case we need an extra digit in order to do the rounding
ourselves.