[Bug fortran/38199] missed optimization: I/O performance

Tue May 10 09:55:00 GMT 2011

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38199

--- Comment #13 from Janne Blomqvist <jb at gcc dot gnu.org> 2011-05-10 09:41:08 UTC ---
Here's something for formatted writes; consider write-many.f (from some
other PR; I'm too lazy to check which one now):

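program main
  implicit none
  integer :: i
  real :: a
  open(10, status='SCRATCH')
  a = 0.3858204
  do i = 1, 1000000
     a = a + 0.4761748164
     ! Standard syntax: no comma between the control list and the output list
     write(10, '(G12.5)') a
  end do
  close(10)
end program main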

Profiling this with 'perf' shows the top offenders as:

# Overhead  Command         Shared Object                                                      Symbol
# ........  ..............  .................................................................  ......
#
    21.56%  write-many      /lib/libc-2.11.1.so                                                [.] __mpn_divrem
    14.72%  write-many      /lib/libc-2.11.1.so                                                [.] ___printf_fp
    13.42%  write-many      /lib/libc-2.11.1.so                                                [.] hack_digit.15661
     7.75%  write-many      /lib/libc-2.11.1.so                                                [.] __GI_vfprintf
     3.81%  write-many      /home/janne/src/gfortran/trunk/install/lib64/libgfortran.so.3.0.0  [.] output_float.isra.7.constprop.16
     2.81%  write-many      /home/janne/src/gfortran/trunk/install/lib64/libgfortran.so.3.0.0  [.] write_float
     2.38%  write-many      /lib/libc-2.11.1.so                                                [.] _IO_default_xsputn_internal
     2.10%  write-many      /home/janne/src/gfortran/trunk/install/lib64/libgfortran.so.3.0.0  [.] data_transfer_init
     1.96%  write-many      /home/janne/src/gfortran/trunk/install/lib64/libgfortran.so.3.0.0  [.] formatted_transfer
     1.37%  write-many      /home/janne/src/gfortran/trunk/install/lib64/libgfortran.so.3.0.0  [.] next_format0

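That is, most of the time is spent in libc's floating-point formatting code
(the top four libc entries alone account for about 57% of the samples), since
we use snprintf() to convert the real numbers to ASCII. Next, consider the
following C benchmark (snprintfbench):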

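#include <stdio.h>
#include <stdlib.h>

int main(int argc, char *argv[])
{
        if (argc < 2) {
                fprintf(stderr, "usage: %s ndigits\n", argv[0]);
                return 1;
        }
        int ndigits = atoi(argv[1]);
        printf("Doing test with %d digits\n", ndigits);
        /* "3." + ndigits digits + "e-01" + NUL fits in ndigits + 9 bytes. */
        size_t bufsz = ndigits + 9;
        char *buf = malloc(bufsz);
        if (buf == NULL)
                return 1;
        /* Convert 1/3 ten million times with ndigits after the point. */
        for (int i = 0; i < 10000000; i++)
                snprintf(buf, bufsz, "%#-.*e", ndigits, 1./3);
        printf("%s\n", buf);
        free(buf);
        return 0;
}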

$ time ./snprintfbench 0
Doing test with 0 digits
3.e-01

real    0m2.608s
user    0m2.610s
sys     0m0.000s

$ time ./snprintfbench 20
Doing test with 20 digits
3.33333333333333314830e-01

real    0m4.746s
user    0m4.740s
sys     0m0.010s

$ time ./snprintfbench 40
Doing test with 40 digits
3.3333333333333331482961625624739099293947e-01

real    0m6.362s
user    0m6.360s
sys     0m0.000s

$ time ./snprintfbench 60
Doing test with 60 digits
3.333333333333333148296162562473909929394721984863281250000000e-01

real    0m8.155s
user    0m8.160s
sys     0m0.000s

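That is, snprintf() has a fixed cost of roughly 0.26 microseconds per call
(the 0-digit case). On top of that, the time grows approximately linearly with
the number of digits, at about 8-11 ns per additional digit per call.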

Now, io/write_float.def always converts with a constant 41 digits (when
REAL(16) is available). Instead, we could first work out how many digits the
edit descriptor actually needs, and only then call snprintf(), generating only
that many digits. If the user has chosen a non-default rounding mode, we would
generate one more digit than requested, since we need that extra digit to do
the rounding ourselves.