This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


[Patch, libgfortran, 4.5] PR25561, 37754: New low level I/O library

From: Janne Blomqvist <blomqvist dot janne at gmail dot com>
To: gfortran <fortran at gcc dot gnu dot org>, gcc-patches <gcc-patches at gcc dot gnu dot org>
Date: Tue, 06 Jan 2009 00:10:02 +0200
Subject: [Patch, libgfortran, 4.5] PR25561, 37754: New low level I/O library

Hi all,

attached is a substantially reworked low level I/O library for gfortran.
The general idea is to replace the "Alloc Stream Facility" with a simple
low level interface that provides more or less exactly the POSIX
semantics (read/write/seek/truncate/close). This is provided by the
raw_* functions in unix.c. Then there is another implementation of the
same interface in the buf_* functions, which as the name implies, use a
buffer to improve performance. Attached is gfortranio.rst with some more
justification for the design choices.

The patch also changes the mid level I/O library to use the format
buffer (fbuf) machinery for reads as well as writes (for 4.4 we use fbuf
for writes).

Some other minor cleanups have been done as well, e.g. changing
end-of-file detection to "lazy": the library assumes everything is going
well until a read operation hits EOF (and generates an error at that
point) rather than trying to proactively determine that the next read
will hit EOF.
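
To illustrate the lazy approach, here is a minimal sketch in C on top of
plain POSIX read(). count_newlines() is a hypothetical helper made up for
this example and is not part of the patch; the point is only that nothing
asks about the file size or position ahead of time, so the same loop works
on pipes and terminals.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <unistd.h>

/* Hypothetical helper, not from libgfortran: count newline characters
   on FD.  End-of-file is discovered lazily, i.e. only when read()
   returns 0.  Returns -1 on a read error.  */
long
count_newlines (int fd)
{
  char buf[8192];
  long lines = 0;

  assert (fd >= 0);
  for (;;)
    {
      ssize_t n = read (fd, buf, sizeof buf);
      if (n < 0)
        {
          if (errno == EINTR)
            continue;           /* interrupted, just retry */
          return -1;
        }
      if (n == 0)               /* this is the only place EOF is noticed */
        break;
      for (ssize_t i = 0; i < n; i++)
        if (buf[i] == '\n')
          lines++;
    }
  return lines;
}
```

A proactive check would need something like fstat() or lseek() before each
read, which costs extra syscalls and cannot work on non-seekable input.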

So far I have done very little performance tuning; the main change is
making fbuf_getc() an inline function that calls the non-inline
fbuf_getc_refill() to refill the buffer when necessary (a bit like getc()
and fgetc() in C stdio). That said, initial performance results are
promising. For the countlines.f benchmark (see PR37754), here is the
patched trunk (countlines.gf44) vs. 4.3 (I don't have vanilla 4.4 at the
moment, but according to the PR it's about 10-20 % slower than 4.3):

./countlines.gf44  2.90s user 0.08s system 98% cpu 3.022 total

./countlines.gf43  4.58s user 0.07s system 99% cpu 4.684 total

(results above are best of 10 runs each)
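
The inline fast path plus out-of-line refill pattern mentioned above can
be sketched roughly as follows. This is an illustration only: struct rbuf
and the rbuf_* names are made up for the example, and the real struct fbuf
and fbuf_getc() in the patch differ in detail. Error handling is reduced
to returning -1 for both EOF and read errors.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical read buffer, not the real struct fbuf.  */
struct rbuf
{
  int fd;               /* underlying file descriptor */
  int pos;              /* index of the next unread byte in buf */
  int len;              /* number of valid bytes in buf */
  char buf[8192];
};

/* Slow path: refill the buffer with one read() and return its first
   byte, or -1 on EOF or error.  Deliberately not inline.  */
int
rbuf_getc_refill (struct rbuf *b)
{
  ssize_t n = read (b->fd, b->buf, sizeof b->buf);
  if (n <= 0)
    return -1;
  b->len = (int) n;
  b->pos = 1;
  return (unsigned char) b->buf[0];
}

/* Fast path: a compare and a load, small enough to inline everywhere.  */
static inline int
rbuf_getc (struct rbuf *b)
{
  assert (b != NULL);
  if (b->pos < b->len)
    return (unsigned char) b->buf[b->pos++];
  return rbuf_getc_refill (b);
}
```

With this split the common case costs no function call, and the syscall
is paid once per buffer rather than once per character.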

Attached is also a simple test that measures unformatted sequential
write performance with different record sizes. The new implementation
does quite well because the new buffering code avoids unnecessary
syscalls such as lseek(). Comparing strace output, gfortran 4.3 seems to
do an lseek() for every record, whereas the patched version avoids
seeking as long as the record plus its record markers fit into the
buffer (currently 8 KB, as in previous gfortran versions). For the
patched 4.4, a representative run of us_perf shows the following on my
system:

 Unformatted sequential write performance test
 Record size                 MB/s
 ================================
           4   7.5802262656833790
           8   15.873509918930392
          16   25.410357328092356
          32   39.385463811119422
          64   60.307900379571777
         128   90.606550636531679
         256   124.39827155195042
         512   148.53833492807280
        1024   161.33197189039470
        2048   165.30075386586569
        4096   168.64479316414403
        8192   174.79361499332097
       16384   244.82046680142199
       32768   294.28299059919226
       65536   308.46720850537616
      131072   291.61666653611849
      262144   164.22273724046653
      524288   158.71373187189036

whereas with 4.3

 Unformatted sequential write performance test
 Record size                 MB/s
 ================================
           4   5.3134706580816573
           8   12.652808318693332
          16   16.467640435883737
          32   25.677765810986461
          64   41.438541428006481
         128   69.465166864727536
         256   100.52410645875071
         512   120.68720373185609
        1024   143.79566880191507
        2048   136.48643666988502
        4096   125.20637961368544
        8192   165.25923002902420
       16384   244.82046680142199
       32768   261.98301434601660
       65536   285.61729202322573
      131072   163.07498967544379
      262144   166.28830213878092
      524288   150.22626328225456

With records bigger than 8 KB there is no big difference between the
two, as in that case both 4.3 and trunk+patch bypass the buffering.
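
As a rough illustration of "buffer small writes, bypass the buffer for
large ones", here is a sketch in C. The wbuf_* names and the exact policy
are assumptions made for this example; buf_write() in the patch is not
necessarily structured this way. Only the 8 KB buffer size is taken from
the text above.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <string.h>
#include <unistd.h>

enum { WBUF_SIZE = 8192 };      /* 8 KB, as mentioned above */

/* Hypothetical write buffer, not the real unix_stream.  */
struct wbuf
{
  int fd;
  size_t used;                  /* bytes currently held in buf */
  char buf[WBUF_SIZE];
};

/* Loop over write() until everything is out; 0 on success, -1 on error.  */
static int
write_all (int fd, const char *p, size_t n)
{
  while (n > 0)
    {
      ssize_t w = write (fd, p, n);
      if (w < 0)
        return -1;
      p += w;
      n -= (size_t) w;
    }
  return 0;
}

int
wbuf_flush (struct wbuf *b)
{
  int rc = write_all (b->fd, b->buf, b->used);
  b->used = 0;
  return rc;
}

int
wbuf_write (struct wbuf *b, const void *data, size_t n)
{
  assert (b->used <= WBUF_SIZE);
  /* Not enough room left: empty the buffer first.  */
  if (n > WBUF_SIZE - b->used && wbuf_flush (b) != 0)
    return -1;
  /* Larger than the whole buffer: write straight through.  */
  if (n > WBUF_SIZE)
    return write_all (b->fd, data, n);
  memcpy (b->buf + b->used, data, n);
  b->used += n;
  return 0;
}
```

Small records are merged into one write() per 8 KB, while a large record
costs the same syscalls with or without the buffer.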

The patch passes the gfortran and NIST testsuites on i686-pc-linux-gnu.
For NIST FM907 there is a single extra whitespace change compared to the
reference, but I don't think it's actually an error.

So far, however, there has been very little real application testing.

As this is a rather large and invasive patch, I'd like to get it in
relatively early in 4.5 in order to have plenty of time to fix all
remaining issues.

Ok for trunk once 4.4 branches?

-- 
Janne Blomqvist

2009-01-05  Janne Blomqvist  <jb@gcc.gnu.org>

        PR libfortran/25561 libfortran/37754
	* io/io.h (struct stream): Define new stream interface function
	pointers, and inline functions for accessing it.
	(struct fbuf): Use int instead of size_t, remove flushed element.
	(mem_alloc_w): New prototype.
	(mem_alloc_r): New prototype.
	(stream_at_bof): Remove prototype.
	(stream_at_eof): Remove prototype.
	(file_position): Remove prototype.
	(flush): Remove prototype.
	(stream_offset): Remove prototype.
	(unit_truncate): New prototype.
	(read_block_form): Change to return pointer, int* argument.
	(hit_eof): New prototype.
	(fbuf_init): Change prototype.
	(fbuf_reset): Change prototype.
	(fbuf_alloc): Change prototype.
	(fbuf_flush): Change prototype.
	(fbuf_seek): Change prototype.
	(fbuf_read): New prototype.
	(fbuf_getc_refill): New prototype.
	(fbuf_getc): New inline function.
        * io/fbuf.c (fbuf_init): Use int, get rid of flushed.
	(fbuf_debug): New function.
	(fbuf_reset): Flush, and return position offset.
	(fbuf_alloc): Simplify, don't flush, just realloc.
	(fbuf_flush): Make usable for read mode, salvage remaining bytes.
	(fbuf_seek): New whence argument.
	(fbuf_read): New function.
	(fbuf_getc_refill): New function.
	* io/file_pos.c (formatted_backspace): Use new stream interface.
	(unformatted_backspace): Likewise.
	(st_backspace): Make sure format buffer is reset, use new stream
	interface, use unit_truncate.
	(st_endfile): Likewise.
	(st_rewind): Likewise.
	* io/intrinsics.c: Use new stream interface.
	* io/list_read.c (push_char): Don't use u.p.scratch, use realloc
	to resize.
	(free_saved): Don't check u.p.scratch.
	(next_char): Use new stream interface, use fbuf_getc() for external files.
	(finish_list_read): flush format buffer.
	(nml_query): Update to use modified interfaces.
	* io/open.c (test_endfile): Use new stream interface.
	(edit_modes): Likewise.
	(new_unit): Likewise, set bytes_left to 1 for stream files.
	* io/read.c (read_l): Use new read_block_form interface.
	(read_utf8): Likewise.
	(read_utf8_char1): Likewise.
	(read_default_char1): Likewise.
	(read_utf8_char4): Likewise.
	(read_default_char4): Likewise.
	(read_a): Likewise.
	(read_a_char4): Likewise.
	(read_decimal): Likewise.
	(read_radix): Likewise.
	(read_f): Likewise.
	* io/transfer.c (read_sf): Use fbuf_read and mem_alloc_r, remove
	usage of u.p.line_buffer.
	(read_block_form): Update interface to return pointer, use
	fbuf_read for direct access.
	(read_block_direct): Update to new stream interface.
	(write_block): Use mem_alloc_w for internal I/O.
	(write_buf): Update to new stream interface.
	(formatted_transfer_scalar): Don't use u.p.line_buffer, use
	fbuf_seek for external files.
	(us_read): Update to new stream interface.
	(us_write): Likewise.
	(data_transfer_init): Always check if we switch modes and flush.
	(skip_record): Use new stream interface, fix comparison.
	(next_record_r): Check for and reset u.p.at_eof, use new stream
	interface, use fbuf_getc for spacing.
	(write_us_marker): Update to new stream interface, don't inline.
	(next_record_w_unf): Likewise.
	(sset): New function.
	(next_record_w): Use new stream interface, use fbuf for printing
	newline.
	(next_record): Use new stream interface.
	(finalize_transfer): Remove sfree call, use new stream interface.
	(st_iolength_done): Don't use u.p.scratch.
	(st_read): Don't check for end of file.
	(st_read_done): Don't use u.p.scratch, use unit_truncate.
	(hit_eof): New function.
	* io/unit.c (init_units): Always init fbuf for formatted units.
	(update_position): Use new stream interface.
	(unit_truncate): New function.
	(finish_last_advance_record): Use fbuf to print newline.
	* io/unix.c: Remove unused SSIZE_MAX macro.
	(BUFFER_SIZE): Make static const variable rather than macro.
	(struct unix_stream): Remove dirty_offset, len, method,
	small_buffer. Order elements by decreasing size.
	(struct int_stream): Remove.
	(move_pos_offset): Remove usage of dirty_offset.
	(reset_stream): Remove.
	(do_read): Rename to raw_read, update to match new stream
	interface.
	(do_write): Rename to raw_write, update to new stream interface.
	(raw_seek): New function.
	(raw_tell): New function.
	(raw_truncate): New function.
	(raw_close): New function.
	(raw_flush): New function.
	(raw_init): New function.
	(fd_alloc): Remove.
	(fd_alloc_r_at): Remove.
	(fd_alloc_w_at): Remove.
	(fd_sfree): Remove.
	(fd_seek): Remove.
	(fd_truncate): Remove.
	(fd_sset): Remove.
	(fd_read): Remove.
	(fd_write): Remove.
	(fd_close): Remove.
	(fd_open): Remove.
	(fd_flush): Rename to buf_flush, update to new stream interface
	and unix_stream.
	(buf_read): New function.
	(buf_write): New function.
	(buf_seek): New function.
	(buf_tell): New function.
	(buf_truncate): New function.
	(buf_close): New function.
	(buf_init): New function.
	(mem_alloc_r_at): Rename to mem_alloc_r, change prototype.
	(mem_alloc_w_at): Rename to mem_alloc_w, change prototype.
	(mem_read): Change to match new stream interface.
	(mem_write): Likewise.
	(mem_seek): Likewise.
	(mem_tell): Likewise.
	(mem_truncate): Likewise.
	(mem_close): Likewise.
	(mem_flush): New function.
	(mem_sfree): Remove.
	(empty_internal_buffer): Cast to correct type.
	(open_internal): Use correct type, init function pointers.
	(fd_to_stream): Test whether to open file as buffered or raw.
	(output_stream): Remove mode set.
	(error_stream): Likewise.
	(flush_all_units_1): Use new stream interface.
	(flush_all_units): Likewise.
	(stream_at_bof): Remove.
	(stream_at_eof): Remove.
	(file_position): Remove.
	(file_length): Update logic to use stream interface.
	(flush): Remove.
	(stream_offset): Remove.
	* io/write.c (write_utf8_char4): Use int instead of size_t.
	(write_x): Extra safety check.
	(namelist_write_newline): Use new stream interface.

! Test performance of unformatted sequential with different sized records.
! Janne Blomqvist 2009
program us_perf
  implicit none
  integer, parameter :: d = 8
  integer :: ii
  real(d) :: wspeed

  print *, 'Unformatted sequential write performance test'
  print *, 'Record size                 MB/s'
  print *, '================================'
  ii = 1
  do
     call run_us_test (ii, wspeed)
     print *, ii*4, wspeed
     if (ii > 100000) then
        exit
     end if
     ii = ii * 2
  end do

contains
  subroutine run_us_test (n, ws)
    integer, intent(in) :: n
    real(d), intent(out) :: ws
    integer, allocatable :: data(:)
    real(d) :: t1, t2
    integer :: ii, loops
    integer, parameter :: nsize = 10000000 ! 10 MB

    ! Write nsize * log(n + 1) bytes,  each record is n elements of 4 bytes each
    ! + two 4 byte record markers
    loops = nsize * log(n + 1._d) / (n*4._d + 8._d)

    allocate(data(n))
    data = 123
    open(10, file="usperf.dat", form='unformatted', access='sequential', status='replace')
    call cpu_time(t1)
    do ii = 1, loops
       write (10) data
    end do
    call cpu_time(t2)
    deallocate(data)
    close(10)
    ws = nsize * log(n+1._d) / 1024**2 / (t2-t1)
  end subroutine run_us_test
end program us_perf

==========================
 GFortran new I/O library
==========================

Introduction
============

This document specifies the design of the new GFortran I/O library (as
of 4.5?), and the motivation behind it.

Problems with the current I/O library
=====================================

The current and original I/O library is based on a design called the
``Alloc Stream Facility`` (ASF). ASF was designed to explore if
"normal" file I/O could be done more efficiently using memory mapping
(the mmap() syscall and friends). By using an ASF style API an
application can avoid copying the I/O buffer between user space and
kernel space. However, in the case of gfortran I/O, the Fortran I/O
model is a traditional read/write model, so the buffer copy has to be
made anyway. Memory mapping also has overhead: the OS has to set up
the mapping and, on page faults, map pages from files into memory.
Unfortunately this page mapping performance has remained relatively
constant, while the performance of buffer copying has increased with
increased memory bandwidth.

Also, there are issues with mmap relating to pipes, changing size of
files etc., so the ASF library must have a traditional read/write
implementation as a fallback. For gfortran it was actually found that
this fallback implementation was much faster than the mmap one, so the
mmap implementation was deleted and gfortran now does all I/O via
read/write syscalls.

But the main problem with the GFortran I/O library is really a lack of
clear layering in the code, and a lack of defined semantics. An
example of this lack of layering is that formatting uses the ASF
buffer directly, which makes the buffering code complicated; as a
result, a lot of bugs have been papered over by flushing the buffer.

Also see PR25561_.

Proposed solution
=================

The I/O library should have a clear separation between the
components. For formatting, a format buffer is needed in order to
properly handle things like the T* edit descriptors (not all files are
seekable, e.g. terminals). It is clearer to keep this format buffer
separate and flush it to the I/O buffer, whose only purpose is then to
enhance performance. This causes an extra memcpy(), but that should
not have a big effect on formatted I/O given all the other overhead,
and a simpler I/O buffer will facilitate optimizations in that layer.
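
To make the role of the format buffer concrete, here is a small sketch in
C of handling a T edit descriptor entirely in memory before the record is
handed to the next layer. The fmt_* names and the fixed record size are
assumptions for the example; this is not the fbuf code.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <string.h>
#include <unistd.h>

enum { REC_MAX = 256 };         /* arbitrary limit for the sketch */

/* Hypothetical format buffer holding one record.  */
struct fmtbuf
{
  char rec[REC_MAX];
  int pos;                      /* current column, 0-based */
  int len;                      /* length of the record so far */
};

/* T<col>: move to 1-based column COL, padding with blanks if that is
   beyond the current end of the record.  No seek on the file needed.  */
void
fmt_tab (struct fmtbuf *f, int col)
{
  assert (col >= 1 && col <= REC_MAX);
  f->pos = col - 1;
  if (f->pos > f->len)
    {
      memset (f->rec + f->len, ' ', (size_t) (f->pos - f->len));
      f->len = f->pos;
    }
}

/* Place S at the current column, overwriting what is there.  */
void
fmt_put (struct fmtbuf *f, const char *s)
{
  int n = (int) strlen (s);
  assert (f->pos + n <= REC_MAX);
  memcpy (f->rec + f->pos, s, (size_t) n);
  f->pos += n;
  if (f->pos > f->len)
    f->len = f->pos;
}

/* Hand the finished record to the next layer in one piece; here that
   layer is simply write().  */
ssize_t
fmt_flush (struct fmtbuf *f, int fd)
{
  ssize_t n = write (fd, f->rec, (size_t) f->len);
  f->pos = f->len = 0;
  return n;
}
```

Since all repositioning happens inside the record buffer, the underlying
stream only ever sees sequential writes and can be a terminal or a pipe.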

Problems with stdio
-------------------

An initial version that replaced the ASF with C stdio was made, and
except for a number of regressions it worked. However, there are a
number of problems with C stdio that have led me to conclude that we
cannot rely on it (see also page 16 of Python3slides_):

- fseek() causes a flush of the buffer. This is strictly speaking not
  mandated by the standard, but glibc (an important platform for
  gfortran) always does this. Unfortunately, seeking is an important
  use case for gfortran, due to the way unformatted sequential I/O is
  implemented. Always flushing the buffers when seeking leads to poor
  performance for small records with unformatted sequential I/O.

- No control of buffering. For example, how to find out how big the
  file is without flushing, seeking to the end, and seeking back?
  Also, if gfortran wants to support some form of parallel I/O as part
  of Co-Array Fortran, libgfortran needs to control the interaction of
  buffering and file locking (fcntl()).

- Text mode issues. Gfortran currently handles line ending conversions
  and Unicode internally, and thus switching to stdio functionality
  for this would require lots of changes. While using stdio in binary
  mode perhaps would work, this is probably undefined. Also, stdio
  generally doesn't handle 'universal newline', i.e. the ability to
  read CR, CRLF, LF on all platforms, so even with stdio text mode,
  the gfortran I/O library would still need to scan the input for the
  universal newlines.

- In order to use large files, we need to use the POSIX fseeko()
  rather than stdio fseek(), so using stdio is really no more portable
  than POSIX. Also, we need ftruncate() and other POSIX functions
  anyway, so not requiring POSIX is not feasible in any case.

- stdio doesn't support truncate. It's probably safe to first
  fflush(), then ftruncate(), but there is no guarantee that this
  will work.

- Asynchronous I/O (AIO). If Gfortran wants to support F2003
  asynchronous I/O using proper OS functionality (aio_read(),
  aio_write() etc.), stdio cannot be used.

- Various combinations of OPEN specifiers require things that are not
  possible with stdio fopen(), so one needs to open files with POSIX
  open(), then figure out a compatible stdio mode, and fdopen() the
  POSIX fd. This is kludgy.
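
For reference, the fflush()-then-ftruncate() workaround from the truncate
item above would look roughly like this in C. stdio_truncate_here() is a
hypothetical name for this example. It also shows how the code has to drop
from stdio to the POSIX descriptor via fileno(), and the caveat from the
list still applies: nothing in the C standard promises that stdio's
internal state stays consistent afterwards.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: cut the file behind FP off at the current
   position.  Returns 0 on success, -1 on failure.  */
int
stdio_truncate_here (FILE *fp)
{
  assert (fp != NULL);
  if (fflush (fp) != 0)         /* push buffered data to the kernel */
    return -1;
  off_t pos = ftello (fp);      /* POSIX large-file variant of ftell() */
  if (pos < 0)
    return -1;
  return ftruncate (fileno (fp), pos);
}
```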

Proposed design
---------------

The low level design is roughly based on PEP3116_, a stackable design,
similar to Python 3, Perl, and Java, or for that matter the gfortran
I/O library when it still had both read/write and mmap
implementations. The idea is to provide a consistent interface, and
different implementations providing the same semantics. In general, it
can be implemented similar to the current I/O library, i.e. a struct
with the required data and function pointers to the required
functions.

- Raw I/O. This is a basic wrapper around a few POSIX I/O system
  calls. This has the benefit that the semantics are described by the
  POSIX standard, and any book that deals with POSIX programming. Any
  deviation from POSIX is hence a bug. Raw I/O should provide read(),
  write(), seek(), tell(), truncate(), and close(). It can also
  provide a few extra functions giving information about the stream,
  such as seekable(), readable(), writeable(), fileno() (the POSIX
  file descriptor) etc.

- Buffered I/O. This behaves exactly like Raw I/O above, but as the
  name implies it uses a buffer in order to provide better
  performance. It provides the same functions as Raw I/O, plus a
  flush() function to flush the buffer.

- Text I/O and Text Buffered I/O. These are like Raw and Buffered I/O,
  but on top of that they provide things like line ending and Unicode
  conversions, as well as a method like readline() that reads until it
  hits a newline. However, this layer might not be needed, as the
  current libgfortran formatted I/O code deals with binary I/O
  directly.

Note that due to the identical semantics, raw I/O can be used to find
bugs in the buffered I/O, as they must behave identically. Raw I/O
itself should be rather bug-free, as it's just trivial wrappers calling
the syscalls.
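
The "struct with function pointers" idea and the raw layer can be sketched
as follows. All names and signatures here are assumptions chosen for the
example (the function-pointer members, struct raw_stream, and the s*
accessors); the actual struct stream in io/io.h may look different.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical stream interface: one function pointer per operation.  */
struct stream
{
  ssize_t (*read_fn) (struct stream *, void *, size_t);
  ssize_t (*write_fn) (struct stream *, const void *, size_t);
  off_t (*seek_fn) (struct stream *, off_t, int);
  off_t (*tell_fn) (struct stream *);
  int (*trunc_fn) (struct stream *, off_t);
  int (*close_fn) (struct stream *);
};

/* Raw implementation: the interface must be the first member so that a
   struct stream * can be cast back to struct raw_stream *.  */
struct raw_stream
{
  struct stream st;
  int fd;
};

#define RAW_FD(s) (((struct raw_stream *) (s))->fd)

static ssize_t raw_read (struct stream *s, void *buf, size_t n)
{ return read (RAW_FD (s), buf, n); }
static ssize_t raw_write (struct stream *s, const void *buf, size_t n)
{ return write (RAW_FD (s), buf, n); }
static off_t raw_seek (struct stream *s, off_t off, int whence)
{ return lseek (RAW_FD (s), off, whence); }
static off_t raw_tell (struct stream *s)
{ return lseek (RAW_FD (s), 0, SEEK_CUR); }
static int raw_truncate (struct stream *s, off_t len)
{ return ftruncate (RAW_FD (s), len); }
static int raw_close (struct stream *s)
{ return close (RAW_FD (s)); }

void
raw_init (struct raw_stream *r, int fd)
{
  assert (fd >= 0);
  r->fd = fd;
  r->st.read_fn = raw_read;
  r->st.write_fn = raw_write;
  r->st.seek_fn = raw_seek;
  r->st.tell_fn = raw_tell;
  r->st.trunc_fn = raw_truncate;
  r->st.close_fn = raw_close;
}

/* Callers only use these accessors, so a buffered implementation can be
   swapped in by filling the same table with different functions.  */
static inline ssize_t sread (struct stream *s, void *b, size_t n)
{ return s->read_fn (s, b, n); }
static inline ssize_t swrite (struct stream *s, const void *b, size_t n)
{ return s->write_fn (s, b, n); }
static inline off_t sseek (struct stream *s, off_t o, int w)
{ return s->seek_fn (s, o, w); }
static inline off_t stell (struct stream *s)
{ return s->tell_fn (s); }
static inline int struncate (struct stream *s, off_t len)
{ return s->trunc_fn (s, len); }
static inline int sclose (struct stream *s)
{ return s->close_fn (s); }
```

Because both layers sit behind the same table, a test can run one sequence
of calls against a raw and a buffered stream and compare the results.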

Unformatted I/O can be implemented as currently, calling the raw or
buffered I/O (depending on the GFORTRAN_UNBUFFERED_* environment
variables) functions.
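
As an illustration of what the unformatted sequential layer asks of the
stream underneath, here is a sketch in C that writes one record when its
length is known up front. It assumes 4-byte record markers holding the
payload length in bytes in native byte order, one before and one after
the data. write_us_record() is a hypothetical name, and partial writes are
simply treated as errors to keep the example short.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <unistd.h>

/* Write exactly N bytes or fail; 0 on success, -1 otherwise.  */
static int
put (int fd, const void *p, size_t n)
{
  return write (fd, p, n) == (ssize_t) n ? 0 : -1;
}

/* Hypothetical helper: marker, payload, marker.  With a raw stream this
   is three syscalls per record; a buffered stream can merge them.  */
int
write_us_record (int fd, const void *data, int32_t nbytes)
{
  assert (nbytes >= 0);
  if (put (fd, &nbytes, sizeof nbytes) != 0)
    return -1;
  if (put (fd, data, (size_t) nbytes) != 0)
    return -1;
  return put (fd, &nbytes, sizeof nbytes);
}
```

When the record length is not known in advance, the leading marker has to
be patched afterwards, which is where seeking enters the picture.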

For formatted IO, the format buffer functions (fbuf_*) should call the
underlying buffered/raw IO functions. For internal I/O, it should be
possible to use the fbuf functionality, except that the format buffer
is the character string that is the source/target of the internal
I/O. However the initial implementation tries to avoid touching
internal I/O as much as possible.

.. _PR25561: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=25561

.. _PEP3116: http://www.python.org/dev/peps/pep-3116/

.. _Python3slides: http://www.python.org/doc/essays/ppt/accu2006/Py3kACCU.ppt

Attachment: pr25561-part2-8.diff.gz
Description: GNU Zip compressed data

Attachment: signature.asc
Description: OpenPGP digital signature
