This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.
Hi all,

attached is a substantially reworked low-level I/O library for gfortran. The general idea is to replace the "Alloc Stream Facility" with a simple low-level interface that provides more or less exactly the POSIX semantics (read/write/seek/truncate/close). This is provided by the raw_* functions in unix.c. There is a second implementation of the same interface in the buf_* functions which, as the name implies, use a buffer to improve performance. Attached is gfortranio.rst with some more justification for the design choices.

The patch also changes the mid-level I/O library to use the format buffer (fbuf) machinery for reads as well as writes (for 4.4 we use fbuf for writes). Some other minor cleanups have been done as well, e.g. changing end-of-file detection to "lazy", i.e. it assumes everything is going well until a read operation hits EOF (and generates an error at that point) rather than trying to proactively determine that the next read will hit EOF.

So far I have done very little performance tuning, mainly making fbuf_getc() an inline function that calls a non-inline fbuf_getc_refill() function to refill the buffer if necessary (a bit like getc() and fgetc() in C stdio). That being said, initial performance results are promising. For the countlines.f benchmark (see PR37754) with the patched trunk vs. 4.3 (I don't have vanilla 4.4 at the moment, but according to the PR it's about 10-20% slower than 4.3):

./countlines.gf44  2.90s user 0.08s system 98% cpu 3.022 total
./countlines.gf43  4.58s user 0.07s system 99% cpu 4.684 total

(results above are best of 10 runs each)

Attached is also a simple test that measures unformatted sequential write performance with different record sizes. The new implementation does quite well because the new buffering implementation is good at avoiding unnecessary syscalls like lseek(). Comparing under strace with e.g. gfortran 4.3, which seems to do an lseek() for every record, the patched version avoids seeking as long as the record + record markers are small enough to fit into the buffer (currently the buffer is 8 KB, as in previous gfortran versions).

For the patched 4.4, a representative run of us_perf shows the following on my system:

 Unformatted sequential write performance test
 Record size   MB/s
 ================================
           4   7.5802262656833790
           8   15.873509918930392
          16   25.410357328092356
          32   39.385463811119422
          64   60.307900379571777
         128   90.606550636531679
         256   124.39827155195042
         512   148.53833492807280
        1024   161.33197189039470
        2048   165.30075386586569
        4096   168.64479316414403
        8192   174.79361499332097
       16384   244.82046680142199
       32768   294.28299059919226
       65536   308.46720850537616
      131072   291.61666653611849
      262144   164.22273724046653
      524288   158.71373187189036

whereas with 4.3:

 Unformatted sequential write performance test
 Record size   MB/s
 ================================
           4   5.3134706580816573
           8   12.652808318693332
          16   16.467640435883737
          32   25.677765810986461
          64   41.438541428006481
         128   69.465166864727536
         256   100.52410645875071
         512   120.68720373185609
        1024   143.79566880191507
        2048   136.48643666988502
        4096   125.20637961368544
        8192   165.25923002902420
       16384   244.82046680142199
       32768   261.98301434601660
       65536   285.61729202322573
      131072   163.07498967544379
      262144   166.28830213878092
      524288   150.22626328225456

With records bigger than 8 KB there is no big difference between the two, as in that case both 4.3 and trunk+patch bypass the buffering.

The patch passes the gfortran and NIST testsuites on i686-pc-linux-gnu. For NIST FM907 there is a single extra whitespace change compared to the reference, but I don't think it's actually an error. So far, however, there has been very little real application testing. As this is a rather large and invasive patch, I'd like to get it in relatively early in 4.5 in order to have plenty of time to fix all remaining issues.

Ok for trunk once 4.4 branches?
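For readers unfamiliar with the getc()/fgetc() style split mentioned above, the following is a minimal sketch of the idea, not code from the patch. The struct demo_fbuf, its fields and all demo_* names are hypothetical, and a string stands in for the file. The fast path is a small inline function; the refill lives out of line, and EOF is only noticed "lazily" when a refill finds nothing left to read.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical miniature format buffer; not libgfortran's struct fbuf. */
struct demo_fbuf {
    const char *src;   /* backing data standing in for the file */
    size_t src_len;    /* total bytes available in src */
    size_t src_pos;    /* bytes of src already consumed by refills */
    char buf[8];       /* deliberately tiny so refills happen often */
    int pos;           /* index of the next byte to hand out */
    int act;           /* number of valid bytes currently in buf */
};

/* Slow path, kept out of line: refill the buffer and return the first
   byte, or -1 once the source is exhausted (lazy EOF detection). */
int demo_getc_refill(struct demo_fbuf *f)
{
    assert(f->pos >= f->act);   /* only called when the buffer is empty */
    size_t left = f->src_len - f->src_pos;
    size_t n = left < sizeof f->buf ? left : sizeof f->buf;
    if (n == 0)
        return -1;
    memcpy(f->buf, f->src + f->src_pos, n);
    f->src_pos += n;
    f->act = (int) n;
    f->pos = 0;
    return (unsigned char) f->buf[f->pos++];
}

/* Fast path: one comparison and one load when the buffer has data. */
static inline int demo_getc(struct demo_fbuf *f)
{
    if (f->pos < f->act)
        return (unsigned char) f->buf[f->pos++];
    return demo_getc_refill(f);
}

/* Count newlines, in the spirit of the countlines benchmark. */
int demo_count_lines(const char *text, size_t len)
{
    struct demo_fbuf f = { .src = text, .src_len = len };
    int c, lines = 0;
    while ((c = demo_getc(&f)) != -1)
        if (c == '\n')
            lines++;
    return lines;
}
```

Calling demo_count_lines() on a text longer than the 8-byte buffer exercises both paths: most characters come from the inline function, and the out-of-line refill runs once per buffer.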
-- Janne Blomqvist
2009-01-05  Janne Blomqvist  <jb@gcc.gnu.org>

	PR libfortran/25561 libfortran/37754
	* io/io.h (struct stream): Define new stream interface function
	pointers, and inline functions for accessing it.
	(struct fbuf): Use int instead of size_t, remove flushed element.
	(mem_alloc_w): New prototype.
	(mem_alloc_r): New prototype.
	(stream_at_bof): Remove prototype.
	(stream_at_eof): Remove prototype.
	(file_position): Remove prototype.
	(flush): Remove prototype.
	(stream_offset): Remove prototype.
	(unit_truncate): New prototype.
	(read_block_form): Change to return pointer, int* argument.
	(hit_eof): New prototype.
	(fbuf_init): Change prototype.
	(fbuf_reset): Change prototype.
	(fbuf_alloc): Change prototype.
	(fbuf_flush): Change prototype.
	(fbuf_seek): Change prototype.
	(fbuf_read): New prototype.
	(fbuf_getc_refill): New prototype.
	(fbuf_getc): New inline function.
	* io/fbuf.c (fbuf_init): Use int, get rid of flushed.
	(fbuf_debug): New function.
	(fbuf_reset): Flush, and return position offset.
	(fbuf_alloc): Simplify, don't flush, just realloc.
	(fbuf_flush): Make usable for read mode, salvage remaining bytes.
	(fbuf_seek): New whence argument.
	(fbuf_read): New function.
	(fbuf_getc_refill): New function.
	* io/file_pos.c (formatted_backspace): Use new stream interface.
	(unformatted_backspace): Likewise.
	(st_backspace): Make sure format buffer is reset, use new stream
	interface, use unit_truncate.
	(st_endfile): Likewise.
	(st_rewind): Likewise.
	* io/intrinsics.c: Use new stream interface.
	* io/list_read.c (push_char): Don't use u.p.scratch, use realloc
	to resize.
	(free_saved): Don't check u.p.scratch.
	(next_char): Use new stream interface, use fbuf_getc() for
	external files.
	(finish_list_read): Flush format buffer.
	(nml_query): Update to use modified interfaces.
	* io/open.c (test_endfile): Use new stream interface.
	(edit_modes): Likewise.
	(new_unit): Likewise, set bytes_left to 1 for stream files.
	* io/read.c (read_l): Use new read_block_form interface.
	(read_utf8): Likewise.
	(read_utf8_char1): Likewise.
	(read_default_char1): Likewise.
	(read_utf8_char4): Likewise.
	(read_default_char4): Likewise.
	(read_a): Likewise.
	(read_a_char4): Likewise.
	(read_decimal): Likewise.
	(read_radix): Likewise.
	(read_f): Likewise.
	* io/transfer.c (read_sf): Use fbuf_read and mem_alloc_r, remove
	usage of u.p.line_buffer.
	(read_block_form): Update interface to return pointer, use
	fbuf_read for direct access.
	(read_block_direct): Update to new stream interface.
	(write_block): Use mem_alloc_w for internal I/O.
	(write_buf): Update to new stream interface.
	(formatted_transfer_scalar): Don't use u.p.line_buffer, use
	fbuf_seek for external files.
	(us_read): Update to new stream interface.
	(us_write): Likewise.
	(data_transfer_init): Always check if we switch modes and flush.
	(skip_record): Use new stream interface, fix comparison.
	(next_record_r): Check for and reset u.p.at_eof, use new stream
	interface, use fbuf_getc for spacing.
	(write_us_marker): Update to new stream interface, don't inline.
	(next_record_w_unf): Likewise.
	(sset): New function.
	(next_record_w): Use new stream interface, use fbuf for printing
	newline.
	(next_record): Use new stream interface.
	(finalize_transfer): Remove sfree call, use new stream interface.
	(st_iolength_done): Don't use u.p.scratch.
	(st_read): Don't check for end of file.
	(st_read_done): Don't use u.p.scratch, use unit_truncate.
	(hit_eof): New function.
	* io/unit.c (init_units): Always init fbuf for formatted units.
	(update_position): Use new stream interface.
	(unit_truncate): New function.
	(finish_last_advance_record): Use fbuf to print newline.
	* io/unix.c: Remove unused SSIZE_MAX macro.
	(BUFFER_SIZE): Make static const variable rather than macro.
	(struct unix_stream): Remove dirty_offset, len, method,
	small_buffer.  Order elements by decreasing size.
	(struct int_stream): Remove.
	(move_pos_offset): Remove usage of dirty_offset.
	(reset_stream): Remove.
	(do_read): Rename to raw_read, update to match new stream
	interface.
	(do_write): Rename to raw_write, update to new stream interface.
	(raw_seek): New function.
	(raw_tell): New function.
	(raw_truncate): New function.
	(raw_close): New function.
	(raw_flush): New function.
	(raw_init): New function.
	(fd_alloc): Remove.
	(fd_alloc_r_at): Remove.
	(fd_alloc_w_at): Remove.
	(fd_sfree): Remove.
	(fd_seek): Remove.
	(fd_truncate): Remove.
	(fd_sset): Remove.
	(fd_read): Remove.
	(fd_write): Remove.
	(fd_close): Remove.
	(fd_open): Remove.
	(fd_flush): Rename to buf_flush, update to new stream interface
	and unix_stream.
	(buf_read): New function.
	(buf_write): New function.
	(buf_seek): New function.
	(buf_tell): New function.
	(buf_truncate): New function.
	(buf_close): New function.
	(buf_init): New function.
	(mem_alloc_r_at): Rename to mem_alloc_r, change prototype.
	(mem_alloc_w_at): Rename to mem_alloc_w, change prototype.
	(mem_read): Change to match new stream interface.
	(mem_write): Likewise.
	(mem_seek): Likewise.
	(mem_tell): Likewise.
	(mem_truncate): Likewise.
	(mem_close): Likewise.
	(mem_flush): New function.
	(mem_sfree): Remove.
	(empty_internal_buffer): Cast to correct type.
	(open_internal): Use correct type, init function pointers.
	(fd_to_stream): Test whether to open file as buffered or raw.
	(output_stream): Remove mode set.
	(error_stream): Likewise.
	(flush_all_units_1): Use new stream interface.
	(flush_all_units): Likewise.
	(stream_at_bof): Remove.
	(stream_at_eof): Remove.
	(file_position): Remove.
	(file_length): Update logic to use stream interface.
	(flush): Remove.
	(stream_offset): Remove.
	* io/write.c (write_utf8_char4): Use int instead of size_t.
	(write_x): Extra safety check.
	(namelist_write_newline): Use new stream interface.
! Test performance of unformatted sequential with different sized records.
! Janne Blomqvist 2009
program us_perf
  implicit none
  integer, parameter :: d = 8
  integer :: ii
  real(d) :: wspeed

  print *, 'Unformatted sequential write performance test'
  print *, 'Record size MB/s'
  print *, '================================'
  ii = 1
  do
     call run_us_test (ii, wspeed)
     print *, ii*4, wspeed
     if (ii > 100000) then
        exit
     end if
     ii = ii * 2
  end do

contains

  subroutine run_us_test (n, ws)
    integer, intent(in) :: n
    real(d), intent(out) :: ws
    integer, allocatable :: data(:)
    real(d) :: t1, t2
    integer :: ii, loops
    integer, parameter :: nsize = 10000000 ! 10 MB

    ! Write nsize * log(n + 1) bytes, each record is n elements of 4 bytes each
    ! + two 4 byte record markers
    loops = nsize * log(n + 1._d) / (n*4._d + 8._d)
    allocate(data(n))
    data = 123
    open(10, file="usperf.dat", form='unformatted', access='sequential', status='replace')
    call cpu_time(t1)
    do ii = 1, loops
       write (10) data
    end do
    call cpu_time(t2)
    deallocate(data)
    close(10)
    ws = nsize * log(n+1._d) / 1024**2 / (t2-t1)
  end subroutine run_us_test

end program us_perf
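The record layout the benchmark comment refers to (payload plus two 4-byte record markers) can be pictured with a short C sketch. This is illustrative only and assumes that each marker simply stores the payload length in bytes in native byte order, ignoring the subrecord scheme used for very large records; the demo_* names are hypothetical and are not libgfortran functions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Frame one record as: 4-byte length marker, payload, 4-byte length marker.
   out must have room for nbytes + 8 bytes.  Returns the framed size. */
size_t demo_frame_record(unsigned char *out, const void *payload,
                         uint32_t nbytes)
{
    assert(out != NULL && payload != NULL);
    memcpy(out, &nbytes, sizeof nbytes);               /* leading marker */
    memcpy(out + 4, payload, nbytes);                  /* record data */
    memcpy(out + 4 + nbytes, &nbytes, sizeof nbytes);  /* trailing marker */
    return (size_t) nbytes + 8;
}

/* Fraction of the bytes on disk that are payload rather than markers. */
double demo_payload_fraction(uint32_t nbytes)
{
    return nbytes / (nbytes + 8.0);
}
```

For the smallest record in the benchmark (one 4-byte integer) only a third of the bytes written are payload, which is one reason per-record overhead such as an extra lseek() shows up so clearly at small record sizes.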
==========================
 GFortran new I/O library
==========================

Introduction
============

This document specifies the design of the new GFortran I/O library (as
of 4.5?), and the motivation behind it.

Problems with the current I/O library
=====================================

The current and original I/O library is based on a design called the
``Alloc Stream Facility`` (ASF). ASF was designed to explore whether
"normal" file I/O could be done more efficiently using memory mapping
(the mmap() syscall and friends). By using an ASF style API an
application can avoid copying the I/O buffer between user space and
kernel space. However, in the case of gfortran I/O, the Fortran I/O
model is a traditional read/write model, so the buffer copy has to be
made anyway.

Memory mapping also has overhead in that the OS has to set up the
mapping, and on page faults map pages from files into memory etc.
Unfortunately this page mapping performance has remained relatively
constant, while the performance of buffer copying has increased with
increased memory bandwidth. Also, there are issues with mmap relating
to pipes, changing size of files etc., so the ASF library must have a
traditional read/write implementation as a fallback. For gfortran it
was actually found that this fallback implementation was much faster
than the mmap one, so the mmap implementation was deleted and gfortran
now does all I/O via read/write syscalls.

But the main problem with the GFortran I/O library is really a lack of
clear layering in the code, and a lack of defined semantics. An example
of this lack of layering is that formatting uses the ASF buffer
directly, making the buffering code complicated, and hence a lot of
bugs have been papered over by flushing the buffer. Also see PR25561_.

Proposed solution
=================

The I/O library should have a clear separation between the components.
For formatting, a formatting buffer is needed in order to properly
handle things like the T* edit descriptors (not all files are seekable,
e.g. terminals), and it's clearer to have a separate format buffer that
is flushed to the I/O buffer whose only purpose is to enhance
performance. This will cause an extra memcpy(), but this should not
have a big effect on formatted I/O due to all other overhead, and also
a simpler I/O buffer will facilitate optimizations in that layer.

Problems with stdio
-------------------

An initial version that replaced the ASF with C stdio was made, and
except for a number of regressions it worked. However, there are a
number of problems with C stdio that have led me to conclude that we
cannot rely on it (see also page 16 of Python3slides_):

- fseek() causing a flush(). This is strictly speaking not mandated by
  the standard, but glibc (an important platform for gfortran) always
  does this. Unfortunately, this is an important use case of gfortran,
  due to the way unformatted sequential I/O is implemented. Always
  flushing the buffers when seeking leads to poor performance for small
  records with unformatted sequential.
- No control of buffering. For example, how to find out how big the
  file is without flushing, seeking to the end, and seeking back? Also,
  if gfortran wants to support some form of parallel I/O as part of
  Co-Array Fortran, libgfortran needs to control the interaction of
  buffering and file locking (fcntl()).
- Text mode issues. Gfortran currently handles line ending conversions
  and Unicode internally, and thus switching to stdio functionality for
  this would require lots of changes. While using stdio in binary mode
  perhaps would work, this is probably undefined. Also, stdio generally
  doesn't handle 'universal newline', i.e. the ability to read CR,
  CRLF, LF on all platforms, so even with stdio text mode, the gfortran
  I/O library would need to scan the input for the universal newlines.
- In order to use large files, we need to use the POSIX fseeko() rather
  than stdio fseek(), so using stdio is really no more portable than
  POSIX. Also, we need ftruncate() and other POSIX functions anyway, so
  not requiring POSIX is not feasible in any case.
- stdio doesn't support truncate. It's probably safe to first fflush(),
  then ftruncate(), but there is no guarantee that this will work.
- Asynchronous I/O (AIO). If gfortran wants to support F2003
  asynchronous I/O using proper OS functionality (aio_read(),
  aio_write() etc.), stdio cannot be used.
- Various combinations of OPEN specifiers require things that are not
  possible with stdio fopen(), so one needs to open files with POSIX
  open(), then figure out a compatible stdio mode, and fdopen() the
  POSIX fd. This is kludgy.

Proposed design
---------------

The low level design is roughly based on PEP3116_, a stackable design,
similar to Python 3, Perl, and Java, or for that matter the gfortran
I/O library when it still had both read/write and mmap implementations.
The idea is to provide a consistent interface, and different
implementations providing the same semantics. In general, it can be
implemented similarly to the current I/O library, i.e. a struct with
the required data and function pointers to the required functions.

- Raw I/O. This is a basic wrapper around a few POSIX I/O system calls.
  This has the benefit that the semantics are described by the POSIX
  standard, and any book that deals with POSIX programming. Any
  deviation from POSIX is hence a bug. Raw I/O should provide read(),
  write(), seek(), tell(), truncate(), and close(). It can also provide
  a few extras giving information about the stream, such as seekable(),
  readable(), writeable(), fileno() (the POSIX file descriptor) etc.
- Buffered I/O. This behaves exactly like Raw I/O above, but as the
  name implies it uses a buffer in order to provide better performance.
  It provides the same functions as Raw I/O, plus a flush() function to
  flush the buffer.
- Text I/O and Text Buffered I/O. These are like Raw and Buffered I/O,
  but on top of that they provide things like line ending and Unicode
  conversions, and also a method like readline() that reads until it
  hits a newline. However, this implementation might not be needed, as
  the current libgfortran formatted I/O code deals with binary I/O
  directly.

Note that due to the identical semantics, raw I/O can be used to find
bugs in the buffered I/O, as they must behave identically. Raw I/O
itself should be rather bug-free, as it's just trivial wrappers calling
the syscalls.

Unformatted I/O can be implemented as currently, calling the raw or
buffered I/O functions (depending on the GFORTRAN_UNBUFFERED_*
environment variables).

For formatted I/O, the format buffer functions (fbuf_*) should call the
underlying buffered/raw I/O functions.

For internal I/O, it should be possible to use the fbuf functionality,
except that the format buffer is the character string that is the
source/target of the internal I/O. However, the initial implementation
tries to avoid touching internal I/O as much as possible.

.. _PR25561: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=25561
.. _PEP3116: http://www.python.org/dev/peps/pep-3116/
.. _Python3slides: http://www.python.org/doc/essays/ppt/accu2006/Py3kACCU.ppt
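As a rough illustration of the stackable design described in this document, here is a minimal C sketch of a stream struct made of function pointers, with a raw implementation wrapping the POSIX write() call and a buffered implementation stacked on top of it. It covers only write() and flush(); every demo_* name is hypothetical and the real struct stream in io/io.h differs in detail. The 8 KB buffer size and the bypass for large requests follow the behaviour described in the accompanying mail.

```c
#include <assert.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical common interface shared by all layers. */
typedef struct demo_stream demo_stream;
struct demo_stream {
    ssize_t (*write)(demo_stream *s, const void *buf, size_t n);
    int (*flush)(demo_stream *s);
};

/* Raw layer: a thin wrapper around the POSIX syscall. */
typedef struct {
    demo_stream st;   /* first member, so demo_stream* casts back safely */
    int fd;
    int nsyscalls;    /* instrumentation for the demo only */
} demo_raw;

static ssize_t raw_write_demo(demo_stream *s, const void *buf, size_t n)
{
    demo_raw *r = (demo_raw *) s;
    r->nsyscalls++;
    return write(r->fd, buf, n);
}

static int raw_flush_demo(demo_stream *s) { (void) s; return 0; }

void demo_raw_init(demo_raw *r, int fd)
{
    r->st.write = raw_write_demo;
    r->st.flush = raw_flush_demo;
    r->fd = fd;
    r->nsyscalls = 0;
}

/* Buffered layer: same interface, stacked on any lower stream. */
enum { DEMO_BUFSZ = 8192 };
typedef struct {
    demo_stream st;
    demo_stream *lower;
    size_t used;
    char buf[DEMO_BUFSZ];
} demo_buf;

static int buf_flush_demo(demo_stream *s)
{
    demo_buf *b = (demo_buf *) s;
    size_t off = 0;
    while (off < b->used) {   /* the lower layer may write partially */
        ssize_t w = b->lower->write(b->lower, b->buf + off, b->used - off);
        if (w <= 0)
            return -1;        /* sketch: no recovery of a partial flush */
        off += (size_t) w;
    }
    b->used = 0;
    return 0;
}

static ssize_t buf_write_demo(demo_stream *s, const void *data, size_t n)
{
    demo_buf *b = (demo_buf *) s;
    if (n > (size_t) DEMO_BUFSZ - b->used && buf_flush_demo(s) < 0)
        return -1;
    if (n >= (size_t) DEMO_BUFSZ)   /* large requests bypass the buffer */
        return b->lower->write(b->lower, data, n);
    assert(b->used + n <= (size_t) DEMO_BUFSZ);
    memcpy(b->buf + b->used, data, n);
    b->used += n;
    return (ssize_t) n;
}

void demo_buf_init(demo_buf *b, demo_stream *lower)
{
    b->st.write = buf_write_demo;
    b->st.flush = buf_flush_demo;
    b->lower = lower;
    b->used = 0;
}
```

Because callers only see demo_stream, the same calling code can run against either layer. That is what makes it possible to use the raw layer as a reference when hunting bugs in the buffered one, and in this sketch the nsyscalls counter makes the saving in system calls directly visible.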
Attachment:
pr25561-part2-8.diff.gz
Description: GNU Zip compressed data
Attachment:
signature.asc
Description: OpenPGP digital signature