[Patch, Fortran ] PR 50016: Slow Fortran I/O on Windows and flushing/_commit

Tobias Burnus burnus@net-b.de
Mon Oct 17 17:08:00 GMT 2011


Hi Janne,

On 10/17/2011 05:30 PM, Janne Blomqvist wrote:
> On Mon, Oct 17, 2011 at 15:49, Tobias Burnus<burnus@net-b.de>  wrote:
>> This patch adds a call to _commit() on _WIN32 for the FLUSH subroutine and
>> the FLUSH statement. It removes the _commit from gfortran's buf_flush.
> Like I argued in this message http://gcc.gnu.org/ml/fortran/2011-10/msg00094.html, I think this is a gross mistake.
[...]

And I think it is a mistake not to make the data available to other
processes, as indicated by the Fortran 2008 standard:

"Execution of a FLUSH statement causes data written to an external 
file to be available to other processes, or causes data placed in an external
file by means other than Fortran to be available to a READ statement. 
These actions are processor dependent."

Thus, I think it makes sense for FLUSH to call _commit on Windows.
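
To make the proposal concrete, here is a minimal C sketch of the idea. It
is not the libgfortran source; the helper name flush_make_visible is made
up for illustration, and it assumes the library has already handed its own
buffer to write() before the helper is called.

```c
#ifdef _WIN32
#include <io.h>   /* _commit() */
#endif

/* Hypothetical helper (not taken from libgfortran): called at the end of
   FLUSH, after the runtime's own buffer has been passed to write().
   Returns 0 on success and -1 on error. */
int flush_make_visible(int fd)
{
#ifdef _WIN32
    /* On Windows, ask the OS to commit the descriptor's pending data. */
    return _commit(fd);
#else
    /* On POSIX systems, data passed to write() is already visible to
       other processes reading the same file, so nothing is needed. */
    (void) fd;
    return 0;
#endif
}
```

The extra call is made only when the program asks for FLUSH, not on every
buffer flush inside the library.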

If you don't want the slowdown, simply do not call FLUSH.

> libgfortran should not require _commit nor fsync in any situation. Those calls are useful for writing databases and other applications which must make data integrity guarantees, and are prepared to pay the performance cost associated with it. It's absolutely not something a language support library should do unless the language spec explicitly requires such data integrity guarantees.

Well, Fortran does not need to write the data to the file. However, the
purpose of FLUSH is that I can, e.g., run execute_command_line on the
file the program has just written. That works on Unix/Linux but not on
MinGW/MinGW-w64 without a _commit (or without closing the file).
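
The use case can be approximated in C as follows. This is only a sketch
under assumptions: write_flush_read is a hypothetical name, and a second
handle opened in the same process stands in for the command that
execute_command_line would start, which is not quite the same as a
separate process.

```c
#include <stdio.h>
#ifdef _WIN32
#include <io.h>   /* _commit() */
#endif

/* Hypothetical demo: write msg to path, flush, then read the file back
   through a second handle while the writer is still open.
   outsz must be at least 1. Returns the bytes read, or -1 on error. */
long write_flush_read(const char *path, const char *msg,
                      char *out, size_t outsz)
{
    FILE *w = fopen(path, "w");
    if (w == NULL)
        return -1;

    /* Empty the C library buffer into the OS. */
    if (fputs(msg, w) == EOF || fflush(w) != 0) {
        fclose(w);
        return -1;
    }
#ifdef _WIN32
    /* The extra step discussed in this thread. */
    if (_commit(_fileno(w)) != 0) {
        fclose(w);
        return -1;
    }
#endif

    /* Second, independent handle: the "other" reader. */
    FILE *r = fopen(path, "r");
    if (r == NULL) {
        fclose(w);
        return -1;
    }
    size_t n = fread(out, 1, outsz - 1, r);
    out[n] = '\0';

    fclose(r);
    fclose(w);
    return (long) n;
}
```

The point is that the reader sees the complete data although the writer
has not closed the file.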

> That write() would be buffered on windows makes no sense to me

Why shouldn't it be buffered? Typical Windows programs open files with an
exclusive lock, and as Windows never had the pipes and the many small
programs that Unix did, having a per-file-descriptor buffer is easier to
implement, avoids multi-thread issues and is potentially faster. If a
program wants to make the data available, it can just _commit it or
close the file handle - that way one also has perfect data integrity.

> And, while I'm at it, this kind of "relaxed consistency" is not
> unheard of in the unix world either. Consider NFS, where data and
> metadata may not be flushed to the server until fsync() or close() is
> called, or the attribute cache timeout forces the writeout(?), and
> thus it's possible for clients to have an inconsistent view of a file.

Well, most of the time it works well on the same system: If I call
execute_command_line, the data is up to date. The issue with NFS only
occurs if I want to access the data remotely, which is a separate matter.
If one wants to do that, one can use parallel access with, e.g., HDF5,
MPIv2 or the Coarray TS (to be written and implemented).

> In both cases the remedy is the same; if this kind of consistency matters, the user should close the file or fsync()/_commit() before expecting that the OS metadata is consistent. I think that's a better option than sprinkling _commit() all over the library.

No, for the required consistency, FLUSH is enough (including calling
_commit on MinGW/MinGW-w64). It makes sure that if the program crashes,
the data is still there, and it makes the data available to other
processes.

Only if one wants complete integrity does one need to call fsync.
However, with NFS, Lustre et al., I am not 100% sure that the data is
immediately available on all other clients after fsync has returned.

> So I would rather prefer my own patch from the URL above. Also, I
> think it would be nice if we could get this fix into 4.6.2..

I would also like to see this fixed for 4.6.2. However, a prerequisite
is that we agree on how to implement it.

Regarding your patch: I think it does not solve the FLUSH issue. For the 
file size itself, I think the patch is okay, but frankly, I think for 
the performance it does not really matter which approach is taken. And I 
do not like the test-suite part of your patch.

Tobias


