This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Re: [Patch, Fortran ] PR 50016: Slow Fortran I/O on Windows and flushing/_commit

From: Tobias Burnus <burnus at net-b dot de>
To: Janne Blomqvist <blomqvist dot janne at gmail dot com>
Cc: Kai Tietz <ktietz70 at googlemail dot com>, Jerry DeLisle <jvdelisle2 at gmail dot com>, NightStrike <nightstrike at gmail dot com>, gfortran <fortran at gcc dot gnu dot org>, "gcc-p >> gcc patches" <gcc-patches at gcc dot gnu dot org>
Date: Mon, 17 Oct 2011 18:03:30 +0200
Subject: Re: [Patch, Fortran ] PR 50016: Slow Fortran I/O on Windows and flushing/_commit
References: <Pine.LNX.4.64.1110112040270.6918@digraph.polyomino.org.uk> <4E94B106.60203@net-b.de> <CAO9iq9Hdb9UTGQHZk4+cWEyu7=w8ZzJjz1dcEh14YZe75txNuA@mail.gmail.com> <4E980BAF.7070203@net-b.de> <4E984BF2.4040700@net-b.de> <4E9C2445.2000007@net-b.de> <CAO9iq9Ff6cfaBiGP+-AS0mBk4nTarnHF5zzqscajTPnC0_hn-Q@mail.gmail.com>

Hi Janne,

On 10/17/2011 05:30 PM, Janne Blomqvist wrote:

> On Mon, Oct 17, 2011 at 15:49, Tobias Burnus <burnus@net-b.de> wrote:
>> This patch adds a call to _commit() on _WIN32 for the FLUSH subroutine and
>> the FLUSH statement. It removes the _commit from gfortran's buf_flush.
> Like I argued in this message http://gcc.gnu.org/ml/fortran/2011-10/msg00094.html, I think this is a gross mistake.

[...]

And I think it is a mistake not to make the data available to other processes, as the Fortran 2008 standard indicates:

"Execution of a FLUSH statement causes data written to an external file to be available to other processes, or causes data placed in an external file by means other than Fortran to be available to a READ statement. These actions are processor dependent."

Thus, I think it makes sense for FLUSH to call _commit on Windows.

If you don't want the slowdown, simply do not call FLUSH.

> libgfortran should not require _commit nor fsync in any situation. Those calls are useful for writing databases and other applications which must make data integrity guarantees, and are prepared to pay the performance cost associated with it. It's absolutely not something a language support library should do unless the language spec explicitly requires such data integrity guarantees.

Well, Fortran does not need to write the data to the file; however, the purpose of FLUSH is that I can, e.g., run execute_command_line on the file the program has just written. That works on Unix/Linux but not on MinGW/MinGW-w64 without a _commit (or without closing the file).
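
To illustrate the behaviour argued for here, the following is a minimal C sketch of a FLUSH-like step. The names flush_visible and size_by_path are hypothetical and are not taken from libgfortran or from either patch. The only assumption carried over from this thread is that on _WIN32 a _commit() is needed after the write before another process sees the data, while on Unix/Linux the write() alone is enough.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/stat.h>
#ifdef _WIN32
#include <io.h>      /* _write, _commit */
#else
#include <unistd.h>  /* write, close */
#endif

/* Hypothetical helper: hand LEN bytes from BUF to the OS and, on
   Windows only, follow up with _commit() as the patch does for FLUSH.
   Returns 0 on success and -1 on error. */
static int flush_visible(int fd, const char *buf, size_t len)
{
    assert(fd >= 0 && buf != NULL);
    while (len > 0) {
#ifdef _WIN32
        int n = _write(fd, buf, (unsigned int) len);
#else
        ssize_t n = write(fd, buf, len);
#endif
        if (n <= 0)
            return -1;          /* treat short-circuit and errors alike */
        buf += n;
        len -= (size_t) n;
    }
#ifdef _WIN32
    if (_commit(fd) != 0)       /* the extra step discussed in this thread */
        return -1;
#endif
    return 0;
}

/* Hypothetical helper: file size looked up by path rather than through
   the open descriptor, as a rough stand-in for what a second process
   would observe. Returns -1 if the file cannot be examined. */
static long long size_by_path(const char *path)
{
    struct stat st;
    if (stat(path, &st) != 0)
        return -1;
    return (long long) st.st_size;
}
```

A caller would run flush_visible() before starting a child process on the same file, and could compare size_by_path() with the number of bytes written. A program that never calls it never pays for the _commit().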

> That write() would be buffered on windows makes no sense to me

Why shouldn't it be buffered? Typical Windows programs open files with an exclusive lock, and as Windows never had the pipes and many small programs that Unix did, a per-file-descriptor buffer is easier to implement, avoids multi-thread issues and is potentially faster. If a program wants to make the data available, it can just _commit it or close the file handle; that way one also has perfect data integrity.

> And, while I'm at it, this kind of "relaxed consistency" is not
> unheard of in the unix world either. Consider NFS, where data and
> metadata may not be flushed to the server until fsync() or close() is
> called, or the attribute cache timeout forces the writeout(?), and
> thus it's possible for clients to have an inconsistent view of a file.

Well, most of the time it works well on the same system: If I call execute_command_line, the data is up to date. The issue with NFS only occurs if I want to access the data remotely, which is a separate matter. If one wants to do that, one can use parallel access with, e.g., HDF5 or MPIv2 or the Coarray TS (to be written and implemented).

> In both cases the remedy is the same; if this kind of consistency matters, the user should close the file or fsync()/_commit() before expecting that the OS metadata is consistent. I think that's a better option than sprinkling _commit() all over the library.

No, for the required consistency, FLUSH is enough (including calling _commit on MinGW/MinGW-w64). It makes sure that if the program crashes, the data is still there, and it makes the data available to other processes.

Only if one wants complete integrity does one need to call fsync. However, with NFS, Lustre et al., I am not 100% sure that the data is immediately available on all other clients after fsync has returned.
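
The distinction between visibility and integrity can be sketched in C as follows. This is an illustrative assumption, not code from either patch: write_durably is a hypothetical name, and the sketch only shows the stronger request (fsync() on POSIX, _commit() in the Windows C runtime) that an application makes when it wants the data pushed to the storage device. Whether that is sufficient on NFS or Lustre is not something the sketch addresses.

```c
#ifndef _WIN32
#define _POSIX_C_SOURCE 200809L  /* for fileno() under strict ISO C modes */
#endif
#include <assert.h>
#include <stdio.h>
#ifdef _WIN32
#include <io.h>      /* _commit */
#else
#include <unistd.h>  /* fsync */
#endif

/* Hypothetical helper: write TEXT through stdio, drain the stdio
   buffer to the OS, then ask the OS to write the file to the device.
   Returns 0 on success and -1 on error. */
static int write_durably(FILE *fp, const char *text)
{
    assert(fp != NULL && text != NULL);
    if (fputs(text, fp) == EOF)
        return -1;
    if (fflush(fp) != 0)             /* user-space buffer -> OS */
        return -1;
#ifdef _WIN32
    return _commit(_fileno(fp));     /* OS -> device (Windows CRT) */
#else
    return fsync(fileno(fp));        /* OS -> device (POSIX) */
#endif
}
```

The last step is the expensive one, which is why it is the application's choice rather than something done on every write.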

> So I would rather prefer my own patch from the URL above. Also, I
> think it would be nice if we could get this fix into 4.6.2.

I also would like to see this fixed for 4.6.2. However, a prerequisite is that we agree on how to implement it.

Regarding your patch: I think it does not solve the FLUSH issue. For the file size itself, I think the patch is okay, but frankly, I do not think it really matters for performance which approach is taken. And I do not like the test-suite part of your patch.

Tobias

Follow-Ups:
- Re: [Patch, Fortran ] PR 50016: Slow Fortran I/O on Windows and flushing/_commit
  - From: Janne Blomqvist

References:
- [Patch, Fortran ] PR 50016: Slow Fortran I/O on Windows and flushing/_commit
  - From: Tobias Burnus
- Re: [Patch, Fortran ] PR 50016: Slow Fortran I/O on Windows and flushing/_commit
  - From: Janne Blomqvist
