This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.



Re: [Patch, Fortran ] PR 50016: Slow Fortran I/O on Windows and flushing/_commit


On Mon, Oct 17, 2011 at 23:52, Janne Blomqvist
<blomqvist.janne@gmail.com> wrote:
> On Mon, Oct 17, 2011 at 19:03, Tobias Burnus <burnus@net-b.de> wrote:
>> Hi Janne,
>>
>> On 10/17/2011 05:30 PM, Janne Blomqvist wrote:
>>>
>>> On Mon, Oct 17, 2011 at 15:49, Tobias Burnus <burnus@net-b.de> wrote:
>>>>
>>>> This patch adds a call to _commit() on _WIN32 for the FLUSH subroutine
>>>> and the FLUSH statement. It removes the _commit from gfortran's buf_flush.
>>>
>>> Like I argued in this message
>>> http://gcc.gnu.org/ml/fortran/2011-10/msg00094.html, I think this is a gross
>>> mistake.
>>
>> [...]
>>
>> And I think it is a mistake to not make the data available to other
>> processes as it is indicated by the Fortran 2008 standard:
>>
>> "Execution of a FLUSH statement causes data written to an external file to be
>> available to other processes, or causes data placed in an external file by
>> means other than Fortran to be available to a READ statement. These actions
>> are processor dependent."
>>
>> Thus, I think it makes sense for FLUSH to call _commit on Windows.
>
> I'm not actually sure we can draw such conclusions. What we know is
> that metadata updates to the directory are delayed. It wouldn't
> surprise me if opening a file (which presumably is an atomic operation
> in order to avoid race conditions just like on POSIX) forces the
> kernel to sync metadata of other handles to the same file.
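For readers following the thread, here is a rough C sketch of the split
the patch proposes: the internal buffer flush only hands bytes to the OS
with write(), and only the FLUSH statement/subroutine adds a _commit() on
_WIN32. The type and function names (sketch_unit, sketch_put,
sketch_buf_flush, sketch_flush_stmt) are hypothetical and do not
correspond to libgfortran's actual internals; the fixed-size buffer is a
simplification.

```c
/* Hypothetical sketch, not libgfortran source: a unit with a fixed-size
 * user-space buffer, illustrating the split proposed in the patch. */
#include <assert.h>
#include <string.h>
#ifdef _WIN32
#include <io.h>        /* write, _commit */
#else
#include <unistd.h>    /* write */
#endif

typedef struct {
    int fd;            /* OS-level file descriptor */
    char buf[4096];    /* user-space buffer */
    size_t len;        /* number of bytes currently buffered */
} sketch_unit;

/* Append bytes to the user-space buffer (caller guarantees they fit). */
static void sketch_put(sketch_unit *u, const char *p, size_t n)
{
    assert(u->len + n <= sizeof u->buf);
    memcpy(u->buf + u->len, p, n);
    u->len += n;
}

/* Internal buffer flush: hand buffered bytes to the OS cache with write().
 * Under the patch, this step no longer calls _commit(). */
static int sketch_buf_flush(sketch_unit *u)
{
    size_t off = 0;
    while (off < u->len) {
        long n = (long) write(u->fd, u->buf + off, u->len - off);
        if (n <= 0)
            return -1;
        off += (size_t) n;
    }
    u->len = 0;
    return 0;
}

/* FLUSH statement / FLUSH subroutine: flush the buffer and, on Windows
 * only, call _commit() (a wrapper around FlushFileBuffers()). */
static int sketch_flush_stmt(sketch_unit *u)
{
    if (sketch_buf_flush(u) != 0)
        return -1;
#ifdef _WIN32
    if (_commit(u->fd) != 0)
        return -1;
#endif
    return 0;
}
```

On non-Windows systems the #ifdef makes FLUSH identical to the plain
buffer flush, which matches the patch's description of calling _commit()
only on _WIN32.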

I did some further googling, and

http://stackoverflow.com/questions/2883691/fflush-on-stdout

seems to suggest that my explanation above is roughly what is
happening. That is, opening and closing the file in another program
("type" in the link above) flushes the metadata to the directory. Also,
from the MSDN link referenced there:

"The only guarantee about a file timestamp is that the file time is
correctly reflected when the handle that makes the change is closed. "

(which would suggest that the same holds for other file metadata as
well, such as the size).


Explanation of file caching in Windows:
http://msdn.microsoft.com/en-us/library/aa364218%28v=VS.85%29.aspx

Some MS presentation about how to make apps behave nicely with SMB:
http://mschnlnine.vo.llnwd.net/d1/pdc08/PPTX/ES23.pptx

Under the heading of "Platform Support for Metadata Caching":

"Metadata caching is best effort and there are very limited consistency
guarantees"

"Metadata caches expire after a fixed time"

(Though it's not entirely clear if the above "platform support" means
only SMB or Windows filesystem semantics in general.)

So, I think the picture is essentially:

- write() (which is a wrapper around WriteFile()/WriteFileEx()) transfers
data and metadata to the system cache (the kernel page cache in Unix
terminology).

- However, the metadata is written to the directory lazily, allowing
applications that do stat() (or the equivalent Win32 API call(s)) on a
pathname to see stale data. This, incidentally, is what gfortran is
doing and what caused the issue that led to the introduction of
_commit in the first place.

- Closing the file, calling _commit() (a wrapper around
FlushFileBuffers()), or (per the stackoverflow link above)
opening/closing the file in another process forces the metadata to be
flushed to the directory immediately.
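A minimal sketch of the practical consequence, assuming the description
of Windows behaviour above is accurate: ask for the size of an open file
through its descriptor with fstat() instead of calling stat() on the
pathname. The helper names (size_by_fd, size_by_path, demo_sizes) are
hypothetical, and the demo uses POSIX mkstemp(), so it only runs on POSIX
systems, where both answers agree; the stale-size case described above
is specific to Windows and is not reproduced here.

```c
/* Hypothetical helpers; the demo is POSIX-only because it uses mkstemp(). */
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <stdlib.h>     /* mkstemp */
#include <string.h>     /* memset */
#include <unistd.h>     /* write, close, unlink */
#include <sys/stat.h>   /* stat, fstat */

/* Size of an open file, queried through its descriptor (fstat). */
static long long size_by_fd(int fd)
{
    struct stat st;
    if (fstat(fd, &st) != 0)
        return -1;
    return (long long) st.st_size;
}

/* Size of a file, queried by re-resolving its pathname (stat). This is the
 * pattern the thread argues against relying on, since on Windows the
 * directory metadata may lag behind the open handle. */
static long long size_by_path(const char *path)
{
    struct stat st;
    if (stat(path, &st) != 0)
        return -1;
    return (long long) st.st_size;
}

/* Demo: write n bytes (n <= 256) to a fresh temporary file and report the
 * size both ways while the file is still open. Returns 0 on success. */
static int demo_sizes(size_t n, long long *by_fd, long long *by_path)
{
    char path[] = "/tmp/sizeXXXXXX";
    char data[256];
    int fd, rc;

    assert(by_fd != NULL && by_path != NULL);
    if (n > sizeof data)
        return -1;
    fd = mkstemp(path);
    if (fd < 0)
        return -1;
    memset(data, 'x', n);
    rc = (write(fd, data, n) == (ssize_t) n) ? 0 : -1;
    if (rc == 0) {
        *by_fd = size_by_fd(fd);
        *by_path = size_by_path(path);
    }
    close(fd);
    unlink(path);
    return rc;
}
```

The descriptor-based query does not depend on directory metadata having
been written back, which is why it avoids needing _commit() just to get
an accurate size.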

Some investigation into how other language support libraries handle this:

- For MS-DOS compatibility (back when OS file caching wasn't that
advanced, one presumes), the MS C runtime allows the user to link in
an extra object, COMMODE.OBJ, which makes fflush() also call _commit(),
recreating the MS-DOS behavior. By default, however, this is not done.
Also, there are nonstandard mode characters that can be passed to
fopen() ("c" to set the commit flag, "n" to clear it) to select the
MS-DOS behavior per file.

- For the MS C++ compiler, the same COMMODE.OBJ linking can be done,
and there is some nonstandard extension allowing one to get the fd
from a C++ stream and then call _commit on it. By default flush() on a
stream does not call _commit/FlushFileBuffers, it just does a
WriteFile().

- For .NET, there is the FileStream.Flush() method, which flushes the
user-space buffer without calling _commit/FlushFileBuffers. In .NET
Framework 4 (the latest version at the time of writing), there is a new
FileStream.Flush(bool) method
which can be used to call FlushFileBuffers. In previous .NET versions,
people used PInvoke (the native code calling interface) to call
FlushFileBuffers if needed.

- For the runtimes provided with GCC, grepping the source tree shows
that libgfortran is the only one that calls _commit. In libjava there is
a FlushFileBuffers call, but it's #if 0'ed away.
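For comparison with the opt-in approach these runtimes take, a C program
that wants the old MS-DOS-style behaviour for a particular stream can
request it explicitly after fflush(). This is a sketch: flush_and_commit
is a hypothetical name, and on non-Windows systems it swaps in POSIX
fsync() as the nearest counterpart of FlushFileBuffers().

```c
/* Hypothetical helper: explicitly commit one stdio stream, opt-in style. */
#ifndef _WIN32
#define _XOPEN_SOURCE 700   /* expose fileno() and fsync() */
#endif
#include <assert.h>
#include <stdio.h>
#ifdef _WIN32
#include <io.h>             /* _commit */
#else
#include <unistd.h>         /* fsync */
#endif

/* Flush the stdio buffer to the OS, then ask the OS to write its cached
 * data for this file to the device. Returns 0 on success, -1 on failure. */
static int flush_and_commit(FILE *fp)
{
    assert(fp != NULL);
    if (fflush(fp) != 0)                 /* user-space buffer -> OS cache */
        return -1;
#ifdef _WIN32
    return _commit(_fileno(fp)) == 0 ? 0 : -1;   /* FlushFileBuffers() */
#else
    return fsync(fileno(fp)) == 0 ? 0 : -1;      /* nearest POSIX analogue */
#endif
}
```

Keeping the commit an explicit, per-stream choice leaves the ordinary
fflush() path cheap, which matches the default of the runtimes listed
above.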

That is, except for libgfortran, no other language runtime by default
makes an effort to synchronize the metadata.

In conclusion, I still think that my previous patch which got rid of
the _commit and the reliance on using stat() via pathname is the
correct approach. "dir" might show stale data for the file size, but
this seems to be the norm on Windows, and opening the file in another
process will give the correct data. So in practice I don't think there
will be any problems from getting rid of _commit.

-- 
Janne Blomqvist

