This is the mail archive of the mailing list for the libstdc++ project.


Re: Should basic_filebuf rely on posix behavior (on Windows)?

On 24/11/14 19:06 +0900, Luke Allardyce wrote:
Behind the scenes, basic_filebuf uses the posix functions read and
lseek64 to read and seek the file.

This causes an issue on Windows when attempting to seek a text mode
file using a value returned by an earlier call to pubseekoff due to
the way Windows implements the functions.

For files opened in text mode on Windows, read will return the number
of bytes copied to the buffer after converting \r\n to \n (it also
appends an additional \n to the end of the file). _lseeki64 (which is
how mingw-w64 redirects calls to lseek64) however returns the absolute
file position without end of line conversion.

On a call to pubseekoff (i.e. seekoff), basic_filebuf computes the
return value using the remaining characters in the buffer and the file
position as returned by lseek64. For text files on Windows this will
be off by one for each unconsumed newline in the buffer due to the
issue outlined above.

As far as I can make out posix doesn't mention text mode for posix
file descriptors, so I'm assuming the Windows implementation of read
is incorrect as presumably the function should just copy raw bytes;
text / binary file distinction is limited to FILEs and calls to fread
/ fseek / ftell etc.

It's true that POSIX doesn't mention text mode, but then even for FILE
stdio it requires text mode and binary mode to be identical, so it's
not clear what it would require on systems where they are different.

I am inclined to agree that read() should not be doing conversions on
line-endings but I suppose MS and/or mingw can define it to do
whatever they want, since mingw is not trying to be POSIX anyway.

Even if it were compliant however, this would mean that no end of line
conversion would be performed, which in turn means that basic_filebuf
probably shouldn't be using the posix functions for files opened in
text mode in the first place if it intends to support endline conversion
for text files on Windows.


The following code demonstrates the issue:

#include <fstream>
#include <iostream>

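int main(int, char* argv[])
{
 using traits   = std::filebuf::traits_type;
 using int_type = std::filebuf::int_type;

 // Buffered: print the position reported after each character read.
 std::filebuf fb;
 fb.open(argv[1], std::ios::in);

 for (int_type c; !traits::eq_int_type(c = fb.sbumpc(), traits::eof());)
   std::cout << fb.pubseekoff(0, std::ios::cur, std::ios::in) << ' ';

 std::cout << '\n';

 // Unbuffered: close first (open fails on an already open filebuf),
 // disable buffering, then reopen the same file.
 fb.close();
 fb.pubsetbuf(nullptr, 0);
 fb.open(argv[1], std::ios::in);

 for (int_type c; !traits::eq_int_type(c = fb.sbumpc(), traits::eof());)
   std::cout << fb.pubseekoff(0, std::ios::cur, std::ios::in) << ' ';

 std::cout << '\n';
}
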

With the following 3-line Windows-style text file:

it should produce the following:
4 5 6 7 8 9 10 11 12 13 14 15 16
1 2 3 4 5 7 8 9 10 11 12 14 16

The buffered version reads the entire file into the buffer. As you can
see, the values returned by pubseekoff cannot be used to seek to that
exact same position later on, as they will be off by at least one. The
unbuffered version works as expected, as pubseekoff simply returns the
value returned by _lseeki64.

Is this something that should / can be corrected in libstdc++, or
should my comments be directed to the mingw-w64 team as my
understanding of the posix functions is incorrect and this is
basically a Windows issue (i.e. lseek64 should be patched)?

I'm not sure. I agree it's a problem but I don't have any bandwidth to
care about Windows support right now.  Maybe Kai will have a more
useful opinion (he may already be aware of the issue).

I suggest reporting it to Bugzilla with your testcase (and maybe a
link to this mail in the archives). Thanks for bringing it up.
