This is the mail archive of the
libstdc++@gcc.gnu.org
mailing list for the libstdc++ project.
Re: Should basic_filebuf rely on posix behavior (on Windows)?
- From: Kai Tietz <ktietz at redhat dot com>
- To: Jonathan Wakely <jwakely at redhat dot com>
- Cc: Luke Allardyce <lukeallardyce at gmail dot com>, libstdc++ at gcc dot gnu dot org
- Date: Mon, 24 Nov 2014 11:22:57 -0500 (EST)
- Subject: Re: Should basic_filebuf rely on posix behavior (on Windows)?
- References: <CAFW6PZCYq826MAA7vqUB16C43269vLGJ8PsSN4GYHHgbErR9Jg at mail dot gmail dot com> <20141124113711 dot GD5191 at redhat dot com>
----- Original Message -----
> On 24/11/14 19:06 +0900, Luke Allardyce wrote:
> >Behind the scenes, basic_filebuf uses the posix functions read and
> >lseek64 to read and seek the file.
> >
> >This causes an issue on Windows when attempting to seek a text mode
> >file using a value returned by an earlier call to pubseekoff due to
> >the way Windows implements the functions.
> >
> >For files opened in text mode on Windows, read will return the number
> >of bytes copied to the buffer after converting \r\n to \n (it also
> >appends an additional \n to the end of the file). _lseeki64 (which is
> >how mingw-w64 redirects calls to lseek64) however returns the absolute
> >file position without end of line conversion.
> >
> >On a call to pubseekoff (i.e. seekoff), basic_filebuf computes the
> >return value using the remaining characters in the buffer and the file
> >position as returned by lseek64. For text files on Windows this will
> >be off by one for each unconsumed newline in the buffer due to the
> >issue outlined above.
> >
> >As far as I can make out posix doesn't mention text mode for posix
> >file descriptors, so I'm assuming the Windows implementation of read
> >is incorrect as presumably the function should just copy raw bytes;
> >text / binary file distinction is limited to FILEs and calls to fread
> >/ fseek / ftell etc..
>
> It's true that POSIX doesn't mention text mode, but then even for FILE
> stdio it requires text mode and binary mode to be identical, so it's
> not clear what it would require on systems where they are different.
>
> I am inclined to agree that read() should not be doing conversions on
> line-endings but I suppose MS and/or mingw can define it to do
> whatever they want, since mingw is not trying to be POSIX anyway.
True. I don't see a violation of the POSIX standard here anyway. Even though MS's C runtime doesn't intend to be POSIX-compliant, the LF <-> CR+LF translation is nevertheless well documented for Windows platforms. The classic workaround is to use binary mode by default. That flag, AFAIR, is defined even on POSIX systems.
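To make the offset arithmetic described above concrete, here is a small portable sketch. It is not libstdc++'s actual code; the function names (translate_crlf, estimated_pos, actual_pos) are hypothetical, and it assumes the test file consists of the bytes "hello\r\nworld\r\n\r\n" and is read into the buffer in one go. It compares the position computed as "raw file offset minus characters still unread in the buffer" with the real on-disk offset.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Emulate a text-mode read(): each CR LF pair on disk arrives as a single LF.
std::string translate_crlf(const std::string& raw)
{
    std::string out;
    for (std::size_t i = 0; i < raw.size(); ++i) {
        if (raw[i] == '\r' && i + 1 < raw.size() && raw[i + 1] == '\n')
            continue;               // drop the CR, keep the LF that follows
        out += raw[i];
    }
    return out;
}

// Position estimated as the thread describes: the raw file offset (here the
// whole file has been read) minus the translated characters still unread.
long estimated_pos(const std::string& raw, std::size_t consumed)
{
    const std::string buf = translate_crlf(raw);    // assumed: whole file buffered
    assert(consumed <= buf.size());
    return static_cast<long>(raw.size())
         - static_cast<long>(buf.size() - consumed);
}

// True on-disk offset after consuming `consumed` translated characters.
long actual_pos(const std::string& raw, std::size_t consumed)
{
    std::size_t i = 0;
    for (std::size_t n = 0; n < consumed && i < raw.size(); ++n) {
        if (raw[i] == '\r' && i + 1 < raw.size() && raw[i + 1] == '\n')
            ++i;                    // a consumed LF stands for two bytes on disk
        ++i;
    }
    return static_cast<long>(i);
}
```

Under those assumptions, estimated_pos yields 4 after the first character while actual_pos yields 1; the difference is the three newlines still unconsumed in the buffer, and it shrinks by one each time a newline is consumed.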
> >Even if it were compliant however, this would mean that no end of line
> >conversion would be performed, which in turn means that basic_filebuf
> >probably shouldn't be using the posix functions for files opened in
> >text mode in the first place if it intends to support endline conversion
> >for text files on Windows.
>
> Huh.
????
We might consider using binary mode as the default on Windows systems here. But I can follow your reasoning.
> >The following code demonstrates the issue:
> >
> >#include <fstream>
> >#include <iostream>
> >
> >int main(int, char* argv[])
> >{
> > using traits = std::filebuf::traits_type;
> > using int_type = std::filebuf::int_type;
> >
> > std::filebuf fb;
> > fb.open(argv[1], std::ios::in);
> >
> > for (int_type c; !traits::eq_int_type(c = fb.sbumpc(), traits::eof());)
> >   std::cout << fb.pubseekoff(0, std::ios::cur, std::ios::in) << ' ';
> >
> > std::cout << '\n';
> >
> > fb.close();
> > fb.pubsetbuf(nullptr, 0);
> > fb.open(argv[1], std::ios::in);
> >
> > for (int_type c; !traits::eq_int_type(c = fb.sbumpc(), traits::eof());)
> >   std::cout << fb.pubseekoff(0, std::ios::cur, std::ios::in) << ' ';
> >}
> >
> >With the following 3-line Windows-style text file:
> >hello
> >world
> >
> >it should produce the following:
> >4 5 6 7 8 9 10 11 12 13 14 15 16
> >1 2 3 4 5 7 8 9 10 11 12 14 16
> >
> >The buffered version reads the entire file into the buffer. As you can
> >see, the values returned by pubseekoff cannot be used to seek to that
> >exact same position later on, as they will be off by at least one. The
> >unbuffered version works as expected, as pubseekoff simply returns the
> >value returned by _lseeki64.
> >
> >Is this something that should / can be corrected in libstdc++, or
> >should my comments be directed to the mingw-w64 team as my
> >understanding of the posix functions is incorrect and this is
> >basically a Windows issue (i.e. lseek64 should be patched)?
>
> I'm not sure. I agree it's a problem but I don't have any bandwidth to
> care about Windows support right now. Maybe Kai will have a more
> useful opinion (he may already be aware of the issue).
>
> I suggest reporting it to Bugzilla with your testcase (and maybe a
> link to this mail in the archives). Thanks for bringing it up.
>
>
I would suggest adding the binary-mode flag here, i.e. using 'std::ios::in | std::ios::binary' instead of just 'std::ios::in'. I tested your sample with that change, and it works nicely.
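As a rough sketch of that change, the loop from the test case can be wrapped in a helper that takes the open mode as a parameter. The helper name `positions` is hypothetical and not part of the original sample.

```cpp
#include <cassert>
#include <fstream>
#include <ios>
#include <vector>

// Read the file one character at a time and record the position reported
// by pubseekoff after each character, as the original test case does.
std::vector<long long> positions(const char* path, std::ios::openmode mode)
{
    using traits = std::filebuf::traits_type;
    assert((mode & std::ios::in) == std::ios::in);  // helper only reads

    std::vector<long long> out;
    std::filebuf fb;
    if (!fb.open(path, mode))
        return out;                                 // empty result on failure

    while (!traits::eq_int_type(fb.sbumpc(), traits::eof())) {
        const std::streamoff off =
            std::streamoff(fb.pubseekoff(0, std::ios::cur, std::ios::in));
        out.push_back(static_cast<long long>(off));
    }
    return out;
}
```

Calling `positions(argv[1], std::ios::in | std::ios::binary)` should report consecutive byte offsets, since no line-ending translation takes place. The trade-off is that the program then sees the '\r' bytes itself and has to handle them.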
Kai