This is the mail archive of the libstdc++@gcc.gnu.org mailing list for the libstdc++ project.



Re: Should basic_filebuf rely on posix behavior (on Windows)?


----- Original Message -----
> On 24/11/14 19:06 +0900, Luke Allardyce wrote:
> >Behind the scenes, basic_filebuf uses the posix functions read and
> >lseek64 to read and seek the file.
> >
> >This causes an issue on Windows when attempting to seek a text mode
> >file using a value returned by an earlier call to pubseekoff due to
> >the way Windows implements the functions.
> >
> >For files opened in text mode on Windows, read will return the number
> >of bytes copied to the buffer after converting \r\n to \n (it also
> >appends an additional \n to the end of the file). _lseeki64 (which is
> >how mingw-w64 redirects calls to lseek64) however returns the absolute
> >file position without end of line conversion.
> >
> >On a call to pubseekoff (i.e. seekoff), basic_filebuf computes the
> >return value using the remaining characters in the buffer and the file
> >position as returned by lseek64. For text files on Windows this will
> >be off by one for each unconsumed newline in the buffer due to the
> >issue outlined above.
> >
> >As far as I can make out posix doesn't mention text mode for posix
> >file descriptors, so I'm assuming the Windows implementation of read
> >is incorrect as presumably the function should just copy raw bytes;
> >text / binary file distinction is limited to FILEs and calls to fread
> >/ fseek / ftell etc..
> 
> It's true that POSIX doesn't mention text mode, but then even for FILE
> stdio it requires text mode and binary mode to be identical, so it's
> not clear what it would require on systems where they are different.
> 
> I am inclined to agree that read() should not be doing conversions on
> line-endings but I suppose MS and/or mingw can define it to do
> whatever they want, since mingw is not trying to be POSIX anyway.

True.  I don't see a violation of the POSIX standard here anyway.  Even granting that MS's C runtime doesn't intend to be POSIX-conforming, the LF/CR+LF translation is well documented for Windows platforms.  The classic workaround is to use binary mode by default.  That flag, AFAIR, is defined even on POSIX systems.
 
> >Even if it were compliant however, this would mean that no end of line
> >conversion would be performed, which in turn means that basic_filebuf
> >probably shouldn't be using the posix functions for files opened in
> >text mode in the first place if it intends to support endline conversion
> >for text files on Windows.
> 
> Huh.

????

We might consider using binary mode as the default on Windows systems.  But I can follow your reasoning.

> >The following code demonstrates the issue:
> >
> >#include <fstream>
> >#include <iostream>
> >
> >int main(int, char* argv[])
> >{
> >  using traits   = std::filebuf::traits_type;
> >  using int_type = std::filebuf::int_type;
> >
> >  std::filebuf fb;
> >  fb.open(argv[1], std::ios::in);
> >
> >  for (int_type c; !traits::eq_int_type(c = fb.sbumpc(), traits::eof());)
> >    std::cout << fb.pubseekoff(0, std::ios::cur, std::ios::in) << ' ';
> >
> >  std::cout << '\n';
> >
> >  fb.close();
> >  fb.pubsetbuf(nullptr, 0);
> >  fb.open(argv[1], std::ios::in);
> >
> >  for (int_type c; !traits::eq_int_type(c = fb.sbumpc(), traits::eof());)
> >    std::cout << fb.pubseekoff(0, std::ios::cur, std::ios::in) << ' ';
> >}
> >
> >With the following 3-line Windows-style text file:
> >hello
> >world
> >
> >it should produce the following:
> >4 5 6 7 8 9 10 11 12 13 14 15 16
> >1 2 3 4 5 7 8 9 10 11 12 14 16
> >
> >The buffered version reads the entire file into the buffer. As you can
> >see, the values returned by pubseekoff cannot be used to seek to that
> >exact same position later on, as they will be off by at least one. The
> >unbuffered version works as expected, as pubseekoff simply returns the
> >value returned by _lseeki64.
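
The arithmetic behind those two output lines can be reproduced portably with a small model. This is only a sketch of the behaviour described above, not the libstdc++ code: raw_positions and buffered_positions are hypothetical names, the model assumes the whole file fits in one buffer, and it assumes the test file is the 16 bytes "hello\r\nworld\r\n\r\n" (inferred from the quoted output, not stated in the report).

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

namespace {
// True if raw[i] is the CR of a CR LF pair, which text-mode read() drops.
bool is_dropped_cr(const std::string& raw, std::size_t i)
{
  return raw[i] == '\r' && i + 1 < raw.size() && raw[i + 1] == '\n';
}
}

// Raw file offset after each character the reader sees (the unbuffered case,
// where the position comes straight from the seek call).
std::vector<long> raw_positions(const std::string& raw)
{
  std::vector<long> out;
  for (std::size_t i = 0; i < raw.size(); ++i)
    if (!is_dropped_cr(raw, i))
      out.push_back(static_cast<long>(i + 1));
  return out;
}

// Position a fully buffered reader would report: raw end-of-file offset
// minus the number of translated characters still unread in the buffer.
std::vector<long> buffered_positions(const std::string& raw)
{
  const long translated = static_cast<long>(raw_positions(raw).size());
  const long file_end = static_cast<long>(raw.size());
  assert(translated <= file_end);  // translation never adds characters here
  std::vector<long> out;
  for (long consumed = 1; consumed <= translated; ++consumed)
    out.push_back(file_end - (translated - consumed));
  return out;
}
```

For the assumed 16-byte input, raw_positions yields 1 2 3 4 5 7 8 9 10 11 12 14 16 and buffered_positions yields 4 through 16, matching the two lines of output quoted above. The buffered value is too large by one for every CR LF pair that has not been consumed yet.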
> >
> >Is this something that should / can be corrected in libstdc++, or
> >should my comments be directed to the mingw-w64 team as my
> >understanding of the posix functions is incorrect and this is
> >basically a Windows issue (i.e. lseek64 should be patched)?
> 
> I'm not sure. I agree it's a problem but I don't have any bandwidth to
> care about Windows support right now.  Maybe Kai will have a more
> useful opinion (he may already be aware of the issue).
> 
> I suggest reporting it to Bugzilla with your testcase (and maybe a
> link to this mail in the archives). Thanks for bringing it up.
> 
> 

I would suggest using the binary-mode flag in addition to 'std::ios::in', i.e. 'std::ios::in | std::ios::binary'.  I tested your sample with it, and it works nicely.
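
A minimal sketch of that workaround, using only the standard std::filebuf interface. The helper names positions_binary and char_at are made up for illustration. In binary mode no line-ending translation takes place, so the caller sees the CR characters and has to handle them itself.

```cpp
#include <cassert>
#include <fstream>
#include <vector>

// Hypothetical helper: opens `path` in binary mode and records the position
// reported by pubseekoff after each character read.
std::vector<std::streamoff> positions_binary(const char* path)
{
  assert(path != nullptr);
  using traits = std::filebuf::traits_type;
  std::vector<std::streamoff> out;
  std::filebuf fb;
  if (!fb.open(path, std::ios::in | std::ios::binary))
    return out;  // empty result if the file cannot be opened
  while (!traits::eq_int_type(fb.sbumpc(), traits::eof()))
    out.push_back(static_cast<std::streamoff>(
        fb.pubseekoff(0, std::ios::cur, std::ios::in)));
  return out;
}

// Hypothetical helper: seeks to a previously recorded position and returns
// the next character there (or traits::eof() on failure).
std::filebuf::int_type char_at(const char* path, std::streamoff pos)
{
  assert(path != nullptr);
  std::filebuf fb;
  if (!fb.open(path, std::ios::in | std::ios::binary))
    return std::filebuf::traits_type::eof();
  fb.pubseekoff(pos, std::ios::beg, std::ios::in);
  return fb.sgetc();
}
```

With a file opened this way, the n-th recorded position should simply be n, with or without a buffer, so a value returned by pubseekoff can be passed back later to reach the same byte.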

Kai

