Bug 42857 - std::istream::ignore(std::streamsize n) calls unnecessary underflow
Summary: std::istream::ignore(std::streamsize n) calls unnecessary underflow
Status: UNCONFIRMED
Alias: None
Product: gcc
Classification: Unclassified
Component: libstdc++ (show other bugs)
Version: 4.4.1
: P3 normal
Target Milestone: ---
Assignee: Not yet assigned to anyone
URL:
Keywords:
: 51651 (view as bug list)
Depends on:
Blocks:
 
Reported: 2010-01-24 19:46 UTC by Tommi Mäkitalo
Modified: 2020-11-12 23:33 UTC (History)
5 users (show)

See Also:
Host:
Target: i486-linux-gnu
Build:
Known to work:
Known to fail:
Last reconfirmed:


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Tommi Mäkitalo 2010-01-24 19:46:02 UTC
When ignoring the number of bytes available in a streambuf using std::istream::ignore, ignore calls underflow. This should not happen. I feel the easiest way to reproduce it is to look at strace output when ignoring bytes from ifstream:

#include <iostream>
#include <fstream>

int main(int argc, char* argv[])
{
  std::ifstream in(argv[0]);
  in.get(); // trigger filling of input buffer by calling underflow
  in.ignore(in.rdbuf()->in_avail());  // this triggers another underflow
}

The strace output shows, that 2 calls to read with 8191 bytes occures. When ignoring one byte less and then another byte, only one read is done:

#include <iostream>
#include <fstream>

int main(int argc, char* argv[])
{
  std::ifstream in(argv[0]);
  in.get(); // trigger filling of input buffer
  in.ignore(in.rdbuf()->in_avail() - 1);
  in.ignore(1); // this does not trigger underflow
}
Comment 1 Paolo Carlini 2010-01-27 21:09:52 UTC
Putting aside the strange inconsistency of the second example, which could be easily fixed, and probably should anyway (we have an overload corresponding to n == 1 which calls sbumpc and should probably call snextc instead for a consistent behavior), I think the issue boils down to a very old interpretation issue in these sections of the standard, where it uses wording of the form "any of the following occurs" (the same happens for get, getline, the issue isn't limited to ignore) and it's not clear at all whether, when the target n is reached, thus n characters are extracted, the second termination condition, which involves checking for end-of-file, is still relevant or not, or, in principle could even have been computed first, before checking the value of n. 

In short, I understand your request as a request of evaluating the conditions exactly in the order written in the Standard, and as soon as one holds, the following ones simply ignored. It's not what we have been doing, uniformly (modulo the buglet above) across this area, but indeed makes sense. Note however that a rather noticeable consequence of your interpretation is that ignoring exactly to end-of-line would not set the eofbit anymore in the stream, because we would not do getc on it. Now instead we uniformly set eofbit when it's actually the case (and setting it could even throw, a very noticeable behavior)

Thus, I believe that, besides the buglet above, we should give this issue much more thought, likely it's even too late for 4.5.0, and I'd like to involve Nathan in the discussion, I'm pretty sure I learned from him, some time ago, about the tricky point of the "any of the following occurs"-type specifications.
Comment 2 Paolo Carlini 2010-09-20 12:15:27 UTC
I was having a second look to this issue, and noticed something more which I missed the first time: the Standard, *only* in the case of getline(char_type*, streamsize, char_type) explicitly says "These conditions are tested in the order shown.". In my opinion that means that the get and ignore overloads using the famous "any of the following occurs" can in principle check the conditions in *any* implementation defined order and being conforming. Now, we have a case here where we have an additional underflow because in our implementation we uniformly insist on always checking whether end-of-file occurs in the sequence, thus setting eofbit (besides the special case of ignore(), as already noticed) in that case, like the above mentioned getline does, for example. I want to understand how critical this additional underflow is, performance-wise, which, as far as I can see normally can be triggered only by passing in_avail to ignore, because otherwise, frankly, I find our consistent implementation defined behavior across the various get, getline, ignore overloads pretty nice.
Comment 3 Paolo Carlini 2011-12-22 00:55:04 UTC
*** Bug 51651 has been marked as a duplicate of this bug. ***
Comment 4 Andrew Ayer 2013-02-24 22:05:26 UTC
(In reply to comment #2)
> I want to understand how critical this additional underflow is,
> performance-wise

There are also correctness implications: what if you're trying to ignore
all the bytes you know are available to be read without blocking?  Then
the extra underflow blocks, possibly forever if the program is prevented
from taking action that would result in the underflow completing.
I got bitten by this because I was trying to ignore data from a socket.

Note that the standard has similar language for read(char_type*,
streamsize) ("either of the following conditions") but the library in
effect checks for n characters being reached before checking for eof:
if you read() exactly the number of characters left in the stream,
eofbit is not set.  So the library isn't currently consistent across
all the various istream functions.

I would argue that the current behavior of ignore() is actually contrary
to the standard.  If the standard says "until any" then ignore()
needs to terminate when n characters have been extracted.  But if it
first checks for eof and blocks forever, then it doesn't terminate.
Perhaps this is why getline explicitly mentions the order: in that case
the standard really does want eof to be checked first.
Comment 5 Sergey Zubkov 2014-04-18 16:19:47 UTC
Looking at existing implementations, libstdc++ is the odd one out: LLVM libc++, IBM XL C++, Microsoft Visual Studio, and Oracle libCstd and stlport4 all let me ignore exactly the number of characters requested, without an extra read from the socket.
Comment 6 Andrew Ayer 2014-07-23 21:47:16 UTC
Any word on if this will be fixed in GCC?  To summarize, GCC's current behavior is wrong because:

* Underflowing after ignoring the requested number of bytes could block forever, breaking applications.

* The standard says "characters are extracted until ANY of the following occurs: ... n characters are extracted" (emphasis mine).  If ignore() blocks forever after extracting n characters, it's a violation of the standard because it doesn't terminate as is required.

* GCC's std::basic_istream::read() implementation does NOT underflow after reading the requested number of bytes (even though the standard uses similar language for read and ignore).  This inconsistent behavior is confusing as one would expect read and ignore to behave equivalently.

* Other major C++ implementations don't underflow after ignoring the requested number of bytes. (Thanks, Sergey, for checking this.)

I wrote about this in greater depth at [1] and also wrote my own non-member ignore() which I'm using in place of basic_istream::ignore() [2] which others may find useful until this bug is fixed.

[1] https://www.agwa.name/blog/post/gccs_implementation_of_basicistreamignore_is_broken

[2] https://www.agwa.name/blog/post/gccs_implementation_of_basicistreamignore_is_broken/media/ignore.h
Comment 7 ncm 2020-11-10 23:46:51 UTC
This bug appears not to manifest in g++-10.
Comment 8 Jonathan Wakely 2020-11-11 00:01:17 UTC
Probably changed by one of the patches for PR 94749 or PR 96161, although I still see two reads for the first example.
Comment 9 ncm 2020-11-11 00:06:47 UTC
(In reply to Jonathan Wakely from comment #8)
> Probably changed by one of the patches for PR 94749 or PR 96161, although I
> still see two reads for the first example.

Thank you, I was mistaken. This bug is still present in g++-10.
Comment 10 Jonathan Wakely 2020-11-11 00:24:44 UTC
Untested patch:

--- a/libstdc++-v3/src/c++98/compatibility.cc
+++ b/libstdc++-v3/src/c++98/compatibility.cc
@@ -88,7 +88,9 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
                        {
                          __sb->__safe_gbump(__size);
                          _M_gcount += __size;
-                         __c = __sb->sgetc();
+                         if (_M_gcount < __n
+                             || __n == __gnu_cxx::__numeric_traits<streamsize>::__max)
+                           __c = __sb->sgetc();
                        }
                      else
                        {


A similar change would be needed in the other versions of this function (the wistream specialization, and the generic version).