Bug 93672 - [11/12 Regression] std::basic_istream::ignore hangs if delim MSB is set
Summary: [11/12 Regression] std::basic_istream::ignore hangs if delim MSB is set
Status: ASSIGNED
Alias: None
Product: gcc
Classification: Unclassified
Component: libstdc++ (show other bugs)
Version: 9.1.0
: P3 normal
Target Milestone: 11.5
Assignee: Jonathan Wakely
URL: https://stackoverflow.com/questions/6...
Keywords: wrong-code
Depends on:
Blocks:
 
Reported: 2020-02-11 08:49 UTC by Benedek Thaler
Modified: 2024-05-02 14:17 UTC (History)
1 user (show)

See Also:
Host:
Target:
Build:
Known to work: 13.2.1, 14.0, 3.4.6
Known to fail: 11.4.0, 12.3.0, 13.2.0, 14.0, 4.0.0, 9.1.0
Last reconfirmed: 2024-04-03 00:00:00


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Benedek Thaler 2020-02-11 08:49:53 UTC
The following program hangs:

#include <iostream>
#include <sstream>

int main()
{
  std::stringstream str;
  str.put('a');       // 1
  str.put('\x80');    // 2
  str.put('a');       // 3

  str.ignore(32, '\x80'); // hangs
  std::cout << str.tellg() << "\n";
}

Removing any of the numbered lines (1,2,3) makes the problem go away.
Using std::basic_stringstream<unsigned char> does not exhibit the issue.
Passing unsigned delim (that does not get sign extended) also fixes it.
ASAN or UBSAN is silent. `-D_GLIBCXX_DEBUG` does not make a difference.
GCC 5.4, 6.3, 8.2, 9.2 are affected.
Comment 1 Jonathan Wakely 2024-04-03 17:18:32 UTC
Looks like I missed this one when it was filed.

The fix is:

--- a/libstdc++-v3/src/c++98/istream.cc
+++ b/libstdc++-v3/src/c++98/istream.cc
@@ -126,6 +126,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
              const int_type __eof = traits_type::eof();
              __streambuf_type* __sb = this->rdbuf();
              int_type __c = __sb->sgetc();
+             __delim = traits_type::to_int_type(__cdelim);
 
              bool __large_ignore = false;
              while (true)
Comment 2 Jonathan Wakely 2024-04-03 17:41:44 UTC
On second thoughts, I don't think that fix is right.

istream::ignore takes an int_type for the delimiter, so passing it a char_type with a negative value will confuse it. For example, str.ignore(n, '\xff`) will treat the delimiter as EOF and ignore up to n chars, ignoring any '\xff` characters that might be in the stream buffer's get area. That means it's wrong to call ignore(n, '\xff') on a platform where char is signed, because it won't do what you expect (unless you're really intending to treat \xff as EOF).

This case is similar, it ignores up to n characters, or until sgetc() returns a character equal to '\x80' ... but that can never happen because sgetc() never returns a negative value unless it reaches EOF.

So there is a gcc bug here, because we should not loop forever. But the problem is that we use inconsistent conditions for deciding whether we've found the delimiter.
Comment 3 Jonathan Wakely 2024-04-03 19:40:00 UTC
So maybe:

--- a/libstdc++-v3/src/c++98/istream.cc
+++ b/libstdc++-v3/src/c++98/istream.cc
@@ -112,7 +112,10 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
     basic_istream<char>::
     ignore(streamsize __n, int_type __delim)
     {
-      if (traits_type::eq_int_type(__delim, traits_type::eof()))
+      // If __delim is eof() we ignore up to __n chars, and for any other
+      // negative value using eq_int_type(sgetc(), __delim) will never be true,
+      // so just treat all negative __delim values as eof().
+      if (__delim < 0)
        return ignore(__n);
 
       _M_gcount = 0;
Comment 4 Jonathan Wakely 2024-04-04 09:53:46 UTC
The incorrect behaviour was introduced with r0-59139-g80dddedcaf316e
Comment 5 GCC Commits 2024-04-15 18:29:41 UTC
The master branch has been updated by Jonathan Wakely <redi@gcc.gnu.org>:

https://gcc.gnu.org/g:2d694414ada8e3b58f504c1b175d31088529632e

commit r14-9978-g2d694414ada8e3b58f504c1b175d31088529632e
Author: Jonathan Wakely <jwakely@redhat.com>
Date:   Thu Apr 4 10:33:33 2024 +0100

    libstdc++: Fix infinite loop in std::istream::ignore(n, delim) [PR93672]
    
    A negative delim value passed to std::istream::ignore can never match
    any character in the stream, because the comparison is done using
    traits_type::eq_int_type(sb->sgetc(), delim) and sgetc() never returns
    negative values (except at EOF). The optimized version of ignore for the
    std::istream specialization uses traits_type::find to locate the delim
    character in the streambuf, which _can_ match a negative delim on
    platforms where char is signed, but then we do another comparison using
    eq_int_type which fails. The code then keeps looping forever, with
    traits_type::find locating the character and traits_type::eq_int_type
    saying it's not a match, so traits_type::find is used again and finds
    the same character again.
    
    A possible fix would be to check with eq_int_type after a successful
    find, to see whether we really have a match. However, that would be
    suboptimal since we know that a negative delimiter will never match
    using eq_int_type. So a better fix is to adjust the check at the top of
    the function that handles delim==eof(), so that we treat all negative
    delim values as equivalent to EOF. That way we don't bother using find
    to search for something that will never match with eq_int_type.
    
    The version of ignore in the primary template doesn't need a change,
    because it doesn't use traits_type::find, instead characters are
    extracted one-by-one and always matched using eq_int_type. That avoids
    the inconsistency between find and eq_int_type. The specialization for
    std::wistream does use traits_type::find, but traits_type::to_int_type
    is equivalent to an implicit conversion from wchar_t to wint_t, so
    passing a wchar_t directly to ignore without using to_int_type works.
    
    libstdc++-v3/ChangeLog:
    
            PR libstdc++/93672
            * src/c++98/istream.cc (istream::ignore(streamsize, int_type)):
            Treat all negative delimiter values as eof().
            * testsuite/27_io/basic_istream/ignore/char/93672.cc: New test.
            * testsuite/27_io/basic_istream/ignore/wchar_t/93672.cc: New
            test.
Comment 6 Jonathan Wakely 2024-04-15 18:32:36 UTC
Fixed on trunk so far.
Comment 7 GCC Commits 2024-05-02 14:16:20 UTC
The releases/gcc-13 branch has been updated by Jonathan Wakely <redi@gcc.gnu.org>:

https://gcc.gnu.org/g:fcf60d0baafa1245f768ac375dc60a07e92e9673

commit r13-8675-gfcf60d0baafa1245f768ac375dc60a07e92e9673
Author: Jonathan Wakely <jwakely@redhat.com>
Date:   Thu Apr 4 10:33:33 2024 +0100

    libstdc++: Fix infinite loop in std::istream::ignore(n, delim) [PR93672]
    
    A negative delim value passed to std::istream::ignore can never match
    any character in the stream, because the comparison is done using
    traits_type::eq_int_type(sb->sgetc(), delim) and sgetc() never returns
    negative values (except at EOF). The optimized version of ignore for the
    std::istream specialization uses traits_type::find to locate the delim
    character in the streambuf, which _can_ match a negative delim on
    platforms where char is signed, but then we do another comparison using
    eq_int_type which fails. The code then keeps looping forever, with
    traits_type::find locating the character and traits_type::eq_int_type
    saying it's not a match, so traits_type::find is used again and finds
    the same character again.
    
    A possible fix would be to check with eq_int_type after a successful
    find, to see whether we really have a match. However, that would be
    suboptimal since we know that a negative delimiter will never match
    using eq_int_type. So a better fix is to adjust the check at the top of
    the function that handles delim==eof(), so that we treat all negative
    delim values as equivalent to EOF. That way we don't bother using find
    to search for something that will never match with eq_int_type.
    
    The version of ignore in the primary template doesn't need a change,
    because it doesn't use traits_type::find, instead characters are
    extracted one-by-one and always matched using eq_int_type. That avoids
    the inconsistency between find and eq_int_type. The specialization for
    std::wistream does use traits_type::find, but traits_type::to_int_type
    is equivalent to an implicit conversion from wchar_t to wint_t, so
    passing a wchar_t directly to ignore without using to_int_type works.
    
    libstdc++-v3/ChangeLog:
    
            PR libstdc++/93672
            * src/c++98/istream.cc (istream::ignore(streamsize, int_type)):
            Treat all negative delimiter values as eof().
            * testsuite/27_io/basic_istream/ignore/char/93672.cc: New test.
            * testsuite/27_io/basic_istream/ignore/wchar_t/93672.cc: New
            test.
    
    (cherry picked from commit 2d694414ada8e3b58f504c1b175d31088529632e)
Comment 8 Jonathan Wakely 2024-05-02 14:17:38 UTC
Fixed for 13.3 and 14.1 so far.