This is the mail archive of the gcc-bugs@gcc.gnu.org mailing list for the GCC project.



Re: istringstream 'read' terminating on a null character


[libstdc++-v3 list added]


On Thu, Jul 05, 2001 at 12:01:06AM -0400, Phil Edwards wrote:
> On Thu, Jul 05, 2001 at 02:37:46PM +1200, Mathew Pitchforth wrote:
> > When I use GCC V3.0 I am finding that binary 'read' operations on an
> > istringstream are terminating on a null character.  The write operations are
> > unaffected by the null characters and setting the streams to binary does not
> > resolve the problem.
> 
> The stringstream classes ignore the binary flag (which does not do what
> you seem to think it does).  I'll try and take a closer look at your code
> tomorrow and test it with current sources.

Hmmm.  For all the good things which using a NUL character bought us in C,
we pay for it in C++...

I'll note as an aside that string and istringstream are typedefs for
basic_string<char> and basic_istringstream<char>, and using null characters
as real data in types specialized for char is walking the line between
"clever reuse" and "abuse".  This kind of thing should work, but sort of
goes against intentions.  Anyhow.

The bug, if there is one, is in the string implementation, not istringstream.
(There is one place where this might be fixed; I'll come back to
that later.)

I tried stepping through the code in gdb, and gdb kept crashing.  So using
the ancient method of adding fprintf() calls to your code and to the
installed headers, I got slightly modified output.  The new code and new
results are here:

   Script started on Thu Jul  5 15:35:58 2001
   1% cat read_nulls.cc 
   
   #include <iostream>
   #include <sstream>
   #include <string>
   #include <cstdio>
   
   using namespace std;
   
   unsigned long int num = 0x12340078;
   
   int main()
   {
     string s;
     ostringstream os;//( ios_base::binary );
     os.write(reinterpret_cast<const char*>(&num), sizeof(num));
     s = os.str();
   
     printf("S length = %d\n", s.size());
     printf(" byte[0] = %x\n", (unsigned char) s[0] );
     printf(" byte[1] = %x\n", (unsigned char) s[1] );
     printf(" byte[2] = %x\n", (unsigned char) s[2] );
     printf(" byte[3] = %x\n", (unsigned char) s[3] );
   
     unsigned long int dest = 0;
   
     printf("\nreading now\nS length = %d\n", s.size());
     istringstream is( s );//, ios_base::binary );
     printf("good = %d\n", is.good() );
     is.read( reinterpret_cast<char*>(&dest),sizeof(dest));
     printf("gcount = %d\n", is.gcount());
     printf("dest = %lx\n", dest);
     printf("good = %d, state = %d\n", is.good(), is.rdstate() );
   
     return 0;
   }
   
   2% ./a.out 
   *** _M_buf_size 2 = 0
   S length = 4
    byte[0] = 78
    byte[1] = 0
    byte[2] = 34
    byte[3] = 12
   
+  reading now
+  S length = 4
+  *** _M_buf_size 2 = 1
+  good = 1
+  *** _M_in_cur = 8053a24
+  *** _M_in_end = 8053a25
+  *** _M_in_cur = 8053a25
+  *** _M_in_end = 8053a25
+  gcount = 1
+  dest = 78
+  good = 0, state = 6
   3% exit
   Script done on Thu Jul  5 15:36:17 2001

The lines with '+'s are the important ones.  The lines prefixed with ***
were added to the library code.
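
For comparison, here is a minimal sketch of what a conforming library is
expected to do with the same four bytes.  The helper name round_trip is
hypothetical, and a fixed byte array stands in for the unsigned long so
that the result does not depend on sizeof(long) or byte order:

```cpp
#include <cstddef>
#include <ios>
#include <sstream>
#include <string>

// Hypothetical helper: writes `len` raw bytes into an ostringstream, then
// reads them back through an istringstream.  Returns what gcount() reports
// after the read; the bytes that were read end up in `out`.
std::streamsize round_trip(const char* bytes, std::size_t len, std::string& out)
{
    std::ostringstream os;
    os.write(bytes, static_cast<std::streamsize>(len));
    const std::string s = os.str();   // length is stored explicitly, NULs included

    std::istringstream is(s);
    out.assign(len, '\xff');          // sentinel fill so a short read is visible
    is.read(&out[0], static_cast<std::streamsize>(len));
    return is.gcount();
}
```

Called with the bytes 0x78 0x00 0x34 0x12 and a length of 4, this should
report a gcount of 4 and leave the NUL intact at index 1, rather than
stopping after the first byte as in the transcript above.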

1)  s.size() is 4 when called from your code.  In the construction of the
stringbuf, we execute

        _M_buf_size = _M_string.size();

in _M_stringbuf_init().  Then I print the result.  It's 1 instead of 4,
presumably due to the NUL byte after the 0x78.

2)  The rest of the istringstream's streambuf behaves correctly, under the
belief that the buffer is a single character long.  Later, when sbumpc()
goes to fetch characters:

    basic_streambuf<_CharT, _Traits>::
    sbumpc()
    {
      int_type __ret;
      if (_M_in_cur && _M_in_cur < _M_in_end)
        {
          char_type __c = *gptr();
          _M_in_cur_move(1);
          __ret = traits_type::to_int_type(__c);
        }
      else
        __ret = this->uflow();
      return __ret;
    }

First time we execute the 'if' body and get the 0x78.  In the augmented
output above you can see that _M_in_cur is one character short of _M_in_end.
(Annoyingly, gdb could not see these member variables at all no matter
what I tried.  Hence the fallback on fprintf.)

The next time through the loop, the test fails because _M_in_cur has been
incremented, so we call uflow(), and gdb coredumps.  Or, running it outside
the debugger, we call uflow() which calls underflow() which for stringbufs is

      // Overridden virtual functions:
      virtual int_type
      underflow()
      {
        if (_M_in_cur && _M_in_cur < _M_in_end)
          return traits_type::to_int_type(*gptr());
        else
          return traits_type::eof();
      }

The same pointer comparison test fails, and we return EOF all the way back
to the topmost call to is.read(), which sets the eofbit and the failbit.
That is the "state = 6" part I added at the end (eofbit is 2 and failbit
is 4 in this implementation).
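
The stream-state half of this can be reproduced on purpose, without any
NUL bytes, by asking for more characters than the buffer holds.  A small
sketch; the ReadResult struct and read_n function are hypothetical names
introduced only for this illustration:

```cpp
#include <cstddef>
#include <ios>
#include <sstream>
#include <string>

// Hypothetical record of what a single read() call left behind.
struct ReadResult {
    std::streamsize count;  // value of gcount() after the read
    bool eof;
    bool fail;
    bool bad;
};

// Tries to read `want` characters from a stream built over `data`.
ReadResult read_n(const std::string& data, std::streamsize want)
{
    std::istringstream is(data);
    std::string buf(static_cast<std::size_t>(want), '\0');
    is.read(&buf[0], want);
    return ReadResult{is.gcount(), is.eof(), is.fail(), is.bad()};
}
```

read_n("x", 4) should come back with count 1 and both eof and fail set,
which matches the shape of the failure in the transcript.  The numeric
values of the state bits are implementation-defined, so portable code
should compare rdstate() against (ios_base::eofbit | ios_base::failbit)
rather than against 6.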


All that was to convince myself, and anyone else reading this, that the
streambuf/stringbuf is doing the right thing, but GIGO.  The only thing
which might be changed here is the "_M_buf_size = _M_string.size();"
assignment.  Maybe we can use some method other than size().  I'm at a
loss to suggest what.
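
One way a length of 4 can turn into 1 is a copy that passes through a
NUL-terminated C string somewhere along the way.  Whether that is what
happens inside the library here is an assumption, not a diagnosis; the
sketch below only shows the mechanism, and both function names are
hypothetical:

```cpp
#include <cstddef>
#include <string>

// Copy that goes through a NUL-terminated C string: the new string's
// length is found by scanning, so it stops at the first '\0'.
std::size_t size_via_c_str(const std::string& s)
{
    return std::string(s.c_str()).size();
}

// Copy that carries the length explicitly: embedded '\0' bytes survive.
std::size_t size_via_data(const std::string& s)
{
    return std::string(s.data(), s.size()).size();
}
```

For a string built as std::string("\x78\0\x34\x12", 4), the first
function yields 1 and the second yields 4, the same two numbers seen in
the output above.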

I had a brief peek inside the basic_string code to see what size() was doing,
and why it was returning 4 at one point but 1 at another.  I immediately
got lost in the forest of basic_string::_Rep code:  the water and food were
running low, my guide had been eaten by a grue, and the banjo theme from
"Deliverance" was playing.  So I stopped.  Help, anyone?


Phil

-- 
Would I had phrases that are not known, utterances that are strange, in
new language that has not been used, free from repetition, not an utterance
which has grown stale, which men of old have spoken.
                                     - anonymous Egyptian scribe, c.1700 BC

