This is the mail archive of the
gcc-prs@gcc.gnu.org
mailing list for the GCC project.
Re: libstdc++/4150: catastrophic performance decrease in C++ code
- From: Jason Merrill <jason at redhat dot com>
- To: jason at gcc dot gnu dot org
- Cc: gcc-prs at gcc dot gnu dot org,
- Date: 18 Apr 2002 02:26:01 -0000
- Subject: Re: libstdc++/4150: catastrophic performance decrease in C++ code
- Reply-to: Jason Merrill <jason at redhat dot com>
The following reply was made to PR libstdc++/4150; it has been noted by GNATS.
From: Jason Merrill <jason@redhat.com>
To: libstdc++@gcc.gnu.org
Cc: gcc-gnats@gcc.gnu.org
Subject: Re: libstdc++/4150: catastrophic performance decrease in C++ code
Date: Thu, 18 Apr 2002 03:18:17 +0100
--=-=-=
>>>>> "Jason" == Jason Merrill <jason@redhat.com> writes:
> Which is what libio does, and which would seem to be the only way to avoid
> this sort of problem. Can anyone enlighten me as to the design
> goals/decisions of the current implementation?
The current implementation seems to be trying to allow interleaved reads
and writes on the same buffer, whereas the libio implementation requires a
flush when switching modes. Was this difference a deliberate design
decision? I'm a bit skeptical of it's usefulness; how often do people
really read a few characters, write a few characters, and then read a few
more characters without an intervening seek?
A problem with the current implementation of this is that if we do a read
on an input/output filebuf, we end up writing the contents of the buffer
back out to the file, even if we've never requested a write. Oops.
It would be possible to avoid this by changing how we manage the various
pointers into the buffer; when reading, if _M_out_beg == _M_out_cur we can
bump them together. If then we only write out the characters between them,
there won't be anything to write unless we actually requested a write.
I was thinking that we couldn't change this, because a lot of the
adjustment happens in non-virtual functions like sbumpc. But then of
course the semantics specified in the standard (i.e. sbumpc just advances
the read pointer) don't support proper filebuf semantics on the existing
scheme either; if the read pointer is bumped, and the write pointer is also
active, it needs to be bumped as well. In libio, this problem is avoided
by the mode switch; when it switches from read to write, the write pointer
is copied from the read pointer.
In v3, this problem is handled, basically, by contaminating streambuf with
information about filebuf semantics. We need the streambuf member
functions to adjust _M_out_cur with _M_in_cur, so we add a flag
(_M_buf_unified) that says so, and handle the logic in the _M_*_cur_move
functions. Similarly, for the benefit of stringbuf, we pretend that we can
just bump _M_out_end if we run up against it and we happen to know that
there's still room in the buffer.
It seems unfortunate to me that we need special hacks in streambuf to
support both of the standard derived streambufs. Both seem to be for
optimization; the filebuf hack to allow reading and writing on the same
buffer, and the stringbuf hack to avoid having to call overflow() through
the vtable for each character we want to add to the end of the string. Am
I right?
Anyway, here's a patch to clean up filebuf somewhat. It involves these
design changes:
1) The external file pointer always corresponds to _M_in_end.
Previously, it corresponded to _M_in_beg (thus the seeks).
2) _M_out_end always points to the end of the buffer. Previously, it
would point to the end of the valid contents, and we would rely on
_M_out_cur_move to push it forward if there was still room in the
buffer, as it still does for stringbufs. We don't need to do this
for filebufs, as _M_out_end doesn't have any special meaning like it
does for stringbufs.
3) _M_out_beg always points to the beginning of the write area. If we
haven't written anything to the buffer, _M_out_cur == _M_out_beg.
Previously the area between the two could consist entirely of
characters that were read.
4) We now override uflow() in filebuf, so we don't have to unget the
character read by underflow() in the sync_with_stdio case.
logauswerter.C times:
synced not synced
2.96 0:19 0:19
pre-patch 1:09 0:14
post-patch 0:26 0:13
Interesting that v3 is faster than v2 when not synced with stdio...
I feel like I know my way around streambufs a lot better now.
Tested i686-pc-linux-gnu, no regressions. Any objections?
Jason
2002-04-18 Jason Merrill <jason@redhat.com>
PR libstdc++/4150
* include/std/std_streambuf.h (basic_streambuf::_M_in_cur_move):
Also bump _M_out_beg if _M_buf_unified.
(basic_streambuf::_M_out_cur_move): Don't bump _M_in_cur past
_M_in_end.
(basic_streambuf::_M_set_indeterminate): Move to filebuf.
(basic_streambuf::_M_set_determinate): Likewise.
(basic_streambuf::_M_is_indeterminate): Likewise.
* include/bits/std_fstream.h (basic_filebuf::_M_really_underflow):
New non-static member function.
(basic_filebuf::_M_underflow, _M_uflow): Call it.
(basic_filebuf::sync): seekpos will handle all the positioning.
(basic_filebuf::_M_set_indeterminate): Move here from streambuf.
Always set _M_out_end to the end of the buffer.
(basic_filebuf::_M_set_determinate): Likewise.
(basic_filebuf::_M_is_indeterminate): Likewise.
* include/bits/fstream.tcc (basic_filebuf::close): Stuff to
write is from _M_out_beg to _M_out_cur.
(basic_filebuf::overflow): Likewise.
(basic_filebuf::_M_really_overflow): Likewise.
Seek back to _M_out_beg if necessary.
(basic_filebuf::seekoff): Likewise.
(basic_filebuf::_M_really_underflow): Generalization of old
underflow(). Don't seek back to _M_in_beg.
* testsuite/27_io/filebuf_virtuals.cc (test05): Don't overspecify
ungetc test.
--=-=-=
Content-Type: text/x-patch
Content-Disposition: inline
*** ./include/bits/fstream.tcc.~1~ Fri Apr 12 11:28:30 2002
--- ./include/bits/fstream.tcc Thu Apr 18 02:16:12 2002
*************** namespace std
*** 183,189 ****
if (this->is_open())
{
const int_type __eof = traits_type::eof();
! bool __testput = _M_out_cur && _M_out_beg < _M_out_end;
if (__testput && _M_really_overflow(__eof) == __eof)
return __ret;
--- 183,189 ----
if (this->is_open())
{
const int_type __eof = traits_type::eof();
! bool __testput = _M_out_cur && _M_out_beg < _M_out_cur;
if (__testput && _M_really_overflow(__eof) == __eof)
return __ret;
*************** namespace std
*** 242,248 ****
template<typename _CharT, typename _Traits>
typename basic_filebuf<_CharT, _Traits>::int_type
basic_filebuf<_CharT, _Traits>::
! underflow()
{
int_type __ret = traits_type::eof();
bool __testin = _M_mode & ios_base::in;
--- 242,248 ----
template<typename _CharT, typename _Traits>
typename basic_filebuf<_CharT, _Traits>::int_type
basic_filebuf<_CharT, _Traits>::
! _M_really_underflow(bool __bump)
{
int_type __ret = traits_type::eof();
bool __testin = _M_mode & ios_base::in;
*************** namespace std
*** 261,283 ****
}
// Sync internal and external buffers.
! // NB: __testget -> __testput as _M_buf_unified here.
! bool __testget = _M_in_cur && _M_in_beg < _M_in_cur;
! bool __testinit = _M_is_indeterminate();
if (__testget)
{
! if (__testout)
! _M_really_overflow();
! #if _GLIBCPP_AVOID_FSEEK
! else if ((_M_in_cur - _M_in_beg) == 1)
! _M_file->sys_getc();
! #endif
! else
! _M_file->seekoff(_M_in_cur - _M_in_beg,
! ios_base::cur, ios_base::in);
}
!
! if (__testinit || __testget)
{
const locale __loc = this->getloc();
const __codecvt_type& __cvt = use_facet<__codecvt_type>(__loc);
--- 261,278 ----
}
// Sync internal and external buffers.
! bool __testput = _M_out_cur && _M_out_beg < _M_out_cur;
! if (__testput && __testout)
! _M_really_overflow();
!
! bool __testget = _M_in_cur && _M_in_cur < _M_in_end;
if (__testget)
{
! __ret = traits_type::to_int_type(*_M_in_cur);
! if (__bump)
! _M_in_cur_move(1);
}
! else
{
const locale __loc = this->getloc();
const __codecvt_type& __cvt = use_facet<__codecvt_type>(__loc);
*************** namespace std
*** 313,332 ****
if (0 < __ilen)
{
_M_set_determinate(__ilen);
- if (__testout)
- _M_out_cur = _M_in_cur;
__ret = traits_type::to_int_type(*_M_in_cur);
! #if _GLIBCPP_AVOID_FSEEK
! if (__elen == 1)
! _M_file->sys_ungetc(*_M_in_cur);
! else
{
! #endif
! _M_file->seekoff(-__elen, ios_base::cur, ios_base::in);
! #if _GLIBCPP_AVOID_FSEEK
}
! #endif
! }
}
}
_M_last_overflowed = false;
--- 308,325 ----
if (0 < __ilen)
{
_M_set_determinate(__ilen);
__ret = traits_type::to_int_type(*_M_in_cur);
! if (__bump)
! _M_in_cur_move(1);
! else if (_M_buf_size == 1)
{
! /* If we are synced with stdio, we have to unget the
! character we just read so that the file pointer
! stays the same. */
! _M_file->sys_ungetc(*_M_in_cur);
! _M_set_indeterminate();
}
! }
}
}
_M_last_overflowed = false;
*************** namespace std
*** 407,413 ****
overflow(int_type __c)
{
int_type __ret = traits_type::eof();
! bool __testput = _M_out_cur && _M_out_cur < _M_buf + _M_buf_size;
bool __testout = _M_mode & ios_base::out;
if (__testout)
--- 400,406 ----
overflow(int_type __c)
{
int_type __ret = traits_type::eof();
! bool __testput = _M_out_cur && _M_out_cur < _M_out_end;
bool __testout = _M_mode & ios_base::out;
if (__testout)
*************** namespace std
*** 418,424 ****
_M_out_cur_move(1);
__ret = traits_type::not_eof(__c);
}
! else
__ret = this->_M_really_overflow(__c);
}
--- 411,417 ----
_M_out_cur_move(1);
__ret = traits_type::not_eof(__c);
}
! else
__ret = this->_M_really_overflow(__c);
}
*************** namespace std
*** 491,497 ****
_M_really_overflow(int_type __c)
{
int_type __ret = traits_type::eof();
! bool __testput = _M_out_cur && _M_out_beg < _M_out_end;
bool __testunbuffered = _M_file && !_M_buf_size;
if (__testput || __testunbuffered)
--- 484,490 ----
_M_really_overflow(int_type __c)
{
int_type __ret = traits_type::eof();
! bool __testput = _M_out_cur && _M_out_beg < _M_out_cur;
bool __testunbuffered = _M_file && !_M_buf_size;
if (__testput || __testunbuffered)
*************** namespace std
*** 500,509 ****
streamsize __elen = 0;
streamsize __plen = 0;
// Convert internal buffer to external representation, output.
// NB: In the unbuffered case, no internal buffer exists.
if (!__testunbuffered)
! _M_convert_to_external(_M_out_beg, _M_out_end - _M_out_beg,
__elen, __plen);
// Convert pending sequence to external representation, output.
--- 493,511 ----
streamsize __elen = 0;
streamsize __plen = 0;
+ // Need to restore current position. The position of the external
+ // byte sequence (_M_file) corresponds to _M_in_end, and we need
+ // to move it to _M_out_beg for the write.
+ if (_M_in_end && _M_in_end != _M_out_beg)
+ {
+ off_type __off = _M_out_beg - _M_in_end;
+ _M_file->seekoff(__off, ios_base::cur);
+ }
+
// Convert internal buffer to external representation, output.
// NB: In the unbuffered case, no internal buffer exists.
if (!__testunbuffered)
! _M_convert_to_external(_M_out_beg, _M_out_cur - _M_out_beg,
__elen, __plen);
// Convert pending sequence to external representation, output.
*************** namespace std
*** 547,553 ****
_M_buf_size_opt = _M_buf_size = __n;
_M_set_indeterminate();
! // Step 3: Make sure a pback buffer is allocated.
_M_allocate_pback_buffer();
}
_M_last_overflowed = false;
--- 549,555 ----
_M_buf_size_opt = _M_buf_size = __n;
_M_set_indeterminate();
! // Step 3: Make sure a pback buffer is allocated.
_M_allocate_pback_buffer();
}
_M_last_overflowed = false;
*************** namespace std
*** 579,585 ****
off_type __computed_off = __width * __off;
bool __testget = _M_in_cur && _M_in_beg < _M_in_end;
! bool __testput = _M_out_cur && _M_out_beg < _M_out_end;
// Sync the internal and external streams.
// out
if (__testput || _M_last_overflowed)
--- 581,587 ----
off_type __computed_off = __width * __off;
bool __testget = _M_in_cur && _M_in_beg < _M_in_end;
! bool __testput = _M_out_cur && _M_out_beg < _M_out_cur;
// Sync the internal and external streams.
// out
if (__testput || _M_last_overflowed)
*************** namespace std
*** 592,598 ****
//in
// NB: underflow() rewinds the external buffer.
else if (__testget && __way == ios_base::cur)
! __computed_off += _M_in_cur - _M_in_beg;
__ret = _M_file->seekoff(__computed_off, __way, __mode);
_M_set_indeterminate();
--- 594,600 ----
//in
// NB: underflow() rewinds the external buffer.
else if (__testget && __way == ios_base::cur)
! __computed_off -= _M_in_end - _M_in_cur;
__ret = _M_file->seekoff(__computed_off, __way, __mode);
_M_set_indeterminate();
*** ./include/std/std_fstream.h.~1~ Thu Apr 11 13:19:05 2002
--- ./include/std/std_fstream.h Thu Apr 18 01:15:27 2002
*************** namespace std
*** 141,148 ****
// underflow() and uflow() functions are called to get the next
// charater from the real input source when the buffer is empty.
// Buffered input uses underflow()
virtual int_type
! underflow();
virtual int_type
pbackfail(int_type __c = _Traits::eof());
--- 141,161 ----
// underflow() and uflow() functions are called to get the next
// charater from the real input source when the buffer is empty.
// Buffered input uses underflow()
+
+ // The only difference between underflow() and uflow() is that the
+ // latter bumps _M_in_cur after the read. In the sync_with_stdio
+ // case, this is important, as we need to unget the read character in
+ // the underflow() case in order to maintain synchronization. So
+ // instead of calling underflow() from uflow(), we create a common
+ // subroutine to do the real work.
+ int_type
+ _M_really_underflow(bool __bump);
+
virtual int_type
! underflow() { return _M_really_underflow(false); }
!
! virtual int_type
! uflow() { return _M_really_underflow(true); }
virtual int_type
pbackfail(int_type __c = _Traits::eof());
*************** namespace std
*** 187,207 ****
virtual int
sync()
{
! bool __testput = _M_out_cur && _M_out_beg < _M_out_end;
// Make sure that the internal buffer resyncs its idea of
// the file position with the external file.
if (__testput && !_M_file->sync())
! {
! // Need to restore current position. This interpreted as
! // the position of the external byte sequence (_M_file)
! // plus the offset in the current internal buffer
! // (_M_out_beg - _M_out_cur)
! streamoff __cur = _M_file->seekoff(0, ios_base::cur);
! off_type __off = _M_out_cur - _M_out_beg;
! _M_really_overflow();
! _M_file->seekpos(__cur + __off);
! }
_M_last_overflowed = false;
return 0;
}
--- 200,212 ----
virtual int
sync()
{
! bool __testput = _M_out_cur && _M_out_beg < _M_out_cur;
// Make sure that the internal buffer resyncs its idea of
// the file position with the external file.
if (__testput && !_M_file->sync())
! _M_really_overflow();
!
_M_last_overflowed = false;
return 0;
}
*************** namespace std
*** 239,244 ****
--- 244,292 ----
void
_M_output_unshift();
+
+ // These three functions are used to clarify internal buffer
+ // maintenance. After an overflow, or after a seekoff call that
+ // started at beg or end, or possibly when the stream becomes
+ // unbuffered, and a myriad other obscure corner cases, the
+ // internal buffer does not truly reflect the contents of the
+ // external buffer. At this point, for whatever reason, it is in
+ // an indeterminate state.
+ void
+ _M_set_indeterminate(void)
+ {
+ if (_M_mode & ios_base::in)
+ this->setg(_M_buf, _M_buf, _M_buf);
+ if (_M_mode & ios_base::out)
+ this->setp(_M_buf, _M_buf + _M_buf_size);
+ }
+
+ void
+ _M_set_determinate(off_type __off)
+ {
+ bool __testin = _M_mode & ios_base::in;
+ bool __testout = _M_mode & ios_base::out;
+ if (__testin)
+ this->setg(_M_buf, _M_buf, _M_buf + __off);
+ if (__testout)
+ this->setp(_M_buf, _M_buf + _M_buf_size);
+ }
+
+ bool
+ _M_is_indeterminate(void)
+ {
+ bool __ret = false;
+ // Don't return true if unbuffered.
+ if (_M_buf)
+ {
+ __ret = true;
+ if (_M_mode & ios_base::in)
+ __ret &= _M_in_beg == _M_in_cur && _M_in_cur == _M_in_end;
+ if (_M_mode & ios_base::out)
+ __ret &= _M_out_beg == _M_out_cur;
+ }
+ return __ret;
+ }
};
*** ./include/std/std_streambuf.h.~1~ Mon Feb 18 00:38:23 2002
--- ./include/std/std_streambuf.h Thu Apr 18 01:13:03 2002
*************** namespace std
*** 184,190 ****
bool __testout = _M_out_cur;
_M_in_cur += __n;
if (__testout && _M_buf_unified)
! _M_out_cur += __n;
}
// Correctly sets the _M_out_cur pointer, and bumps the
--- 184,194 ----
bool __testout = _M_out_cur;
_M_in_cur += __n;
if (__testout && _M_buf_unified)
! {
! if (_M_out_cur == _M_out_beg)
! _M_out_beg += __n;
! _M_out_cur += __n;
! }
}
// Correctly sets the _M_out_cur pointer, and bumps the
*************** namespace std
*** 202,208 ****
_M_out_cur += __n;
if (__testin && _M_buf_unified)
! _M_in_cur += __n;
if (_M_out_cur > _M_out_end)
{
_M_out_end = _M_out_cur;
--- 206,216 ----
_M_out_cur += __n;
if (__testin && _M_buf_unified)
! {
! _M_in_cur += __n;
! if (_M_in_cur > _M_in_end)
! _M_in_cur = _M_in_end;
! }
if (_M_out_cur > _M_out_end)
{
_M_out_end = _M_out_cur;
*************** namespace std
*** 230,277 ****
}
return __ret;
}
-
- // These three functions are used to clarify internal buffer
- // maintenance. After an overflow, or after a seekoff call that
- // started at beg or end, or possibly when the stream becomes
- // unbuffered, and a myrid other obscure corner cases, the
- // internal buffer does not truly reflect the contents of the
- // external buffer. At this point, for whatever reason, it is in
- // an indeterminate state.
- void
- _M_set_indeterminate(void)
- {
- if (_M_mode & ios_base::in)
- this->setg(_M_buf, _M_buf, _M_buf);
- if (_M_mode & ios_base::out)
- this->setp(_M_buf, _M_buf);
- }
-
- void
- _M_set_determinate(off_type __off)
- {
- bool __testin = _M_mode & ios_base::in;
- bool __testout = _M_mode & ios_base::out;
- if (__testin)
- this->setg(_M_buf, _M_buf, _M_buf + __off);
- if (__testout)
- this->setp(_M_buf, _M_buf + __off);
- }
-
- bool
- _M_is_indeterminate(void)
- {
- bool __ret = false;
- // Don't return true if unbuffered.
- if (_M_buf)
- {
- if (_M_mode & ios_base::in)
- __ret = _M_in_beg == _M_in_cur && _M_in_cur == _M_in_end;
- if (_M_mode & ios_base::out)
- __ret = _M_out_beg == _M_out_cur && _M_out_cur == _M_out_end;
- }
- return __ret;
- }
public:
virtual
--- 238,243 ----
*** ./src/ios.cc.~1~ Wed Apr 3 21:19:20 2002
--- ./src/ios.cc Thu Apr 18 02:50:33 2002
*************** namespace std
*** 150,163 ****
int __out_bufsize = __sync ? 0 : static_cast<int>(BUFSIZ);
int __in_bufsize = __sync ? 1 : static_cast<int>(BUFSIZ);
- #if _GLIBCPP_AVOID_FSEEK
- // Platforms that prefer to avoid fseek() calls on streams only
- // get their desire when the C++-layer input buffer size is 1.
- // This hack hurts performance but keeps correctness across
- // all types of streams that might be attached to (e.g.) cin.
- __in_bufsize = 1;
- #endif
-
// NB: The file globals.cc creates the four standard files
// with NULL buffers. At this point, we swap out the dummy NULL
// [io]stream objects and buffers with the real deal.
--- 150,155 ----
*** ./testsuite/27_io/filebuf_virtuals.cc.~1~ Mon Jan 28 16:51:59 2002
--- ./testsuite/27_io/filebuf_virtuals.cc Thu Apr 18 02:29:12 2002
*************** void test05()
*** 443,449 ****
--- 443,451 ----
c3 = fb_03.sputc('\n');
strmsz_1 = fb_03.sputn("because because because. . .", 28);
VERIFY( strmsz_1 == 28 );
+ // Defect? retval of sungetc is not necessarily the character
c1 = fb_03.sungetc();
+ c1 = fb_03.sgetc();
fb_03.pubsync();
c3 = fb_03.sgetc();
VERIFY( c1 == c3 );
--=-=-=--