Bug 15002

Summary: [3.4 only] Linewise stream input is unusably slow (std::string slow)
Product: gcc Reporter: aaron
Component: libstdc++Assignee: Paolo Carlini <paolo.carlini>
Status: RESOLVED FIXED    
Severity: enhancement CC: gcc-bugs
Priority: P2    
Version: 3.3.3   
Target Milestone: 3.4.1   
Host: i686-debian-linux Target: i686-debian-linux
Build: i686-debian-linux Known to work: 4.0.0
Known to fail: 3.4.0 Last reconfirmed: 2004-04-18 14:20:12
Attachments: The test case
The gprof output for the streams program

Description aaron 2004-04-18 14:06:29 UTC
I noticed that std::getline is unusably slow. So I wrote three test programs, 
one using fgets(), one using std::getline() and one using 
std::ifstream::getline(). I have read the same file (200000 lines, 39MB) 
in all cases; here are the results:

Using fgets():

aisotton@zarathustra:~/test/iotest$ time ./stdio

real    0m0.222s
user    0m0.109s
sys     0m0.073s

------------

Using std::getline():

aisotton@zarathustra:~/test/iotest$ time ./streams

real    0m6.945s
user    0m5.316s
sys     0m0.074s

------------

Using std::ifstream::getline():

aisotton@zarathustra:~/test/iotest$ time ./streams2

real    0m1.002s
user    0m0.890s
sys     0m0.074s

------------

fgets() is more than 30 times faster than std::getline(), and more than 4 times
faster than std::ifstream::getline(). This is not acceptable.

- It is not a buffering problem; using strace I found out that fgets() uses a
4096 byte buffer, and the C++ streams a 8192 byte buffer.

- It has nothing to do with optimization. The values are roughly the same for
all optimization levels.

- g++ 2.95 performs better than 3.2, and 3.2 better than 3.3, but they are still
slow. Here the exact versions:

aisotton@zarathustra:~/test/iotest$ g++-2.95 -v
Reading specs from /usr/lib/gcc-lib/i386-linux/2.95.4/specs
gcc version 2.95.4 20011002 (Debian prerelease)

aisotton@zarathustra:~/test/iotest$ g++-3.2 -v
Reading specs from /usr/lib/gcc-lib/i386-linux/3.2.3/specs
Configured with: ../src/configure -v 
--enable-languages=c,c++,f77,objc,ada --prefix=/usr 
--mandir=/usr/share/man --infodir=/usr/share/info 
--with-gxx-include-dir=/usr/include/c++/3.2 --enable-shared 
--with-system-zlib --enable-nls --without-included-gettext 
--enable-__cxa_atexit --enable-clocale=gnu --enable-objc-gc i386-linux
Thread model: posix
gcc version 3.2.3 (Debian)

aisotton@zarathustra:~/test/iotest$ g++-3.3 -v
Reading specs from /usr/lib/gcc-lib/i486-linux/3.3.3/specs
Configured with: ../src/configure -v 
--enable-languages=c,c++,java,f77,pascal,objc,ada,treelang --prefix=/usr 
--mandir=/usr/share/man --infodir=/usr/share/info 
--with-gxx-include-dir=/usr/include/c++/3.3 --enable-shared 
--with-system-zlib --enable-nls --without-included-gettext 
--enable-__cxa_atexit --enable-clocale=gnu --enable-debug 
--enable-java-gc=boehm --enable-java-awt=xlib --enable-objc-gc i486-linux
Thread model: posix
gcc version 3.3.3 (Debian 20040401)

And here the times for the second test:
-- 2.95 --
real    0m4.345s
user    0m4.058s
sys     0m0.067s

-- 3.2 --
real    0m5.728s
user    0m5.311s
sys     0m0.082s

-- 3.3 --
real    0m6.586s
user    0m5.298s
sys     0m0.091s

- It has nothing to do with stdio synchronisation, since we're reading from
files. To be really sure, I tried using

std::ios_base::sync_with_stdio(false);

(yes, before the first call!) but that didn't make any difference.

- Increasing the buffer size of the stream didn't make any real difference either.

Almost two years ago there was a long discussion about this on

http://groups.google.com/groups?hl=en&lr=&ie=UTF-8&oe=UTF-8&threadm=gidnla.uh5.ln%40ns1.irule.be&rnum=1&prev=/groups%3Fq%3Dg%252B%252B%2Bslow%2Bstd::getline%26hl%3Den%26lr%3D%26ie%3DUTF-8%26oe%3DUTF-8%26selm%3Dgidnla.uh5.ln%2540ns1.irule.be%26rnum%3D1

but they didn't really come to a solution.

I did some profiling; here the most interesting result (of the "streams" test):

		     Call graph (explanation follows)


granularity: each sample hit covers 4 byte(s) for 0.02% of 52.21 seconds

index % time    self  children    called     name
                                                 <spontaneous>
[1]     19.5   10.20    0.00                
std::string::replace(__gnu_cxx::__normal_iterator<char*, std::string>,
__gnu_cxx::__normal_iterator<char*, std::string>, unsigned int, char) [1]
-----------------------------------------------
                                                 <spontaneous>
[2]     17.6    9.17    0.00                 memset [2]
-----------------------------------------------
                                                 <spontaneous>
[3]     14.5    7.56    0.00                 std::string::_M_mutate(unsigned
int, unsigned int, unsigned int) [3]
-----------------------------------------------
                                                 <spontaneous>
[4]     12.0    6.29    0.00                 std::string::append(unsigned int,
char) [4]
-----------------------------------------------
                                                 <spontaneous>
[5]     10.7    5.59    0.00                 std::basic_istream<char,
std::char_traits<char> >& std::getline<char, std::char_traits<char>,
std::allocator<char> >(std::basic_istream<char, std::char_traits<char> >&,
std::basic_string<char, std::char_traits<char>, std::allocator<char> >&, char) [5]
-----------------------------------------------
                                                 <spontaneous>
[6]     10.4    5.41    0.00                 std::string::_M_iend() const [6]
-----------------------------------------------
                                                 <spontaneous>
[7]      8.5    4.42    0.00                 std::basic_streambuf<char,
std::char_traits<char> >::sbumpc() [7]
-----------------------------------------------
                                                 <spontaneous>
[8]      4.6    2.40    0.00                 std::string::_M_ibegin() const [8]
-----------------------------------------------
                                                 <spontaneous>
[9]      1.5    0.79    0.00                 read [9]

Much time seems to be lost in std::string::replace, for which I have no
explanatin whatsoever.

I've uploaded a very small "test suite" demonstrating the strange behaviour to
http://www.isotton.com/sandbox/iotest.tar.gz. Notice that it will create a file 
of about 40MB.
Comment 1 aaron 2004-04-18 14:08:13 UTC
Created attachment 6111 [details]
The test case

Unpack it and run 'make test'. Notice that it'll create a temporary file of
about 40MB!
Comment 2 aaron 2004-04-18 14:09:15 UTC
Created attachment 6112 [details]
The gprof output for the streams program

The complete output of 'gprof -q' on the 'streams' test program.
Comment 3 Andrew Pinski 2004-04-18 14:20:11 UTC
Confirmed, the issue might be using wrong std::string functions to do std::getline.  Otherwise std::
string is where it is slow.
Comment 4 aaron 2004-04-18 14:39:04 UTC
That's not true. std::getline() is HORRIBLY slow (more than 30 times slower than
fgets()), basically making it unusable to read large files.
std::ifstream::getline() is still VERY slow (more than 4 times slower than
fgets()), which still makes it nearly unusable. This might not be a 'critical'
bug, but it has at least severity 'normal' because almost every serious C++
program uses linewise input somewhere. Anyway, I leave it to you to change that
if you feel like :-)
Comment 5 Andrew Pinski 2004-04-18 14:46:59 UTC
Well for the mainline it is already faster:
tin:~/src/gnu/gcctest/iotest/iotest>time ./streams
3.510u 0.040s 0:03.69 96.2%     0+0k 0+0io 202pf+0w
tin:~/src/gnu/gcctest/iotest/iotest>time ./streams2
0.420u 0.040s 0:00.69 66.6%     0+0k 0+0io 200pf+0w
tin:~/src/gnu/gcctest/iotest/iotest>time ./stdio
0.090u 0.060s 0:00.14 107.1%    0+0k 0+0io 80pf+0w

But since this has never been fast, it is an enhancement.
Comment 6 GCC Commits 2004-04-19 11:30:29 UTC
Subject: Bug 15002

CVSROOT:	/cvs/gcc
Module name:	gcc
Changes by:	paolo@gcc.gnu.org	2004-04-19 11:30:21

Modified files:
	libstdc++-v3   : ChangeLog 
	libstdc++-v3/include/bits: basic_string.h 

Log message:
	2004-04-19  Paolo Carlini  <pcarlini@suse.de>
	
	PR libstdc++/15002 (partial)
	* include/bits/basic_string.h (_M_replace_aux, _M_replace_safe):
	Special case __n2 == 1, not calling traits_type::assign/copy.

Patches:
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/libstdc++-v3/ChangeLog.diff?cvsroot=gcc&r1=1.2452&r2=1.2453
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/libstdc++-v3/include/bits/basic_string.h.diff?cvsroot=gcc&r1=1.58&r2=1.59

Comment 7 Andrew Pinski 2004-04-20 16:44:06 UTC
Paolo, your patch helped a lot (I looked a profile generated by Shark).
I noticed that most of the time now is spent in std::string::_M_mutate from std::string::_M_replace_aux, 
not doing much at all.
Some suggestions after looking into the source/asm:
The calculation for __src could be moved inside the if statements, removing a load and two adds.

On the same box as before after Paolo's patch, so a little more than a 2x speedup:
tin:~/src/gnu/gcctest/iotest/iotest>time ./streams
1.340u 0.100s 0:01.43 100.6%    0+0k 0+0io 202pf+0w
tin:~/src/gnu/gcctest/iotest/iotest>time ./streams2
0.430u 0.010s 0:00.44 100.0%    0+0k 0+0io 199pf+0w
tin:~/src/gnu/gcctest/iotest/iotest>time ./stdio
0.080u 0.060s 0:00.15 93.3%     0+0k 0+0io 80pf+0w
Comment 8 Paolo Carlini 2004-04-20 16:59:08 UTC
Thanks Andrew for testing again and in particular for your suggestion!
Comment 9 GCC Commits 2004-04-23 14:44:45 UTC
Subject: Bug 15002

CVSROOT:	/cvs/gcc
Module name:	gcc
Branch: 	hammer-3_3-branch
Changes by:	paolo@gcc.gnu.org	2004-04-23 14:44:40

Modified files:
	libstdc++-v3   : ChangeLog.hammer 
	libstdc++-v3/include/bits: basic_string.tcc 

Log message:
	2004-04-23  Paolo Carlini  <pcarlini@suse.de>
	
	PR libstdc++/15002 (partial)
	* include/bits/basic_string.tcc (_M_replace_safe, replace(iterator,
	iterator, size_type, _CharT)): Special case __dnew/__n2 == 1, not
	calling _S_copy_chars/traits_type::assign.

Patches:
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/libstdc++-v3/ChangeLog.hammer.diff?cvsroot=gcc&only_with_tag=hammer-3_3-branch&r1=1.1.2.10&r2=1.1.2.11
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/libstdc++-v3/include/bits/basic_string.tcc.diff?cvsroot=gcc&only_with_tag=hammer-3_3-branch&r1=1.28.2.6&r2=1.28.2.7

Comment 10 GCC Commits 2004-04-24 09:09:57 UTC
Subject: Bug 15002

CVSROOT:	/cvs/gcc
Module name:	gcc
Branch: 	gcc-3_4-branch
Changes by:	paolo@gcc.gnu.org	2004-04-24 09:09:53

Modified files:
	libstdc++-v3   : ChangeLog 
	libstdc++-v3/include/bits: basic_string.h basic_string.tcc 

Log message:
	2004-04-24  Paolo Carlini  <pcarlini@suse.de>
	Andrew Pinski  <pinskia@physics.uc.edu>
	
	* include/bits/basic_string.tcc (_M_mutate): Don't compute
	__src unnecessarily.
	
	2004-04-24  Paolo Carlini  <pcarlini@suse.de>
	
	PR libstdc++/15002 (partial)
	* include/bits/basic_string.h (_M_replace_aux, _M_replace_safe):
	Special case __n2 == 1, not calling traits_type::assign/copy.

Patches:
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/libstdc++-v3/ChangeLog.diff?cvsroot=gcc&only_with_tag=gcc-3_4-branch&r1=1.2224.2.82&r2=1.2224.2.83
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/libstdc++-v3/include/bits/basic_string.h.diff?cvsroot=gcc&only_with_tag=gcc-3_4-branch&r1=1.41.4.3&r2=1.41.4.4
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/libstdc++-v3/include/bits/basic_string.tcc.diff?cvsroot=gcc&only_with_tag=gcc-3_4-branch&r1=1.46.4.5&r2=1.46.4.6

Comment 11 GCC Commits 2004-04-24 22:20:37 UTC
Subject: Bug 15002

CVSROOT:	/cvs/gcc
Module name:	gcc
Changes by:	paolo@gcc.gnu.org	2004-04-24 22:20:31

Modified files:
	libstdc++-v3   : ChangeLog 
	libstdc++-v3/include/bits: istream.tcc 
Added files:
	libstdc++-v3/testsuite/27_io/basic_istream/getline/char: 4.cc 

Log message:
	2004-04-24  Paolo Carlini  <pcarlini@suse.de>
	Petur Runolfsson  <peturr02@ru.is>
	
	PR libstdc++/15002 (continued)
	* include/bits/istream.tcc (basic_istream<>::getline(char_type*,
	streamsize, char_type)): Use traits::find/copy in a loop to speed
	up greatly the function in the common case (I/O buffer size >> 1).
	
	2004-04-24  Paolo Carlini  <pcarlini@suse.de>
	
	* testsuite/27_io/basic_istream/getline/char/4.cc: New.
	
	* include/bits/istream.tcc (getline(basic_istream<>&,
	basic_string<>&, _CharT)): Change to use sgetc()/snextc() instead
	of sbumpc(), consistently with the other functions, thus also
	dealing correctly with the case of exceeded string::max_size().

Patches:
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/libstdc++-v3/ChangeLog.diff?cvsroot=gcc&r1=1.2459&r2=1.2460
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/libstdc++-v3/include/bits/istream.tcc.diff?cvsroot=gcc&r1=1.61&r2=1.62
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/libstdc++-v3/testsuite/27_io/basic_istream/getline/char/4.cc.diff?cvsroot=gcc&r1=NONE&r2=1.1

Comment 12 GCC Commits 2004-04-25 15:45:19 UTC
Subject: Bug 15002

CVSROOT:	/cvs/gcc
Module name:	gcc
Changes by:	paolo@gcc.gnu.org	2004-04-25 15:45:14

Modified files:
	libstdc++-v3   : ChangeLog 
	libstdc++-v3/include/bits: istream.tcc basic_string.tcc 
	libstdc++-v3/testsuite/27_io/basic_istream/getline/char: 4.cc 

Log message:
	2004-04-25  Paolo Carlini  <pcarlini@suse.de>
	
	PR libstdc++/15002 (continued again)
	* include/bits/istream.tcc (getline(basic_istream<>&,
	basic_string<>&, _CharT)): Use a temporary buffer, thus
	avoiding reallocation for common case.
	
	* include/bits/basic_string.tcc (_S_construct(_InIterator,
	_InIterator, const _Alloc&, input_iterator_tag)): Tweak size
	of temporary buffer to a power of two.
	
	* testsuite/27_io/basic_istream/getline/char/4.cc: Add comment.

Patches:
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/libstdc++-v3/ChangeLog.diff?cvsroot=gcc&r1=1.2461&r2=1.2462
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/libstdc++-v3/include/bits/istream.tcc.diff?cvsroot=gcc&r1=1.62&r2=1.63
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/libstdc++-v3/include/bits/basic_string.tcc.diff?cvsroot=gcc&r1=1.69&r2=1.70
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/libstdc++-v3/testsuite/27_io/basic_istream/getline/char/4.cc.diff?cvsroot=gcc&r1=1.1&r2=1.2

Comment 13 Andrew Pinski 2004-04-26 15:42:53 UTC
This is now fixed on the mainline:
tin:~/src/gnu/gcctest/iotest/iotest>time ./streams2
0.100u 0.040s 0:00.14 100.0%    0+0k 0+0io 205pf+0w
tin:~/src/gnu/gcctest/iotest/iotest>time ./stdio
0.080u 0.050s 0:00.14 92.8%     0+0k 0+0io 85pf+0w
tin:~/src/gnu/gcctest/iotest/iotest>time ./streams
0.420u 0.070s 0:00.49 100.0%    0+0k 0+0io 208pf+0w
Which makes using streams to very close to running stdio.

Paolo says he is going to try to get this into 3.4.1 also.
Comment 14 GCC Commits 2004-05-02 16:10:37 UTC
Subject: Bug 15002

CVSROOT:	/cvs/gcc
Module name:	gcc
Branch: 	hammer-3_3-branch
Changes by:	paolo@gcc.gnu.org	2004-05-02 16:10:34

Modified files:
	libstdc++-v3   : ChangeLog.hammer 
	libstdc++-v3/include/bits: istream.tcc 

Log message:
	2004-05-02  Paolo Carlini  <pcarlini@suse.de>
	
	PR libstdc++/15002 (continued)
	* include/bits/istream.tcc (getline(basic_istream<>&,
	basic_string<>&, _CharT)): Use a temporary buffer, thus
	avoiding reallocation for common case.
	
	* include/bits/istream.tcc (getline(basic_istream<>&,
	basic_string<>&, _CharT)): Change to use sgetc()/snextc() instead
	of sbumpc(), consistently with the other functions, thus also
	dealing correctly with the case of exceeded string::max_size().

Patches:
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/libstdc++-v3/ChangeLog.hammer.diff?cvsroot=gcc&only_with_tag=hammer-3_3-branch&r1=1.1.2.12&r2=1.1.2.13
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/libstdc++-v3/include/bits/istream.tcc.diff?cvsroot=gcc&only_with_tag=hammer-3_3-branch&r1=1.36.2.7&r2=1.36.2.8

Comment 15 GCC Commits 2004-05-09 23:28:02 UTC
Subject: Bug 15002

CVSROOT:	/cvs/gcc
Module name:	gcc
Branch: 	gcc-3_4-branch
Changes by:	paolo@gcc.gnu.org	2004-05-09 23:27:57

Modified files:
	libstdc++-v3   : ChangeLog 
	libstdc++-v3/include/bits: basic_string.tcc istream.tcc 
Added files:
	libstdc++-v3/testsuite/27_io/basic_istream/getline/char: 4.cc 
	                                                         5.cc 
	libstdc++-v3/testsuite/21_strings/basic_string/inserters_extractors/char: 
	                                                                          10.cc 
	libstdc++-v3/testsuite/21_strings/basic_string/inserters_extractors/wchar_t: 
	                                                                             10.cc 

Log message:
	2004-05-09  Paolo Carlini  <pcarlini@suse.de>
	
	* testsuite/21_strings/basic_string/inserters_extractors/char/10.cc:
	New.
	* testsuite/21_strings/basic_string/inserters_extractors/wchar_t/10.cc:
	Likewise.
	* testsuite/27_io/basic_istream/getline/char/5.cc: Likewise.
	
	2004-05-09  Paolo Carlini  <pcarlini@suse.de>
	
	PR libstdc++/15002 (continued again)
	* include/bits/istream.tcc (getline(basic_istream<>&,
	basic_string<>&, _CharT)): Use a temporary buffer, thus
	avoiding reallocation for common case.
	
	* include/bits/basic_string.tcc (_S_construct(_InIterator,
	_InIterator, const _Alloc&, input_iterator_tag)): Tweak size
	of temporary buffer to a power of two.
	
	* testsuite/27_io/basic_istream/getline/char/4.cc: Add comment.
	
	2004-05-09  Paolo Carlini  <pcarlini@suse.de>
	Petur Runolfsson  <peturr02@ru.is>
	
	PR libstdc++/15002 (continued)
	* include/bits/istream.tcc (basic_istream<>::getline(char_type*,
	streamsize, char_type)): Use traits::find/copy in a loop to speed
	up greatly the function in the common case (I/O buffer size >> 1).
	
	2004-05-09  Paolo Carlini  <pcarlini@suse.de>
	
	* testsuite/27_io/basic_istream/getline/char/4.cc: New.
	
	* include/bits/istream.tcc (getline(basic_istream<>&,
	basic_string<>&, _CharT)): Change to use sgetc()/snextc() instead
	of sbumpc(), consistently with the other functions, thus also
	dealing correctly with the case of exceeded string::max_size().

Patches:
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/libstdc++-v3/ChangeLog.diff?cvsroot=gcc&only_with_tag=gcc-3_4-branch&r1=1.2224.2.96&r2=1.2224.2.97
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/libstdc++-v3/include/bits/basic_string.tcc.diff?cvsroot=gcc&only_with_tag=gcc-3_4-branch&r1=1.46.4.6&r2=1.46.4.7
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/libstdc++-v3/include/bits/istream.tcc.diff?cvsroot=gcc&only_with_tag=gcc-3_4-branch&r1=1.60.4.2&r2=1.60.4.3
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/libstdc++-v3/testsuite/27_io/basic_istream/getline/char/4.cc.diff?cvsroot=gcc&only_with_tag=gcc-3_4-branch&r1=NONE&r2=1.2.4.1
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/libstdc++-v3/testsuite/27_io/basic_istream/getline/char/5.cc.diff?cvsroot=gcc&only_with_tag=gcc-3_4-branch&r1=NONE&r2=1.1.4.1
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/libstdc++-v3/testsuite/21_strings/basic_string/inserters_extractors/char/10.cc.diff?cvsroot=gcc&only_with_tag=gcc-3_4-branch&r1=NONE&r2=1.1.4.1
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/libstdc++-v3/testsuite/21_strings/basic_string/inserters_extractors/wchar_t/10.cc.diff?cvsroot=gcc&only_with_tag=gcc-3_4-branch&r1=NONE&r2=1.1.4.1

Comment 16 Paolo Carlini 2004-05-09 23:33:29 UTC
Fixed for 3.4.1.