This is the mail archive of the
libstdc++@gcc.gnu.org
mailing list for the libstdc++ project.
Rope vs string performance
- From: Matt Austern <austern at apple dot com>
- To: Benjamin Kosnik <bkoz at redhat dot com>, Paolo Carlini <pcarlini at suse dot de>
- Cc: libstdc++ <libstdc++ at gcc dot gnu dot org>
- Date: Sun, 25 Jan 2004 17:25:10 -0800
- Subject: Rope vs string performance
- References: <3FF0020A.7060804@suse.de>
On Dec 29, 2003, at 2:29 AM, Paolo Carlini wrote:
Hi everyone,
citing Matt:
"... And if you find it too much of a nuisance, I can just go ahead
with it."
Indeed, turned out to be a nuisance ;), not only due to the use of 4
additional
allocators, but also because of many static members in the
implementation.
The below is what I have ready, it works (make check + a few simple
tests from
the SGI STL site), seems to me quite clean, but I'm not sure if it's
the best
we can do, honestly.
Well, perhaps, 'rope' isn't really used that much these days and we
had better
spending our time elsewhere...
I think this is fine. And thanks for doing this; I've been distracted
by other
things in the compiler for the last few weeks, slowly getting back to
doing library work again.
My one useful contribution to this discussion: I think relatively
minimal
effort on rope is fine, since I'm sure you're right that it isn't being
used
heavily, but I'd hate to see the code die entirely. There are usage
patterns where rope is *much* faster than std:;string, i.e. faster by
one or two orders of magnitude. We should keep the code alive
and working, instead of throwing it away, in the hope that someday
we can put these performance improvements into a form where
they're more widely used.
Basically, rope shines when you're dealing with huge strings and when
you're doing a whole lot of substring operations and string
concatenation. (It doesn't do nearly as well for random access to
characters.) I claim that this is one important usage pattern.
I'm posting a test program below. It's pretty simpleminded, and you
might
reasonably say that it's unfair because I've deliberately written it to
show
rope at its best and string at its worst, but again, my claim isn't
that rope
is better uniformly, only that there are some cases where it's far
better.
On my system, the output from this program is:
[tmp]$ /work/root/bin/g++ -O3 rtime.cc
[tmp]$ ./a.out
String length: 50000000, sample: s had
Rope length: 50000000, sample: s had
String: 0.84 0.45
Rope: 0.16 0.01
#include <iostream>
#include <string>
#include <ext/rope>
#include <time.h>
const char base[] =
"Happy families are all alike; every unhappy family is unhappy in \
its own way. \
\
Everything was in confusion in the Oblonskys' house. The wife \
had discovered that the husband was carrying on an intrigue with \
a French girl, who had been a governess in their family, and she \
had announced to her husband that she could not go on living in \
the same house with him. This position of affairs had now lasted \
three days, and not only the husband and wife themselves, but all \
the members of their family and household, were painfully \
conscious of it. Every person in the house felt that there was \
so sense in their living together, and that the stray people \
brought together by chance in any inn had more in common with one \
another than they, the members of the family and household of the \
Oblonskys. The wife did not leave her own room, the husband had \
not been at home for three days. The children ran wild all over \
the house; the English governess quarreled with the housekeeper, \
and wrote to a friend asking her to look out for a new situation \
for her; the man-cook had walked off the day before just at \
dinner time; the kitchen-maid, and the coachman had given \
warning."
;
class timer
{
public:
timer() { this->start(); }
void start() { begin = clock(); }
double elapsed() {
clock_t end = clock();
return double(end - begin) / double(CLOCKS_PER_SEC);
}
private:
clock_t begin;
};
int baselen = sizeof(base) - 1;
template <class StringType>
StringType
multiply (const StringType& s, int n)
{
StringType result;
while (n > 0) {
result += s;
--n;
}
return result;
}
template <class StringType>
StringType
mung_substrings (const StringType& s, int len, int n, int skip)
{
StringType result;
int start = 0;
while (n > 0) {
StringType tmp = s.substr (start, len);
result += tmp;
--n;
start += skip;
}
return result;
}
int main()
{
timer T;
double rope_construct, string_construct, rope_substring,
string_substring;
T.start();
std::string s;
s = multiply (std::string(base), 100000);
string_construct = T.elapsed();
T.start();
__gnu_cxx::crope r;
r = multiply (__gnu_cxx::crope(base), 100000);
rope_construct = T.elapsed();
T.start();
std::string s1;
s1 = mung_substrings (s, 100000, 500, 73);
string_substring = T.elapsed();
T.start();
__gnu_cxx::crope r1;
r1 = mung_substrings (r, 100000, 500, 73);
rope_substring = T.elapsed();
std::cout << "String length: " << s1.size() << ", sample: " <<
s1.substr (88888, 6)
<< std::endl;
std::cout << "Rope length: " << r1.size() << ", sample: " <<
r1.substr (88888, 6)
<< std::endl;
std::cout << "String:\t" << string_construct << "\t" <<
string_substring << std::endl;
std::cout << "Rope:\t" << rope_construct << "\t" << rope_substring <<
std::endl;
}