This is the mail archive of the
libstdc++@gcc.gnu.org
mailing list for the libstdc++ project.
basic_string capacity strategy < 128 characters
- From: Joachim Schurig <js at logoport dot net>
- To: libstdc++ at gcc dot gnu dot org
- Date: Wed, 11 Jun 2003 11:55:00 +0200
- Subject: basic_string capacity strategy < 128 characters
Hi,
during a tedious investigation concerning bad runtime behaviour of a
multithreaded server application on an Intel Quad SMP system, I realized
that parts of the bad allocator response (using the standard
__pool_allocator) were due to the heavy use of the allocator for strings.
In fact, the current basic_string implementation in libstdc++ allocates
a new string buffer with single increment for every added character, if
the string length is < 128.
On a Quad SMP machine, this leads to complete congestion with only 3-4
running threads creating and appending strings.
A simple fix is attached, it more or less reimplements the linear
capacity strategy of libstdc++-v2:
Minimum buffer size is 16 bytes (still inefficient, as for chars this
only leaves 3 chars effective buffer), doubles the buffer size if
needed, and respects malloc header space of 4*sizeof(void). This
algorithm doubles the total representation size, not the real string
capacity, as this fits best in __mt_alloc's fixed allocator bins.
I read many of the earlier comments, and am aware of a comment that
delegated responsibility for better runtime behaviour to the programmer
(who should use appropriate reserve() calls), but there are many
situations left where you cannot reserve() in advance (just because you
do not know how much).
As the standard requires a linear behaviour, and as it was implemented
likewise in libstdc++-v2 and lots of code is written expecting such, you
might want to rethink the current strategy (or clarify why it was chosen
the way it is for -v3).
BTW, in the gcc/libstdc++ documentation, there is a paragraph comparing
MS CString vs. libstdc++ string, pointing out how badly CString's
capacity strategy is chosen (simply incrementing by 1) compared to the
string implementation of libstdc++. As far as I am aware of, the CString
implementation was changed time ago, and now the picture is exactly the
other way round.
Best regards,
Joachim Schurig
--- libstdc++-v3/include/bits/basic_string.tcc.orig 2003-01-01 13:04:23.000000000 +0100
+++ libstdc++-v3/include/bits/basic_string.tcc 2003-05-06 14:53:38.000000000 +0200
@@ -293,21 +293,11 @@
{
// Must reallocate.
allocator_type __a = get_allocator();
- // See below (_S_create) for the meaning and value of these
- // constants.
- const size_type __pagesize = 4096;
- const size_type __malloc_header_size = 4 * sizeof (void*);
- // The biggest string which fits in a memory page
- const size_type __page_capacity = (__pagesize - __malloc_header_size
- - sizeof(_Rep) - sizeof(_CharT))
- / sizeof(_CharT);
+
_Rep* __r;
- if (__new_size > capacity() && __new_size > __page_capacity)
- // Growing exponentially.
- __r = _Rep::_S_create(__new_size > 2*capacity() ?
- __new_size : 2*capacity(), __a);
- else
- __r = _Rep::_S_create(__new_size, __a);
+ // let _S_create figure out the actual capacity
+ __r = _Rep::_S_create(__new_size, __a);
+
try
{
if (__pos)
@@ -390,53 +380,25 @@
if (__capacity == npos)
#endif
__throw_length_error("basic_string::_S_create");
+
+ const size_t __pagesize = 4096;
+ const size_t __malloc_header_size = 4 * sizeof (void*);
- // NB: Need an array of char_type[__capacity], plus a
- // terminating null char_type() element, plus enough for the
- // _Rep data structure. Whew. Seemingly so needy, yet so elemental.
+ // the terminating 0 + the internal representation
size_t __size = (__capacity + 1) * sizeof(_CharT) + sizeof(_Rep);
-
- // The standard places no restriction on allocating more memory
- // than is strictly needed within this layer at the moment or as
- // requested by an explicit application call to reserve(). Many
- // malloc implementations perform quite poorly when an
- // application attempts to allocate memory in a stepwise fashion
- // growing each allocation size by only 1 char. Additionally,
- // it makes little sense to allocate less linear memory than the
- // natural blocking size of the malloc implementation.
- // Unfortunately, we would need a somewhat low-level calculation
- // with tuned parameters to get this perfect for any particular
- // malloc implementation. Fortunately, generalizations about
- // common features seen among implementations seems to suffice.
-
- // __pagesize need not match the actual VM page size for good
- // results in practice, thus we pick a common value on the low
- // side. __malloc_header_size is an estimate of the amount of
- // overhead per memory allocation (in practice seen N * sizeof
- // (void*) where N is 0, 2 or 4). According to folklore,
- // picking this value on the high side is better than
- // low-balling it (especially when this algorithm is used with
- // malloc implementations that allocate memory blocks rounded up
- // to a size which is a power of 2).
- const size_t __pagesize = 4096; // must be 2^i * __subpagesize
- const size_t __subpagesize = 128; // should be >> __malloc_header_size
- const size_t __malloc_header_size = 4 * sizeof (void*);
- if ((__size + __malloc_header_size) > __pagesize)
- {
- size_t __extra =
- (__pagesize - ((__size + __malloc_header_size) % __pagesize))
- % __pagesize;
- __capacity += __extra / sizeof(_CharT);
- __size = (__capacity + 1) * sizeof(_CharT) + sizeof(_Rep);
- }
- else if (__size > __subpagesize)
- {
- size_t __extra =
- (__subpagesize - ((__size + __malloc_header_size) % __subpagesize))
- % __subpagesize;
- __capacity += __extra / sizeof(_CharT);
- __size = (__capacity + 1) * sizeof(_CharT) + sizeof(_Rep);
- }
+ if (__size < 16) {
+ __size = 16;
+ } else if (__size <= __pagesize / 2) {
+ // fit in powers of 2 to ease reuse of buffers
+ size_t __tsize = 16;
+ while (__tsize < __size) __tsize <<= 1;
+ __size = __tsize;
+ } else {
+ size_t __tsize = __pagesize;
+ while (__tsize - __malloc_header_size < __size) __tsize <<= 1;
+ __size = __tsize - __malloc_header_size;
+ }
+ __capacity = (__size - sizeof(_Rep)) / sizeof(_CharT) - 1;
// NB: Might throw, but no worries about a leak, mate: _Rep()
// does not throw.
@@ -455,20 +417,10 @@
{
// Requested capacity of the clone.
const size_type __requested_cap = _M_length + __res;
- // See above (_S_create) for the meaning and value of these constants.
- const size_type __pagesize = 4096;
- const size_type __malloc_header_size = 4 * sizeof (void*);
- // The biggest string which fits in a memory page.
- const size_type __page_capacity =
- (__pagesize - __malloc_header_size - sizeof(_Rep) - sizeof(_CharT))
- / sizeof(_CharT);
+
_Rep* __r;
- if (__requested_cap > _M_capacity && __requested_cap > __page_capacity)
- // Growing exponentially.
- __r = _Rep::_S_create(__requested_cap > 2*_M_capacity ?
- __requested_cap : 2*_M_capacity, __alloc);
- else
- __r = _Rep::_S_create(__requested_cap, __alloc);
+ // let _S_create figure out the capacity
+ __r = _Rep::_S_create(__requested_cap, __alloc);
if (_M_length)
{