This is the mail archive of the libstdc++@gcc.gnu.org mailing list for the libstdc++ project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

basic_string capacity strategy < 128 characters


Hi,

during a tedious investigation concerning bad runtime behaviour of a multithreaded server application on an Intel Quad SMP system, I realized that parts of the bad allocator response (using the standard __pool_allocator) were due to the heavy use of the allocator for strings.

In fact, the current basic_string implementation in libstdc++ allocates a new string buffer with single increment for every added character, if the string length is < 128.

On a Quad SMP machine, this leads to complete congestion with only 3-4 running threads creating and appending strings.

A simple fix is attached, it more or less reimplements the linear capacity strategy of libstdc++-v2:

Minimum buffer size is 16 bytes (still inefficient, as for chars this only leaves 3 chars effective buffer), doubles the buffer size if needed, and respects malloc header space of 4*sizeof(void). This algorithm doubles the total representation size, not the real string capacity, as this fits best in __mt_alloc's fixed allocator bins.

I read many of the earlier comments, and am aware of a comment that delegated responsibility for better runtime behaviour to the programmer (who should use appropriate reserve() calls), but there are many situations left where you cannot reserve() in advance (just because you do not know how much).

As the standard requires a linear behaviour, and as it was implemented likewise in libstdc++-v2 and lots of code is written expecting such, you might want to rethink the current strategy (or clarify why it was chosen the way it is for -v3).

BTW, in the gcc/libstdc++ documentation, there is a paragraph comparing MS CString vs. libstdc++ string, pointing out how badly CString's capacity strategy is chosen (simply incrementing by 1) compared to the string implementation of libstdc++. As far as I am aware of, the CString implementation was changed time ago, and now the picture is exactly the other way round.

Best regards,

Joachim Schurig


--- libstdc++-v3/include/bits/basic_string.tcc.orig	2003-01-01 13:04:23.000000000 +0100
+++ libstdc++-v3/include/bits/basic_string.tcc	2003-05-06 14:53:38.000000000 +0200
@@ -293,21 +293,11 @@
 	{
 	  // Must reallocate.
 	  allocator_type __a = get_allocator();
-	  // See below (_S_create) for the meaning and value of these
-	  // constants.
-	  const size_type __pagesize = 4096;
-	  const size_type __malloc_header_size = 4 * sizeof (void*);
-	  // The biggest string which fits in a memory page
-	  const size_type __page_capacity = (__pagesize - __malloc_header_size
-					     - sizeof(_Rep) - sizeof(_CharT)) 
-	    				     / sizeof(_CharT);
+
 	  _Rep* __r;
-	  if (__new_size > capacity() && __new_size > __page_capacity)
-	    // Growing exponentially.
-	    __r = _Rep::_S_create(__new_size > 2*capacity() ?
-				  __new_size : 2*capacity(), __a);
-	  else
-	    __r = _Rep::_S_create(__new_size, __a);
+	  // let _S_create figure out the actual capacity
+	  __r = _Rep::_S_create(__new_size, __a);
+
 	  try 
 	    {
 	      if (__pos)
@@ -390,53 +380,25 @@
       if (__capacity == npos)
 #endif
 	__throw_length_error("basic_string::_S_create");
+	
+      const size_t __pagesize = 4096;
+      const size_t __malloc_header_size = 4 * sizeof (void*);
 
-      // NB: Need an array of char_type[__capacity], plus a
-      // terminating null char_type() element, plus enough for the
-      // _Rep data structure. Whew. Seemingly so needy, yet so elemental.
+      // the terminating 0 + the internal representation
       size_t __size = (__capacity + 1) * sizeof(_CharT) + sizeof(_Rep);
-
-      // The standard places no restriction on allocating more memory
-      // than is strictly needed within this layer at the moment or as
-      // requested by an explicit application call to reserve().  Many
-      // malloc implementations perform quite poorly when an
-      // application attempts to allocate memory in a stepwise fashion
-      // growing each allocation size by only 1 char.  Additionally,
-      // it makes little sense to allocate less linear memory than the
-      // natural blocking size of the malloc implementation.
-      // Unfortunately, we would need a somewhat low-level calculation
-      // with tuned parameters to get this perfect for any particular
-      // malloc implementation.  Fortunately, generalizations about
-      // common features seen among implementations seems to suffice.
-
-      // __pagesize need not match the actual VM page size for good
-      // results in practice, thus we pick a common value on the low
-      // side.  __malloc_header_size is an estimate of the amount of
-      // overhead per memory allocation (in practice seen N * sizeof
-      // (void*) where N is 0, 2 or 4).  According to folklore,
-      // picking this value on the high side is better than
-      // low-balling it (especially when this algorithm is used with
-      // malloc implementations that allocate memory blocks rounded up
-      // to a size which is a power of 2).
-      const size_t __pagesize = 4096; // must be 2^i * __subpagesize
-      const size_t __subpagesize = 128; // should be >> __malloc_header_size
-      const size_t __malloc_header_size = 4 * sizeof (void*);
-      if ((__size + __malloc_header_size) > __pagesize)
-	{
-	  size_t __extra =
-	    (__pagesize - ((__size + __malloc_header_size) % __pagesize))
-	    % __pagesize;
-	  __capacity += __extra / sizeof(_CharT);
-	  __size = (__capacity + 1) * sizeof(_CharT) + sizeof(_Rep);
-	}
-      else if (__size > __subpagesize)
-	{
-	  size_t __extra =
-	    (__subpagesize - ((__size + __malloc_header_size) % __subpagesize))
-	    % __subpagesize;
-	  __capacity += __extra / sizeof(_CharT);
-	  __size = (__capacity + 1) * sizeof(_CharT) + sizeof(_Rep);
-	}
+      if (__size < 16) {
+      	__size = 16;
+      } else if (__size <= __pagesize / 2) {
+      	// fit in powers of 2 to ease reuse of buffers
+	size_t __tsize = 16;
+	while (__tsize < __size) __tsize <<= 1;
+	__size = __tsize;
+      } else {
+      	size_t __tsize = __pagesize;
+	while (__tsize - __malloc_header_size < __size) __tsize <<= 1;
+	__size = __tsize - __malloc_header_size;
+      }
+      __capacity = (__size - sizeof(_Rep)) / sizeof(_CharT) - 1;
 
       // NB: Might throw, but no worries about a leak, mate: _Rep()
       // does not throw.
@@ -455,20 +417,10 @@
     {
       // Requested capacity of the clone.
       const size_type __requested_cap = _M_length + __res;
-      // See above (_S_create) for the meaning and value of these constants.
-      const size_type __pagesize = 4096;
-      const size_type __malloc_header_size = 4 * sizeof (void*);
-      // The biggest string which fits in a memory page.
-      const size_type __page_capacity =
-        (__pagesize - __malloc_header_size - sizeof(_Rep) - sizeof(_CharT))
-        / sizeof(_CharT);
+
       _Rep* __r;
-      if (__requested_cap > _M_capacity && __requested_cap > __page_capacity)
-        // Growing exponentially.
-        __r = _Rep::_S_create(__requested_cap > 2*_M_capacity ?
-                              __requested_cap : 2*_M_capacity, __alloc);
-      else
-        __r = _Rep::_S_create(__requested_cap, __alloc);
+      // let _S_create figure out the capacity
+      __r = _Rep::_S_create(__requested_cap, __alloc);
       
       if (_M_length)
 	{

Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]