This is the mail archive of the libstdc++@gcc.gnu.org mailing list for the libstdc++ project.


Re: Good numbers from Rittle's new string allocator


Hi!

We have not concluded our testing yet, but we basically get the same improvements.
We are currently experimenting with "rounding" the _capacity even for strings _shorter_ than 128 bytes, which should decrease memory fragmentation even further (this is of high importance to us, since the application _will/must_ run for long periods of time).

Since we are using the pthread_alloc allocator in some parts of the application, we are looking at some "modifications" to it as well, but as I said, I will get back with some more actual numbers and ideas...

Take care everyone!

/Stefan

Paolo Carlini wrote:

Hi,

today I began benchmarking, on my PII-400 (Linux 2.4.10, glibc 2.2.4), Loren James
Rittle's proposal for an improved basic_string memory allocation:

    http://gcc.gnu.org/ml/libstdc++/2001-07/msg00084.html

The first numbers are from Stefan's test, for 100000 concatenations:

/////////
#include <string>

#define SIZE 100000

using namespace std;

int main()
{
        string s;
        int i;

        for( i=0; i<SIZE; i++ )
        {
                s+="a";
        }
}
/////////

... and are really ***astounding*** (5 consecutive runs):

gcc version 3.1 20011122 (experimental)
---------------------------------------
18.470u 2.880s 0:21.83 97.8%    0+0k 0+0io 163pf+0w
19.230u 2.750s 0:22.49 97.7%    0+0k 0+0io 163pf+0w
18.480u 3.000s 0:21.83 98.3%    0+0k 0+0io 163pf+0w
18.790u 3.010s 0:22.14 98.4%    0+0k 0+0io 163pf+0w
19.540u 2.600s 0:22.52 98.3%    0+0k 0+0io 163pf+0w

gcc version 3.1 20011122 (experimental) + Loren's patch
(__malloc_header_size == 8)
-------------------------------------------------------
0.110u 0.000s 0:00.11 100.0%    0+0k 0+0io 162pf+0w
0.110u 0.010s 0:00.11 109.0%    0+0k 0+0io 162pf+0w
0.120u 0.000s 0:00.11 109.0%    0+0k 0+0io 162pf+0w
0.100u 0.020s 0:00.11 109.0%    0+0k 0+0io 162pf+0w
0.110u 0.000s 0:00.11 100.0%    0+0k 0+0io 162pf+0w

For 1000000 concatenations (out of cache) the difference is no less impressive: unpatched
gcc takes more than 600 seconds, while patched gcc gives:

1.350u 0.640s 0:02.01 99.0%     0+0k 0+0io 163pf+0w
1.290u 0.670s 0:02.01 97.5%     0+0k 0+0io 163pf+0w
1.430u 0.570s 0:02.02 99.0%     0+0k 0+0io 163pf+0w
1.380u 0.590s 0:02.02 97.5%     0+0k 0+0io 163pf+0w
1.260u 0.730s 0:02.02 98.5%     0+0k 0+0io 163pf+0w

The following are the values of capacity() corresponding to a given value x in
reserve(x):

      x           capacity()
  --------------------------
     32                   32
     64                   64
    128                  235
    256                  491
    512                  747
   1024                 1259
   2048                 2283
   4096                 8171
   8192                12267
  16384                20459
  32768                36843
  65536                69611
 131072               135147
 262144               266219

In my opinion they are absolutely reasonable, considering that it is always possible to
shrink the capacity to fit size() with a reserve() call (now it always works ;-)

So, the results are very promising and I'm planning to carry out more tests with
different (more realistic?) benchmarks, soon on a P4 too.

Eventually, it should also be assessed whether an appropriate configure test for
__malloc_header_size is really worthwhile. I understand from Ulrich Drepper's message:

    http://gcc.gnu.org/ml/libstdc++/2001-07/msg00077.html

that perhaps we can go with a hardwired 4*sizeof(void*). Otherwise we have to implement
an autoconf test, which Loren has already outlined:

    http://gcc.gnu.org/ml/libstdc++/2001-07/msg00083.html

I'm attaching to this message a freshly made diff (vs. current mainline) of Rittle's
proposal, which you may all apply to carry out your own experiments (it has
__malloc_header_size == 8 == 2*sizeof(void*), the optimal value for my Linux; it should
also be OK for FreeBSD and Solaris).

Please, post your results and opinions!!

Cheers,
Paolo.

*** basic_string.tcc.orig Thu Nov 22 18:59:57 2001
--- basic_string.tcc Thu Nov 22 18:59:35 2001
*************** namespace std
*** 374,379 ****
--- 374,398 ----
        // terminating null char_type() element, plus enough for the
        // _Rep data structure. Whew. Seemingly so needy, yet so elemental.
        size_t __size = (__capacity + 1) * sizeof(_CharT) + sizeof(_Rep);
+
+       const size_t __pagesize = 4096; // This magic constant, from OS.
+       const size_t __malloc_header_size = 8; // This one, from malloc.
+       if ((__size + __malloc_header_size) > __pagesize)
+       {
+         size_t __extra =
+           (__pagesize - ((__size + __malloc_header_size) % __pagesize))
+           % __pagesize;
+         __capacity += __extra / sizeof(_CharT);
+         __size = (__capacity + 1) * sizeof(_CharT) + sizeof(_Rep);
+       }
+       else if (__size > 128) // This magic constant is from stl_alloc.h.
+       {
+         size_t __extra =
+           (256 - ((__size + __malloc_header_size) % 256)) % 256;
+         __capacity += __extra / sizeof(_CharT);
+         __size = (__capacity + 1) * sizeof(_CharT) + sizeof(_Rep);
+       }
+
        // NB: Might throw, but no worries about a leak, mate: _Rep()
        // does not throw.
        void* __place = _Raw_bytes_alloc(__alloc).allocate(__size);

-- 
Military intelligence is a contradiction in terms.
 