This is the mail archive of the libstdc++@gcc.gnu.org mailing list for the libstdc++ project.
Good numbers from Rittle's new string allocator
- From: Paolo Carlini <pcarlini at unitus dot it>
- To: libstdc++ at gcc dot gnu dot org
- Cc: bkoz at redhat dot com, stefan at noname4s dot com, rittle at latour dot rsch dot comm dot mot dot com
- Date: Thu, 22 Nov 2001 19:49:12 +0100
- Subject: Good numbers from Rittle's new string allocator
Hi,
today I began benchmarking, on my PII-400 (Linux 2.4.10, glibc 2.2.4), Loren James
Rittle's proposal for an improved basic_string memory allocation:
http://gcc.gnu.org/ml/libstdc++/2001-07/msg00084.html
The first numbers are from Stefan's test, for 100000 concatenations:
/////////
#include <string>
#define SIZE 100000
using namespace std;
int main()
{
  string s;
  int i;
  for( i=0; i<SIZE; i++ )
    {
      s+="a";
    }
}
/////////
... and are really ***astounding*** (5 consecutive runs):
gcc version 3.1 20011122 (experimental)
---------------------------------------
18.470u 2.880s 0:21.83 97.8% 0+0k 0+0io 163pf+0w
19.230u 2.750s 0:22.49 97.7% 0+0k 0+0io 163pf+0w
18.480u 3.000s 0:21.83 98.3% 0+0k 0+0io 163pf+0w
18.790u 3.010s 0:22.14 98.4% 0+0k 0+0io 163pf+0w
19.540u 2.600s 0:22.52 98.3% 0+0k 0+0io 163pf+0w
gcc version 3.1 20011122 (experimental) + Loren's patch
(__malloc_header_size == 8)
-------------------------------------------------------
0.110u 0.000s 0:00.11 100.0% 0+0k 0+0io 162pf+0w
0.110u 0.010s 0:00.11 109.0% 0+0k 0+0io 162pf+0w
0.120u 0.000s 0:00.11 109.0% 0+0k 0+0io 162pf+0w
0.100u 0.020s 0:00.11 109.0% 0+0k 0+0io 162pf+0w
0.110u 0.000s 0:00.11 100.0% 0+0k 0+0io 162pf+0w
For 1000000 concatenations (out of cache) the difference is no less impressive: unpatched
gcc takes more than 600 seconds, while patched gcc gives:
1.350u 0.640s 0:02.01 99.0% 0+0k 0+0io 163pf+0w
1.290u 0.670s 0:02.01 97.5% 0+0k 0+0io 163pf+0w
1.430u 0.570s 0:02.02 99.0% 0+0k 0+0io 163pf+0w
1.380u 0.590s 0:02.02 97.5% 0+0k 0+0io 163pf+0w
1.260u 0.730s 0:02.02 98.5% 0+0k 0+0io 163pf+0w
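A rough way to see where the time goes is to count how often the string has to grow its
buffer during the same loop. The following is only a sketch: the helper name
count_capacity_changes is hypothetical (it is not part of Stefan's test), and the count
it reports depends entirely on the growth policy of the library it is compiled against.

/////////
```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper, not part of the original test: appends one
// character n times and counts how often capacity() changes, which
// is a proxy for how many times the buffer was reallocated.
std::size_t count_capacity_changes(std::size_t n)
{
  std::string s;
  std::size_t changes = 0;
  std::size_t last = s.capacity();
  for (std::size_t i = 0; i < n; ++i)
    {
      s += "a";
      if (s.capacity() != last)
        {
          ++changes;
          last = s.capacity();
        }
    }
  assert(s.size() == n);  // sanity check: every append was applied
  return changes;
}
```
/////////

Calling count_capacity_changes(100000) with and without the patch should show how many
reallocations each build performs for the first benchmark.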
The following are the values of capacity() obtained after a call to reserve(x), for a
given value of x:
       x    capacity()
--------------------------
      32          32
      64          64
     128         235
     256         491
     512         747
    1024        1259
    2048        2283
    4096        8171
    8192       12267
   16384       20459
   32768       36843
   65536       69611
  131072      135147
  262144      266219
In my opinion they are absolutely reasonable, considering that it is always possible to
shrink the capacity to fit size() with a reserve() call (now it always works ;-).
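The capacities listed above can be reproduced from the rounding rule in the attached
patch. The sketch below restates that rule as a free function. Two values are my
assumptions and are not stated anywhere in this message: sizeof(_CharT) == 1 (plain char)
and sizeof(_Rep) == 12; they are simply the values that make the arithmetic agree with
the measured numbers. The function name rounded_capacity is hypothetical.

/////////
```cpp
#include <cassert>
#include <cstddef>

// Sketch of the rounding rule from the attached patch.
// Assumptions (inferred from the measured capacities, not stated in the
// message): char_size == sizeof(_CharT) == 1, rep_size == sizeof(_Rep) == 12.
std::size_t rounded_capacity(std::size_t capacity,
                             std::size_t char_size = 1,
                             std::size_t rep_size = 12)
{
  assert(char_size > 0);
  const std::size_t pagesize = 4096;         // same constant as the patch
  const std::size_t malloc_header_size = 8;  // same constant as the patch

  // Bytes needed for the characters, the terminating null and the header.
  const std::size_t size = (capacity + 1) * char_size + rep_size;

  if (size + malloc_header_size > pagesize)
    {
      // Large request: pad so that block plus malloc header fills whole pages.
      const std::size_t extra =
        (pagesize - (size + malloc_header_size) % pagesize) % pagesize;
      capacity += extra / char_size;
    }
  else if (size > 128)
    {
      // Medium request: pad to a multiple of 256 bytes.
      const std::size_t extra =
        (256 - (size + malloc_header_size) % 256) % 256;
      capacity += extra / char_size;
    }
  return capacity;
}
```
/////////

For example, under these assumptions reserve(128) needs 141 bytes, which is padded by 107
bytes so that the block plus the 8-byte malloc header reaches 256, giving a capacity of
235; reserve(4096) is padded up to two full pages, giving 8171.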
So the results are very promising, and I'm planning to carry out more tests with
different (more realistic?) benchmarks, soon on a P4 too.
Finally, we should also assess whether an appropriate configure test for
__malloc_header_size is really worthwhile. I understand from Ulrich Drepper's message:
http://gcc.gnu.org/ml/libstdc++/2001-07/msg00077.html
that perhaps we can go with a hardwired 4*sizeof(void*). Otherwise we have to implement
an autoconf test, which Loren has already outlined:
http://gcc.gnu.org/ml/libstdc++/2001-07/msg00083.html
I'm attaching to this message a freshly made diff (vs current mainline) of Rittle's
proposal, which you may all apply to carry out your own experiments. It has
__malloc_header_size == 8 == 2*sizeof(void*), the optimal value for my Linux system; it
should also be OK for FreeBSD and Solaris.
Please post your results and opinions!
Cheers,
Paolo.
*** basic_string.tcc.orig Thu Nov 22 18:59:57 2001
--- basic_string.tcc Thu Nov 22 18:59:35 2001
*************** namespace std
*** 374,379 ****
--- 374,398 ----
// terminating null char_type() element, plus enough for the
// _Rep data structure. Whew. Seemingly so needy, yet so elemental.
size_t __size = (__capacity + 1) * sizeof(_CharT) + sizeof(_Rep);
+
+ const size_t __pagesize = 4096; // This magic constant, from OS.
+ const size_t __malloc_header_size = 8; // This one, from malloc.
+ if ((__size + __malloc_header_size) > __pagesize)
+ {
+ size_t __extra =
+ (__pagesize - ((__size + __malloc_header_size) % __pagesize))
+ % __pagesize;
+ __capacity += __extra / sizeof(_CharT);
+ __size = (__capacity + 1) * sizeof(_CharT) + sizeof(_Rep);
+ }
+ else if (__size > 128) // This magic constant is from stl_alloc.h.
+ {
+ size_t __extra =
+ (256 - ((__size + __malloc_header_size) % 256)) % 256;
+ __capacity += __extra / sizeof(_CharT);
+ __size = (__capacity + 1) * sizeof(_CharT) + sizeof(_Rep);
+ }
+
// NB: Might throw, but no worries about a leak, mate: _Rep()
// does not throw.
void* __place = _Raw_bytes_alloc(__alloc).allocate(__size);