This is the mail archive of the
libstdc++@gcc.gnu.org
mailing list for the libstdc++ project.
sso tradeoffs
- From: Howard Hinnant <hhinnant at apple dot com>
- To: libstdc++ <libstdc++ at gcc dot gnu dot org>
- Date: Wed, 2 Nov 2005 13:16:56 -0500
- Subject: sso tradeoffs
Hello,
I am for the first time today poking around the extension short
string implementation. I wanted to explore some of the design
tradeoffs that we currently have.
The class layout currently looks like:
Allocator - space optimized
CharT*
size
union {
CharT buf[16],
capacity
}
This gives a 6 word size for string (CharT is char).
This is a respectable design, and according to Effective STL, very
similar, if not a little better (imho) than implementation "D" from
ESTL.
The short/long flag is a check to see if the CharT* references the
internal buffer or not.
Advantages:
1. Very fast iterator creation (simply return the CharT*).
2. Very fast indexing (index off of the CharT*).
3. Decent internal buffer size before switching to long strings.
Disadvantages (compared to what I'll show below):
1. Significantly more complicated/slower swap (and thus move assign).
2. Slower move construction.
3. Slightly slower default ctor.
Ok, enough blowing smoke. Here's an alternative to consider:
union {
raw_memory,
long_rep,
short_rep
}
Allocator - space optimized
raw_memory is simply an array of size_type[n_words] where n_words is
the number of words in the greater of long_rep or short_rep. Having
this raw_memory view of the string can be handy when you know you
just need to blow bytes from one location to another.
long_rep is a struct that looks like:
size_type short/long flag : 1
size_type capacity : number of bits in size_type - 1
size_type size
CharT* data
sizeof(long_rep) is 3 words. Except for the leading short/long flag,
this is a pretty standard looking string. capacity is cut by half
due to stealing a bit for the short/long flag.
short_rep is a struct that looks like:
union {
struct {
unsigned char short/long flag : 1
unsigned char size : number of bits in unsigned char - 1
},
CharT padding // unused
}
CharT buf[min_cap]
Ok, the short_rep is a little complex looking. In a nutshell its
high order bit is the long/short flag (in the same spot as in the
long_rep). Next comes 7 bits of size. In the short form, size is
going to be small, and 7 bits is plenty (can represent size up to
127). The padding member only comes into play when sizeof(CharT) >
1. When CharT is a char, padding is the same size as (and shares
storage with) the flag/size field. Finally there is the internal
buffer.
And the big question is: how big should min_cap be?
The very minimum that min_cap can be is 1. Rationale: The default
ctor must not allocate memory. And the default ctor must store the
trailing '\0'.
Consider the effects of a very large min_cap, say 128. This is
obviously too big. Why? It would take way too much time to pass
this thing around by value. sizeof(string) is a very important
number, and even more so with move semantics looming on the horizon.
An ideal move constructor will just blow bytes from source to target,
and then zero out the source (set the source to a default ctor'd
string). The cost of this operation will be proportional to sizeof
(string).
An ideal move assignment will just swap bytes between source and
target - same operation as swapping strings. Again the cost of this
operation will be proportional to sizeof(string).
I anticipate that move construct, move assign, and swap will be
performance critical operations for string in C++0X. Therefore one
possible rationale for the ideal min_cap is one which makes sizeof
(short_rep) == sizeof(long_rep). I.e. min_cap is set to maximize the
internal buffer subject to the constraint that you only have sizeof
(long_rep) bytes to work with.
On a 32 bit word / 8 bit char machine this gives the capacity of a
short string 10 chars (min_cap is 11).
On a 64 bit word / 8 bit char machine the short string capacity jumps
to 22 (min_cap is 23).
The capacity for the short string shrinks as sizeof(CharT) grows
(from this rationale). Here is a summary, sizeof is in bytes:
32 bit machine:
sizeof(CharT) sizeof(basic_string<CharT>) short_capacity
min_cap
1 12
10 11
2 12
4 5
4 12
1 2
64 bit machine:
sizeof(CharT) sizeof(basic_string<CharT>) short_capacity
min_cap
1 24
22 23
2 24
10 11
4 24
4 5
A first reaction is that these internal capacities are awfully small
(I had the same reaction). But...
Moving is going to be at least twice as fast as a string with twice
the sizeof. And moving is how future strings will get around as they
are shuffled within vector, or within a std::algorithm, or returned
by value from a function (if rvo is otherwise disabled).
Further benefits:
If the short/long flag is chosen to be 0 means short, then the string
can be default constructed by just shoving zeroes into it. This is 4
non-branching inlined assembly statements (at most): load an
immediate 0 into a register, then store that register to 3 memory
locations (ok, so you might have to specialize on
std::char_traits<scalar> for that one - but it is worth it). This is
a wicked small and fast default ctor:
li r0,0
stw r0,56(SP)
stw r0,60(SP)
stw r0,64(SP)
I'm not implying that it needs to be coded in assembly, it doesn't.
This is what gets generated by using the "raw memory" interface.
Disadvantages:
1. Ok, the internal buffer is small. Long strings are obviously
going to occur with greater frequency.
2. Indexing is slow. The index operator must check the short/long
flag to find the data buffer.
3. Iterator creation is slow for the same reason that indexing is
slow. However note that iterator creation is rare compared to
iterator use. Once iterators are formed, and passed to an algorithm,
there is no slow down (the iterator can be a simple pointer). So if
clients are encouraged to iterate instead of index when possible,
disadvantages 2 & 3 tend not to be significant.
One of the big differences between the two designs is the self-
referential nature of the CharT* pointer: pointing into itself in
one design, and being a completely different pointer in the other.
Each favors different things: one favors indexing and iterator
creation, the other favors swap/move, and to a lesser extent default
ctor.
<shrug> These are just my musings. I know a lot of good and hard
work has been put into the current sso_string.h. And I'm confident
that it will blow away the ref-counted design. But here are some
further considerations...
-Howard