sso tradeoffs

Howard Hinnant hhinnant@apple.com
Wed Nov 2 18:17:00 GMT 2005


Hello,

I am for the first time today poking around the extension short  
string implementation.  I wanted to explore some of the design  
tradeoffs that we currently have.

The class layout currently looks like:

Allocator - space optimized
CharT*
size
union {
CharT buf[16],
capacity
}

This gives a 6 word size for string (CharT is char).

This is a respectable design, and according to Effective STL, very  
similar, if not a little better (imho) than implementation "D" from  
ESTL.

The short/long flag is a check to see if the CharT* references the  
internal buffer or not.

Advantages:

1.  Very fast iterator creation (simply return the CharT*).
2.  Very fast indexing (index off of the CharT*).
3.  Decent internal buffer size before switching to long strings.

Disadvantages (compared to what I'll show below):

1.  Significantly more complicated/slower swap (and thus move assign).
2.  Slower move construction.
3.  Slightly slower default ctor.

Ok, enough blowing smoke.  Here's an alternative to consider:

union {
raw_memory,
long_rep,
short_rep
}
Allocator - space optimized

raw_memory is simply an array of size_type[n_words] where n_words is  
the number of words in the greater of long_rep or short_rep.  Having  
this raw_memory view of the string can be handy when you know you  
just need to blow bytes from one location to another.

long_rep is a struct that looks like:

size_type short/long flag : 1
size_type capacity        : number of bits in size_type - 1
size_type size
CharT*    data

sizeof(long_rep) is 3 words.  Except for the leading short/long flag,  
this is a pretty standard looking string.  capacity is cut by half  
due to stealing a bit for the short/long flag.

short_rep is a struct that looks like:

union {
struct {
unsigned char short/long flag : 1
unsigned char size            : number of bits in unsigned char - 1
},
CharT padding // unused
}
CharT buf[min_cap]

Ok, the short_rep is a little complex looking.  In a nutshell its  
high order bit is the long/short flag (in the same spot as in the  
long_rep).  Next comes 7 bits of size.  In the short form, size is  
going to be small, and 7 bits is plenty (can represent size up to  
127).  The padding member only comes into play when sizeof(CharT) >  
1.  When CharT is a char, padding is the same size as (and shares  
storage with) the flag/size field.  Finally there is the internal  
buffer.

And the big question is:  how big should min_cap be?

The very minimum that min_cap can be is 1.  Rationale:  The default  
ctor must not allocate memory.  And the default ctor must store the  
trailing '\0'.

Consider the effects of a very large min_cap, say 128.  This is  
obviously too big.  Why?  It would take way too much time to pass  
this thing around by value.  sizeof(string) is a very important  
number, and even more so with move semantics looming on the horizon.

An ideal move constructor will just blow bytes from source to target,  
and then zero out the source (set the source to a default ctor'd  
string).  The cost of this operation will be proportional to sizeof 
(string).

An ideal move assignment will just swap bytes between source and  
target - same operation as swapping strings.  Again the cost of this  
operation will be proportional to sizeof(string).

I anticipate that move construct, move assign, and swap will be  
performance critical operations for string in C++0X.  Therefore one  
possible rationale for the ideal min_cap is one which makes sizeof 
(short_rep) == sizeof(long_rep).  I.e. min_cap is set to maximize the  
internal buffer subject to the constraint that you only have sizeof 
(long_rep) bytes to work with.

On a 32 bit word / 8 bit char machine this gives the capacity of a  
short string 10 chars (min_cap is 11).
On a 64 bit word / 8 bit char machine the short string capacity jumps  
to 22 (min_cap is 23).

The capacity for the short string shrinks as sizeof(CharT) grows  
(from this rationale).  Here is a summary, sizeof is in bytes:

32 bit machine:
sizeof(CharT)     sizeof(basic_string<CharT>)   short_capacity       
min_cap
     1                       12                        
10               11
     2                       12                         
4                5
     4                       12                         
1                2

64 bit machine:
sizeof(CharT)     sizeof(basic_string<CharT>)   short_capacity       
min_cap
     1                       24                        
22               23
     2                       24                        
10               11
     4                       24                         
4                5

A first reaction is that these internal capacities are awfully small  
(I had the same reaction).  But...

Moving is going to be at least twice as fast as a string with twice  
the sizeof.  And moving is how future strings will get around as they  
are shuffled within vector, or within a std::algorithm, or returned  
by value from a function (if rvo is otherwise disabled).

Further benefits:

If the short/long flag is chosen to be 0 means short, then the string  
can be default constructed by just shoving zeroes into it.  This is 4  
non-branching inlined assembly statements (at most):  load an  
immediate 0 into a register, then store that register to 3 memory  
locations (ok, so you might have to specialize on  
std::char_traits<scalar> for that one - but it is worth it).  This is  
a wicked small and fast default ctor:

li         r0,0
stw        r0,56(SP)
stw        r0,60(SP)
stw        r0,64(SP)

I'm not implying that it needs to be coded in assembly, it doesn't.   
This is what gets generated by using the "raw memory" interface.

Disadvantages:

1.  Ok, the internal buffer is small.  Long strings are obviously  
going to occur with greater frequency.
2.  Indexing is slow.  The index operator must check the short/long  
flag to find the data buffer.
3.  Iterator creation is slow for the same reason that indexing is  
slow.  However note that iterator creation is rare compared to  
iterator use.  Once iterators are formed, and passed to an algorithm,  
there is no slow down (the iterator can be a simple pointer).  So if  
clients are encouraged to iterate instead of index when possible,  
disadvantages 2 & 3 tend not to be significant.

One of the big differences between the two designs is the self- 
referential nature of the CharT* pointer:  pointing into itself in  
one design, and being a completely different pointer in the other.   
Each favors different things:  one favors indexing and iterator  
creation, the other favors swap/move, and to a lesser extent default  
ctor.

<shrug>  These are just my musings.  I know a lot of good and hard  
work has been put into the current sso_string.h.  And I'm confident  
that it will blow away the ref-counted design.  But here are some  
further considerations...

-Howard



More information about the Libstdc++ mailing list