This is the mail archive of the libstdc++@gcc.gnu.org mailing list for the libstdc++ project.

Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]
Other format:	[Raw text]

sso tradeoffs

From: Howard Hinnant <hhinnant at apple dot com>
To: libstdc++ <libstdc++ at gcc dot gnu dot org>
Date: Wed, 2 Nov 2005 13:16:56 -0500
Subject: sso tradeoffs

Hello,

I am for the first time today poking around the extension short string implementation. I wanted to explore some of the design tradeoffs that we currently have.

The class layout currently looks like:

Allocator - space optimized
CharT*
size
union {
CharT buf[16],
capacity
}

This gives a 6 word size for string (CharT is char).

This is a respectable design, and according to Effective STL, very similar, if not a little better (imho) than implementation "D" from ESTL.

The short/long flag is a check to see if the CharT* references the internal buffer or not.

Advantages:

1.  Very fast iterator creation (simply return the CharT*).
2.  Very fast indexing (index off of the CharT*).
3.  Decent internal buffer size before switching to long strings.

Disadvantages (compared to what I'll show below):

1.  Significantly more complicated/slower swap (and thus move assign).
2.  Slower move construction.
3.  Slightly slower default ctor.

Ok, enough blowing smoke. Here's an alternative to consider:

union {
raw_memory,
long_rep,
short_rep
}
Allocator - space optimized

raw_memory is simply an array of size_type[n_words] where n_words is the number of words in the greater of long_rep or short_rep. Having this raw_memory view of the string can be handy when you know you just need to blow bytes from one location to another.

long_rep is a struct that looks like:

size_type short/long flag : 1
size_type capacity        : number of bits in size_type - 1
size_type size
CharT*    data

sizeof(long_rep) is 3 words. Except for the leading short/long flag, this is a pretty standard looking string. capacity is cut by half due to stealing a bit for the short/long flag.

short_rep is a struct that looks like:

union {
struct {
unsigned char short/long flag : 1
unsigned char size            : number of bits in unsigned char - 1
},
CharT padding // unused
}
CharT buf[min_cap]

Ok, the short_rep is a little complex looking. In a nutshell its high order bit is the long/short flag (in the same spot as in the long_rep). Next comes 7 bits of size. In the short form, size is going to be small, and 7 bits is plenty (can represent size up to 127). The padding member only comes into play when sizeof(CharT) > 1. When CharT is a char, padding is the same size as (and shares storage with) the flag/size field. Finally there is the internal buffer.

And the big question is: how big should min_cap be?

The very minimum that min_cap can be is 1. Rationale: The default ctor must not allocate memory. And the default ctor must store the trailing '\0'.

Consider the effects of a very large min_cap, say 128. This is obviously too big. Why? It would take way too much time to pass this thing around by value. sizeof(string) is a very important number, and even more so with move semantics looming on the horizon.

An ideal move constructor will just blow bytes from source to target, and then zero out the source (set the source to a default ctor'd string). The cost of this operation will be proportional to sizeof (string).

An ideal move assignment will just swap bytes between source and target - same operation as swapping strings. Again the cost of this operation will be proportional to sizeof(string).

I anticipate that move construct, move assign, and swap will be performance critical operations for string in C++0X. Therefore one possible rationale for the ideal min_cap is one which makes sizeof (short_rep) == sizeof(long_rep). I.e. min_cap is set to maximize the internal buffer subject to the constraint that you only have sizeof (long_rep) bytes to work with.

On a 32 bit word / 8 bit char machine this gives the capacity of a short string 10 chars (min_cap is 11). On a 64 bit word / 8 bit char machine the short string capacity jumps to 22 (min_cap is 23).

The capacity for the short string shrinks as sizeof(CharT) grows (from this rationale). Here is a summary, sizeof is in bytes:

32 bit machine: sizeof(CharT) sizeof(basic_string<CharT>) short_capacity min_cap 1 12 10 11 2 12 4 5 4 12 1 2

64 bit machine: sizeof(CharT) sizeof(basic_string<CharT>) short_capacity min_cap 1 24 22 23 2 24 10 11 4 24 4 5

A first reaction is that these internal capacities are awfully small (I had the same reaction). But...

Moving is going to be at least twice as fast as a string with twice the sizeof. And moving is how future strings will get around as they are shuffled within vector, or within a std::algorithm, or returned by value from a function (if rvo is otherwise disabled).

Further benefits:

If the short/long flag is chosen to be 0 means short, then the string can be default constructed by just shoving zeroes into it. This is 4 non-branching inlined assembly statements (at most): load an immediate 0 into a register, then store that register to 3 memory locations (ok, so you might have to specialize on std::char_traits<scalar> for that one - but it is worth it). This is a wicked small and fast default ctor:

li         r0,0
stw        r0,56(SP)
stw        r0,60(SP)
stw        r0,64(SP)

I'm not implying that it needs to be coded in assembly, it doesn't. This is what gets generated by using the "raw memory" interface.

Disadvantages:

1. Ok, the internal buffer is small. Long strings are obviously going to occur with greater frequency. 2. Indexing is slow. The index operator must check the short/long flag to find the data buffer. 3. Iterator creation is slow for the same reason that indexing is slow. However note that iterator creation is rare compared to iterator use. Once iterators are formed, and passed to an algorithm, there is no slow down (the iterator can be a simple pointer). So if clients are encouraged to iterate instead of index when possible, disadvantages 2 & 3 tend not to be significant.

One of the big differences between the two designs is the self- referential nature of the CharT* pointer: pointing into itself in one design, and being a completely different pointer in the other. Each favors different things: one favors indexing and iterator creation, the other favors swap/move, and to a lesser extent default ctor.

<shrug> These are just my musings. I know a lot of good and hard work has been put into the current sso_string.h. And I'm confident that it will blow away the ref-counted design. But here are some further considerations...

-Howard

Follow-Ups:
- Re: sso tradeoffs
  - From: Paolo Carlini
- Re: sso tradeoffs
  - From: Paolo Carlini
- Re: sso tradeoffs
  - From: Paolo Carlini

Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]