String pool

per@bothner.com
Sun Oct 29 10:21:00 GMT 2000


"Zack Weinberg" <zackw@stanford.edu> writes:

> The original goal of this patch was to save memory by not padding
> strings to powers of two, or allocating multiple copies of the same
> string.  In my tests, we don't actually save anything, because the
> overhead of maintaining the hash table is substantial: 1.5 megs of
> strings takes roughly 3 megs of hash table.  (Consider that each hash
> entry takes three words on a 32-bit system, which is 12 bytes, which
> is comparable to the length of many identifiers; and that there's
> unavoidably many unused slots in that hash table.)

Have you considered replacing struct str_header by struct tree_identifier?
For those strings that actually are identifiers, that saves you 3 words.
Yes, you do waste some space for strings that are not identifiers.
However, my guess is that the overwhelming majority of strings are
actually identifiers.  Furthermore, the wasted space for non-identifiers
is modest:  sizeof(struct tree_identifier) - sizeof(struct str_header),
which I calculate as 5 - 3 = 2 extra words wasted per non-identifier.
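
To make the arithmetic concrete, here is a rough sketch in C.  The two
struct layouts are placeholders I made up so that they come out at 3 and
5 words; they are not the real definitions of struct str_header or
struct tree_identifier, and the function names are hypothetical too.
The only inputs taken from the discussion are the 3 words saved per
identifier and the 5 - 3 = 2 words wasted per non-identifier.

```c
#include <stddef.h>

/* One machine word, assumed here to be pointer-sized.  */
typedef void *word_t;

/* Placeholder layouts: only the word counts (3 and 5) come from the
   discussion; the contents are NOT the real GCC structures.  */
struct sketch_str_header      { word_t w[3]; };
struct sketch_tree_identifier { word_t w[5]; };

/* Words wasted by a pooled string that is not an identifier.  */
static size_t
extra_words_per_non_identifier (void)
{
  return (sizeof (struct sketch_tree_identifier)
          - sizeof (struct sketch_str_header)) / sizeof (word_t);
}

/* Net words saved over a pool of n_strings entries, of which
   n_identifiers are identifiers: each identifier saves 3 words,
   each non-identifier costs the extra words computed above.
   A negative result means the change costs memory overall.  */
static long
net_words_saved (long n_strings, long n_identifiers)
{
  long non_identifiers = n_strings - n_identifiers;
  return 3 * n_identifiers
         - (long) extra_words_per_non_identifier () * non_identifiers;
}
```

Under these assumptions net_words_saved (1000, 900) comes to 2500 words,
and the change breaks even when 2 out of every 5 pooled strings (40%)
are identifiers, which is well below what I would expect in practice.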
-- 
	--Per Bothner
per@bothner.com   http://www.bothner.com/~per/

