String pool

Zack Weinberg zackw@Stanford.EDU
Sun Oct 29 11:29:00 GMT 2000

On Sun, Oct 29, 2000 at 10:06:23AM -0800, Mark Mitchell wrote:
> Zack --
>   Thanks for doing this!  I bet it's actually a bigger win on C++
> compiles, where the strings for mangled names tend to get mega-long;
> we can easily waste 1K for a mangled name.

Hm, that was a C++ compile I was testing with, although not one that
made use of STL.

>   I don't think this is quite ready to go in:
>   - We lose the warn_id_clash warning in get_identifier.  (That's
>     why I didn't convert that to use hashtab.c long ago.)

I completely forgot about that.  I'll see if I can't sneak it back in.

>   - The copyright issues with the hashtable thing are non-trivial.
>     I think we should use hashtab.c for now.  If we really think
>     we should change our hashtable implementation, we should change
>     it there.  One of the nice things over the last year or two
>     is that we've gradually reduced the number of by-hand hash
>     tables in the compiler; let's not introduce new ones now...

The copyright issue is only with the hash *function*; all the other
code was cribbed with modifications from cpphash.c.  I can easily
replace that hash function with the one cpphash.c uses, which is good
enough and may even be faster.

The initial implementation did use hashtab.c.  That incurs ~100%
*additional* space overhead and 10-25% speed penalty; hashtab.c is
simply too generic.  I'd be willing to look at restructuring it into a
macros-and-inline-functions approach like obstack.h uses (think
templated C) but that is not practical at the moment.

>   - I don't think `ggc-string.c' is a very good name, because now
>     we're not really collecting anything anymore.

True.  stringpool.c? 

>     I wonder if this could go in libiberty, replacing the `data'
>     field with a `void *'.  I bet this same functionality would be
>     useful in other GNU programs, and putting stuff in libiberty is
>     a good way of forcing it to be modular...

Might be a good idea - and see below.

>   - I'm slightly worried about strings that aren't permanent.  I
>     guess one approach is to find those later, and then do something
>     manual, but that's harder than just collecting them away.

Flipping through the code, it appears that there are a fair number of
places that do that - mostly for labels and filename strings.  To
first order, we can ignore them; they're swamped by uses for
identifiers.  A better approach for labels would be to add a
(label_ref "LPBX" 123) RTX with the same semantics as symbol_ref
except that we delay generation of the actual string to assembly
emission time.  I'm not sure what to do about filename strings.
cscope listing at end of message.

>   - We could trade time for space by using a tree, rather than a
>     hash table.  Probably not worth it, just an idea.

I was actually contemplating the use of a trie.  Daniel Berlin sent me
a paper on "dynamic level and path compressed tries" awhile back,
which he says beat hash tables both on time and space.  The paper's at .  If
we went that route, the trie implementation definitely belongs in
libiberty.  I don't have time to write it just now, though.  The nice
thing about hash tables is you can whip one up in half an hour...

>   - You can save space by putting the hash-table nodes into the same
>     obstack as the strings, and then getting rid of the `ptr' field.
>     That might make it harder to inter-operate with hashtab.c, of
>     course.

I tried that.  You then have to pad the obstack, which wastes 2.7
bytes (average) per string, and the header + index table is still
three words per entry.  So it winds up being worse.  Also the code is
uglier, you have ((const char *)ptr + sizeof(struct str_header)) all
over the place.  There *is* less copying when the table is resized,
but table resize is supposed to be a rare operation.


Functions calling this function: ggc_alloc_string

File         Function                     Line
pa.c         hppa_encode_label            5957 XSTR (sym,0) = ggc_alloc_string (newstr, len);
rs6000.c     rs6000_emit_load_toc_table   5118 symF = gen_rtx_SYMBOL_REF (Pmode, ggc_alloc_string (buf, -1));
rs6000.c     rs6000_emit_load_toc_table   5121 symL = gen_rtx_SYMBOL_REF (Pmode, ggc_alloc_string (buf, -1));
rs6000.c     rs6000_emit_load_toc_table   5136 ggc_alloc_string (toc_label_name, -1));
rs6000.c     rs6000_emit_load_toc_table   5139 symF = gen_rtx_SYMBOL_REF (Pmode, ggc_alloc_string (buf, -1));
rs6000.c     rs6000_emit_load_toc_table   5156 realsym = gen_rtx_SYMBOL_REF (Pmode, ggc_alloc_string (buf, -1));
rs6000.c     create_TOC_reference         5214 ggc_alloc_string (toc_label_name, -1)))));
rs6000.c     rs6000_emit_prologue         5617 alloc_rname = ggc_alloc_string (rname, -1);
rs6000.c     rs6000_emit_epilogue         6065 alloc_rname = ggc_alloc_string (rname, -1);
rs6000.c     rs6000_encode_section_info   7587 XSTR (sym_ref, 0) = ggc_alloc_string (str, len1 + len2);
rs6000.c     rs6000_encode_section_info   7631 XSTR (sym_ref, 0) = ggc_alloc_string (str, len);
java/lex.c   init_parse                    711 internal_filename = ggc_alloc_string (INTERNAL_FILENAME,
cp/lex.c     handle_pragma_implementation 1222 ifiles->filename = ggc_alloc_string (main_filename, -1);
except.c     create_rethrow_ref            507 ptr = ggc_alloc_string (buf, -1);
f/lex.c      ffelex_hash_                 1320 input_filename = ggc_alloc_string (ffelex_token_text (token), -1);
ggc-string.c init_stringpool               100 empty_string = ggc_alloc_string ("", 0);
ggc.h        ggc_strdup                    146 #define ggc_strdup(S) ggc_alloc_string((S), -1)
optabs.c     init_libfuncs                4483 = gen_rtx_SYMBOL_REF (Pmode, ggc_alloc_string (libfunc_name,
profile.c    init_edge_profiler           1030 profiler_label = gen_rtx_SYMBOL_REF (Pmode, ggc_alloc_string (buf, -1));
profile.c    output_func_start_profiler   1128 ggc_alloc_string (buf, -1)));
profile.c    output_func_start_profiler   1130 (Pmode, ggc_alloc_string ("__bb_init_func", 14)), 0,
stmt.c       init_stmt                     609 digit_strings[i] = ggc_alloc_string (buf, 1);
stmt.c       expand_asm_operands          1459 constraint = ggc_alloc_string (buf, c_len);
toplev.c     compile_file                 2228 name = ggc_alloc_string (name, strlen (name));
tree.c       init_tree_codes               263 = ggc_alloc_string (BUILT_IN_FILENAME, sizeof (BUILT_IN_FILENAME));
tree.c       build_string                  750 TREE_STRING_POINTER (s) = ggc_alloc_string (str, len);
varasm.c     named_section                 317 in_named_name = ggc_alloc_string (name, -1);
varasm.c     assemble_static_space        1806 namestring = ggc_alloc_string (name, -1);
varasm.c     assemble_trampoline_template 1862 name = ggc_alloc_string (label, -1);
varasm.c     output_constant_def          3094 desc->label = ggc_alloc_string (label, -1);
varasm.c     force_const_mem              3606 desc->label = found = ggc_alloc_string (label, -1);

Functions calling this function: build_string

File          Function                           Line
c-common.c    combine_strings                      414 value = build_string (length, p);
c-common.c    c_expand_builtin_printf             6071 arglist = combine_strings (build_string (newlen, newstr));
c-decl.c      c_make_fname_decl                   3287 init = build_string (length + 1, name);
c-lex.c       lex_string                          2437 value = build_string (q - buf, buf);
c-parse.c     _yylex                              4323 yylval.ttype = build_string (TREE_STRING_LENGTH (stringval),
decl.c        build_chill_exception_decl          4895 ex_init = build_string (name_len, name);
grant.c       set_default_grant_file              2716 default_grant_file = build_string (strlen (tmp), tmp);
lex.c         build_chill_string                   853 t = build_string (len, str);
typeck.c      extract_constant_from_buffer         906 value = build_string (size, buffer);
conix-elf.h   UNIQUE_SECTION                       162 DECL_SECTION_NAME (DECL) = build_string (len, string); \
linux-elf.h   UNIQUE_SECTION                       249 DECL_SECTION_NAME (DECL) = build_string (len, string); \  
pe.c          arm_pe_unique_section                366 DECL_SECTION_NAME (decl) = build_string (len, string);
unknown-elf.h UNIQUE_SECTION                       180 DECL_SECTION_NAME (DECL) = build_string (len, string); \
avr.c         unique_section                      4446 DECL_SECTION_NAME (decl) = build_string (len, string);
avr.c         encode_section_info                 4668 DECL_SECTION_NAME (decl) = build_string (strlen (dsec), dsec);
elfos.h       UNIQUE_SECTION                       383 DECL_SECTION_NAME (DECL) = build_string (len, string); \
h8300.c       h8300_valid_machine_decl_attribute  3049 DECL_SECTION_NAME (decl) = build_string (7, ".eight");
h8300.c       h8300_valid_machine_decl_attribute  3061 DECL_SECTION_NAME (decl) = build_string (6, ".tiny");
djgpp.h       UNIQUE_SECTION                       294 DECL_SECTION_NAME (DECL) = build_string (len, string); \
i386.h        MD_ASM_CLOBBERS                     1156 (CLOBBERS) = tree_cons (NULL_TREE, build_string (5, "flags"),
i386.h        MD_ASM_CLOBBERS                     1157 (CLOBBERS) = tree_cons (NULL_TREE, build_string (4, "fpsr"),
                                                       (CLOBBERS)); \
i386.h        MD_ASM_CLOBBERS                     1158 (CLOBBERS) = tree_cons (NULL_TREE, build_string (7, "dirflag"),
                                                       (CLOBBERS)); \       
interix.c     i386_pe_unique_section               111 DECL_SECTION_NAME (decl) = build_string (len, string);
winnt.c       i386_pe_unique_section               498 DECL_SECTION_NAME (decl) = build_string (len, string);
mcore.c       mcore_unique_section                3582 DECL_SECTION_NAME (decl) = build_string (len, string);
elf.h         UNIQUE_SECTION                       273 DECL_SECTION_NAME (DECL) = build_string (len, string); \
elf64.h       UNIQUE_SECTION                       254 DECL_SECTION_NAME (DECL) = build_string (len, string); \
iris6gld.h    UNIQUE_SECTION                        90 DECL_SECTION_NAME (DECL) = build_string (len, string); \
mips.c        build_mips16_function_stub          8342 DECL_SECTION_NAME (stubdecl) = build_string (strlen (secname),
mips.c        build_mips16_call_stub              8584 DECL_SECTION_NAME (stubdecl) = build_string (strlen (secname),
pa-64.h       UNIQUE_SECTION                       347 DECL_SECTION_NAME (DECL) = build_string (len, string); \
aix.h         UNIQUE_SECTION                       618 DECL_SECTION_NAME (DECL) = build_string (len, string); \       
rs6000.c      rs6000_unique_section               7555 DECL_SECTION_NAME (decl) = build_string (len, string);
v850.c        ghs_pragma_section                  2790 build_string (strlen (alias) + 1, alias);
v850.c        v850_set_default_decl_attr          2897 = build_string (sizeof (".sdata")-1, ".sdata");
v850.c        v850_set_default_decl_attr          2900 = build_string (sizeof (".rosdata")-1, ".rosdata");
v850.c        v850_set_default_decl_attr          2903 = build_string (sizeof (".tdata")-1, ".tdata");
v850.c        v850_set_default_decl_attr          2906 = build_string (sizeof (".zdata")-1, ".zdata");
v850.c        v850_set_default_decl_attr          2909 = build_string (sizeof (".rozdata")-1, ".rozdata");
class.c       build_vtable_entry_ref               514 s = build_tree_list (build_string (1, "s"), s);
class.c       build_vtable_entry_ref               523 i = build_tree_list (build_string (1, "i"), i);
class.c       build_vtable_entry_ref               526 build_string (sizeof(asm_stmt)-1, asm_stmt),
decl.c        cp_make_fname_decl                  6788 init = build_string (length + 1, name);
pt.c          tsubst                              6708 str = build_string (len, name);
rtti.c        tinfo_name                           384 name_string = combine_strings (build_string (strlen (name) + 1,
com.c         ffecom_subscript_check_              775 arg1 = build_string (len, var);
com.c         ffecom_subscript_check_              781 arg1 = build_string (len, array_name);
com.c         ffecom_subscript_check_              790 arg1 = build_string (len, var);
com.c         ffecom_subscript_check_              823 arg3 = build_string (len, proc);
com.c         ffecom_build_f2c_string_            1832 return build_string (i, s);
com.c         ffecom_build_f2c_string_            1850 t = build_string (i, tmp);
com.c         ffecom_char_args_x_                 2019 item = build_string (newlen,
com.c         ffecom_debug_kludge_                2646 value = build_string (len, buff);
com.c         ffecom_constantunion               10804 item = build_string (ffetarget_length_character1 (val),
com.c         ffecom_constantunion               10826 item = build_string (h.length, h.text);
com.c         ffecom_constantunion               10835 item = build_string (FLOAT_TYPE_SIZE / CHAR_TYPE_SIZE,
ste.c         ffeste_R1001                        4937 t = build_string (ffests_length (s), ffests_text (s));
class.c       build_utf8_ref                       821 string = build_string (name_len, name_ptr);
jcf-parse.c   get_constant                         390 value = build_string (str - str_ptr, str_ptr);
lex.c         yylex                               1188 java_lval->node = build_string (strlen (string), string);
lex.c         java_lex                            1188 java_lval->node = build_string (strlen (string), string);
parse.c       build_dot_class_method_invocation  11127 s = build_string (IDENTIFIER_LENGTH (sig_id),
parse.c       do_merge_string_cste               15744 return build_string (len, new);
objc-act.c    my_build_string                     1400 tree a_string = build_string (len, str);
objc-parse.c  _yylex                              5435 yylval.ttype = build_string (TREE_STRING_LENGTH (stringval),
varasm.c      UNIQUE_SECTION                       338 DECL_SECTION_NAME (DECL) = build_string (len, string); \

More information about the Gcc-patches mailing list