Merging UTF8 constants at link time.

Anthony Green
Sun Aug 5 08:13:00 GMT 2001

I think libgcj contains at least 150k of duplicate utf8 constants (probably
class metadata for the most part).

Jakub Jelinek added facilities to bfd this spring for merging constants at
link time.  I think they should be easy to use for our utf8 constants.
Jakub wrote to me:

---- cut here ---------------------------------
The current gas/ld code follows the ELF spec, which has just SHF_MERGE or
SHF_MERGE|SHF_STRINGS.  The first means merging constants of a fixed size
(sh_entsize); the second means merging zero-terminated strings in which each
character is sh_entsize bytes long.  Those are the most common cases, and it
would be very ugly to add different merging modes.
What you can do though with these UTF8 constants is to compute their length
and put them into different sections according to their length, like:
.section .rodata.utf8.8,"am",@progbits,8
.align 4
foo: .word 4
.byte 0x60, 0x61, 0x62, 0x63
.section .rodata.utf8.12,"am",@progbits,12
.align 4
bar: .word 8
.byte 0xc4, 0x9b, 0x61, 0xc4, 0x9b, 0x6e, 0x6f, 0x6e
---- cut here ---------------------------------
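
To make the fixed-size case concrete, here is a rough standalone sketch of
what merging one such section amounts to: every entry is exactly entsize
bytes, so duplicates can be found with a plain byte comparison.  The function
name is made up for illustration, and this is not how bfd implements it; a
real linker also has to redirect every reference to a dropped entry, which
this sketch ignores.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: compact `count` entries of `entsize` bytes each,
   in place, keeping the first copy of every distinct entry.  Returns the
   number of entries kept.  Quadratic on purpose, to keep the idea visible. */
static size_t merge_fixed_entries(unsigned char *data, size_t count,
                                  size_t entsize)
{
    size_t kept = 0;

    assert(entsize > 0);
    for (size_t i = 0; i < count; i++) {
        const unsigned char *cand = data + i * entsize;
        size_t j;

        /* Look for an identical entry among those already kept. */
        for (j = 0; j < kept; j++)
            if (memcmp(data + j * entsize, cand, entsize) == 0)
                break;

        /* No match: move the candidate into the next free slot. */
        if (j == kept) {
            memmove(data + kept * entsize, cand, entsize);
            kept++;
        }
    }
    return kept;
}
```

Because each length gets its own section, two constants can only be compared
when they have the same size, which is exactly what this kind of merging
needs.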

This sounds cool.  My only tweak would be to name the sections
.rodata.jutf.* instead.  I'd love to see this attempted, but I won't be able
to get around to it myself for a while.  Perhaps someone else would like to
try?

Putting the utf8 constants in different sections should be easy.  Look in
build_utf8_ref.  I think you just need to add...

DECL_SECTION_NAME (decl) = build_string (...);
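
The section name would have to be derived from the constant's length.  Here
is a minimal standalone sketch of that computation, outside the compiler.
The helper names, the 4-byte header and the rounding up to a multiple of 4
are all assumptions read off Jakub's example (where a 4-byte string lands in
section 8 and an 8-byte string in section 12); the layout gcj really emits
may differ.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Assumed layout, taken from the quoted example: a 4-byte header followed
   by the string bytes, with each entry padded to a multiple of 4. */
#define UTF8_HEADER_BYTES 4
#define UTF8_ALIGN        4

/* Hypothetical helper: size of one section entry for a string of `len`
   bytes, i.e. header plus data, rounded up to the alignment. */
static size_t utf8_entry_size(size_t len)
{
    size_t raw = UTF8_HEADER_BYTES + len;
    return (raw + UTF8_ALIGN - 1) / UTF8_ALIGN * UTF8_ALIGN;
}

/* Hypothetical helper: write ".rodata.jutf.<entry size>" into buf.
   Returns what snprintf returns, so the caller can detect truncation. */
static int utf8_section_name(char *buf, size_t bufsize, size_t len)
{
    assert(buf != NULL && bufsize > 0);
    return snprintf(buf, bufsize, ".rodata.jutf.%lu",
                    (unsigned long) utf8_entry_size(len));
}
```

The resulting string is what would be handed to build_string, and the same
entry size would have to be used as the section's sh_entsize so that the
linker compares whole entries.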

