This is the mail archive of the
gcc@gcc.gnu.org
mailing list for the GCC project.
template mangling (was: exception handling poll)
- To: rth at cygnus dot com
- Subject: template mangling (was: exception handling poll)
- From: Joe Buck <jbuck at synopsys dot com>
- Date: Fri, 17 Oct 97 10:35:47 PDT
- Cc: jbuck at synopsys dot com, jfc at mit dot edu, egcs at cygnus dot com
I wrote:
> > And we'll pretty much want to get more efficient mangling of
> > template functions. Just for fun, try looking at the mangled symbols
> > generated for the methods of map<string,string> .
>
> Aiee. Tell me about it. I recently fixed a reported problem in the
> assembler with ECOFF debugging that bombed because the stab for the
> class exceeded 4k, primarily due to a 336 character class name.
I had a proposal to fix the problem a year ago, but I've had no time to
work on it and it seems I won't in the immediate future.
In case anyone else on this list is interested, I'll post it again.
(It refers to 2.7.2, so map<string,string,less<string> > is a bit
different now.)
Any volunteers to hack something like this up?
-----------------------------------------------------------------------
As some of you may have noticed, I've been griping about the huge
symbols that result from use of g++. Things can be improved a lot
by revising the name mangling scheme to encode repeated types better.
The basic idea for the modification to the mangling scheme is to add a
class list whose n'th entry is the n'th class seen so far, in left-to-right
order. This augments the argument list, which is already kept. A class
is "seen" when we reach the end of it, so for vector<Foo> we first see
Foo, then vector<Foo>. References to class list entries are coded in the
mangled name using some unused key letter, such as B, followed by the
entry number. We apply the repeated-argument encoding (T) first.
There's a potential problem if the mangler and demangler don't agree
on the order in which the classes are seen, but I don't think that
this problem can arise here, since the class names should appear in
the same order in both demangled and mangled forms.
The coding is particularly effective on complex template types but
helps in other cases as well.
Here are some simple examples:
EgAndInstance::EgAndInstance(const EgAndInstance&)
was
___13EgAndInstanceRC13EgAndInstance
becomes
___13EgAndInstanceRCB0
B0 = class list entry 0.
ostream::operator<<(ostream &(*)(ostream &))
(this appears with manipulators) was
___ls__7ostreamPFR7ostream_R7ostream
becomes
___ls__7ostreamPFRB0_RB0
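The two simple examples can be reproduced with a small sketch. This is Python for illustration only, not g++ code; the function name compress_class_names is made up here, and the scanner assumes that every run of digits starts a length-prefixed class name, which holds for these two symbols but not for template names or T codes.

```python
def compress_class_names(mangled):
    """Replace repeated length-prefixed class names with B<n> codes.

    Simplifying assumption: every digit run is the length prefix of a
    plain class name (no templates, no T<n> repeated-argument codes).
    """
    classes = []   # class list: names in the order they are first seen
    out = []
    i = 0
    while i < len(mangled):
        if mangled[i].isdigit():
            j = i
            while j < len(mangled) and mangled[j].isdigit():
                j += 1
            end = j + int(mangled[i:j])      # name follows its length
            token = mangled[i:end]           # e.g. "7ostream"
            if token in classes:
                out.append("B%d" % classes.index(token))
            else:
                classes.append(token)
                out.append(token)
            i = end
        else:
            out.append(mangled[i])           # key letters are copied as-is
            i += 1
    return "".join(out)


print(compress_class_names("___13EgAndInstanceRC13EgAndInstance"))
# prints ___13EgAndInstanceRCB0
print(compress_class_names("___ls__7ostreamPFR7ostream_R7ostream"))
# prints ___ls__7ostreamPFRB0_RB0 (the R before the return type is kept)
```

Only the class name itself is replaced, so modifiers such as R and C in front of it survive unchanged.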
Now here's a tricky one. This symbol appears when we use
a map<string,string,less<string> >.
rb_tree<basic_string<char, string_char_traits<char> >,
pair<basic_string<char, string_char_traits<char> > const,
basic_string<char, string_char_traits<char> > >,
select1st<pair<basic_string<char, string_char_traits<char> > const,
basic_string<char, string_char_traits<char> > >, basic_string<char,
string_char_traits<char> > >, less<basic_string<char,
string_char_traits<char> > > >::__copy_hack(void *, void *)
Written with string in place of basic_string<char, string_char_traits<char> >,
to make it easier to read, this symbol is
rb_tree<string,
pair<string const, string >,
select1st< pair<string const, string >, string >,
less<string > >
::__copy_hack(void *, void *)
We will build the following class list.
B0 = string_char_traits<char>
B1 = basic_string<char,B0> = string
B2 = pair<const B1,B1>
B3 = select1st<B2,B1>
B4 = less<B1>
(only B1 and B2 are re-used)
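The rule that a class is "seen" when we reach the end of it amounts to a post-order walk over the type. Here is a minimal Python sketch of that walk, under stated assumptions: the Type record and the see function are hypothetical names invented for this illustration, not g++ data structures, and const is treated as a qualifier on a use of a class rather than as part of the class.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Type:
    """Hypothetical stand-in for a parsed type; not a g++ data structure."""
    name: str
    args: tuple = ()
    is_class: bool = True
    const: bool = False


def see(t, class_list, spellings):
    """Return the short spelling of t, adding classes as they are completed."""
    prefix = "const " if t.const else ""
    if not t.is_class:
        return prefix + t.name               # builtins such as char get no entry
    bare = replace(t, const=False)           # qualifier is not part of the class
    if bare in class_list:
        return prefix + "B%d" % class_list.index(bare)
    arg_text = [see(a, class_list, spellings) for a in t.args]  # arguments first
    class_list.append(bare)                  # "seen" only at the end of the class
    spellings.append(bare.name + ("<" + ",".join(arg_text) + ">" if arg_text else ""))
    return prefix + "B%d" % (len(class_list) - 1)


char = Type("char", is_class=False)
traits = Type("string_char_traits", (char,))
string = Type("basic_string", (char, traits))
pair = Type("pair", (replace(string, const=True), string))
select1st = Type("select1st", (pair, string))
less = Type("less", (string,))
rb_tree = Type("rb_tree", (string, pair, select1st, less))

class_list, spellings = [], []
see(rb_tree, class_list, spellings)
for n, text in enumerate(spellings):
    print("B%d = %s" % (n, text))
```

The first five lines printed are the B0 to B4 entries listed above. Under the same rule the enclosing rb_tree<B1,B2,B3,B4> becomes a sixth entry once its end is reached; the list above stops before it.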
The old mangling is
___copy_hack__t7rb_tree4Zt12basic_string2ZcZt18string_char_traits1ZcZt4pair2ZCt12basic_string2ZcZt18string_char_traits1ZcZt12basic_string2ZcZt18string_char_traits1ZcZt9select1st2Zt4pair2ZCt12basic_string2ZcZt18string_char_traits1ZcZt12basic_string2ZcZt18string_char_traits1ZcZt12basic_string2ZcZt18string_char_traits1ZcZt4less1Zt12basic_string2ZcZt18string_char_traits1ZcPvT1
The new mangling becomes (if I did this right)
___copy_hack__t7rb_tree4Zt12basic_string2ZcZt18string_char_traits1ZcZt4pair2ZCB1ZB1Zt9select1st2ZB2ZB1Zt4less1ZB1PvT1
Something like this is going to be essential for the STL to work on
platforms that limit symbol name lengths (HP is one; Sun's as has a
2048-character limit in stabs).
------------
If something like this is developed, there should probably be a compiler
switch to select either the old or the new scheme. The same cplus_demangle
could handle either scheme.
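A rough Python sketch of why one demangler could serve both schemes: if B really is an unused key letter, old-style names contain no B codes and pass through unchanged, while new-style names get each B<n> expanded back to the n'th class seen. This is not cplus_demangle; expand_class_codes is a made-up name, and as a simplifying assumption it only understands plain length-prefixed class names (no templates).

```python
def expand_class_codes(mangled):
    """Expand B<n> codes back into the n'th length-prefixed class name."""
    classes = []   # class list rebuilt in the same left-to-right order
    out = []
    i = 0
    while i < len(mangled):
        if mangled[i].isdigit():
            j = i
            while j < len(mangled) and mangled[j].isdigit():
                j += 1
            end = j + int(mangled[i:j])
            token = mangled[i:end]           # e.g. "13EgAndInstance"
            if token not in classes:
                classes.append(token)
            out.append(token)
            i = end
        elif mangled[i] == "B" and mangled[i + 1:i + 2].isdigit():
            j = i + 1
            while j < len(mangled) and mangled[j].isdigit():
                j += 1
            out.append(classes[int(mangled[i + 1:j])])   # look up entry n
            i = j
        else:
            out.append(mangled[i])
            i += 1
    return "".join(out)


print(expand_class_codes("___13EgAndInstanceRCB0"))
# prints ___13EgAndInstanceRC13EgAndInstance
print(expand_class_codes("___13EgAndInstanceRC13EgAndInstance"))
# prints ___13EgAndInstanceRC13EgAndInstance (old scheme, unchanged)
```

The output is the old-scheme name in both cases, so the rest of the demangler would not need to know which scheme produced the symbol.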