Per Bothner
Sun Jan 31 23:58:00 GMT 1999

> In an attempt to formalize this proposal, I'd write:

It was not meant as a formal proposal, but let me try to make one:

Non-plain characters are any characters except ASCII letters, digits,
and '_'.  In languages that allow identifiers to start with a
digit, an initial digit is also a non-plain character.  (We could
also treat a doubled underscore, an initial underscore, or an
underscore followed by a capital letter as non-plain characters
*when the name is intended to be in the user rather than the
implementation namespace*.)
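
As a rough illustration of the classification above, here is a
minimal C sketch.  The names is_plain and name_is_plain are
hypothetical, the name is assumed to arrive as an array of
character codes, and the execution character set is assumed to be
ASCII-compatible.  The optional underscore rules from the
parenthetical are not modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: is the character code ch, found at index pos
   of a name, a "plain" character?  Assumes an ASCII-compatible
   execution character set. */
bool is_plain(unsigned long ch, size_t pos)
{
    bool letter = (ch >= 'A' && ch <= 'Z') || (ch >= 'a' && ch <= 'z');
    bool digit = ch >= '0' && ch <= '9';
    /* An initial digit counts as non-plain. */
    return letter || ch == '_' || (digit && pos > 0);
}

/* Hypothetical helper: true if every character of name[0..n) is plain,
   i.e. the name would be left unmangled. */
bool name_is_plain(const unsigned long *name, size_t n)
{
    assert(name != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        if (!is_plain(name[i], i))
            return false;
    return true;
}
```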

The mangled name (assembly-level name, not counting possible
initial underscore on some platforms) of a C or C++ global
variable, or a C global function, or a C++ global function
declared as extern "C" is as follows (assuming there is no
asm specification):  If the name contains only plain characters,
then the mangled name is the same as the source name.
If the source name contains any non-plain characters,
the mangled name starts with a prefix "_UC" (for "universal
character"), followed by the encoding of each of the characters.
Plain characters except for '_' are encoded as themselves.
An underscore followed by a lower-case letter is encoded
as itself.  Other underscores are encoded as "___".
A non-plain character CH is written as an initial underscore,
followed by the uppercased hexadecimal expansion of the
character's numeric value, with initial zeroes removed.
In other words, as if written by:
	printf ("_%X", CH);
(Note this does not limit us to 16-bit character codes.)
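
The rules above for simple names could be sketched in C roughly as
follows.  This is only one reading of the scheme, not a tested
implementation: mangle_simple, append and is_plain are hypothetical
names, the source name is assumed to be given as an array of
character codes, and an ASCII-compatible execution character set
is assumed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Plain: ASCII letter, digit, or '_'; an initial digit is non-plain. */
static bool is_plain(unsigned long ch, size_t pos)
{
    bool letter = (ch >= 'A' && ch <= 'Z') || (ch >= 'a' && ch <= 'z');
    bool digit = ch >= '0' && ch <= '9';
    return letter || ch == '_' || (digit && pos > 0);
}

/* Append s to out at *len; false if it does not fit in cap bytes. */
static bool append(char *out, size_t cap, size_t *len, const char *s)
{
    size_t k = strlen(s);
    if (k >= cap - *len)
        return false;
    memcpy(out + *len, s, k + 1);   /* also copies the terminating NUL */
    *len += k;
    return true;
}

/* Hypothetical helper: mangle name[0..n) into out (capacity cap).
   Returns the mangled length, or -1 if out is too small. */
int mangle_simple(const unsigned long *name, size_t n, char *out, size_t cap)
{
    assert(name != NULL && out != NULL && cap > 0);
    size_t len = 0;
    out[0] = '\0';

    bool all_plain = true;
    for (size_t i = 0; i < n; i++)
        if (!is_plain(name[i], i))
            all_plain = false;

    /* Names with any non-plain character get the "_UC" prefix. */
    if (!all_plain && !append(out, cap, &len, "_UC"))
        return -1;

    for (size_t i = 0; i < n; i++) {
        unsigned long ch = name[i];
        char piece[2 + sizeof ch * 2];   /* '_' + hex digits + NUL */
        bool next_lower = i + 1 < n
                          && name[i + 1] >= 'a' && name[i + 1] <= 'z';

        if (ch == '_' && !all_plain && !next_lower)
            snprintf(piece, sizeof piece, "___");      /* "other" underscore */
        else if (is_plain(ch, i))
            snprintf(piece, sizeof piece, "%c", (int)ch);
        else
            snprintf(piece, sizeof piece, "_%lX", ch); /* non-plain */

        if (!append(out, cap, &len, piece))
            return -1;
    }
    return (int)len;
}
```

With this sketch, the character codes for "foo_bar" come back
unchanged, and 'c' 'a' 'f' 0xE9 would give "_UCcaf_E9".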

A C++ method is encoded as the encoding of the method name
(as described above), followed by "__", followed by the mangling
of the containing class name (*not* as described above) and the
parameter types.  (This is the same as the existing C++
mangling, but with a new mangling for non-ASCII characters.)

When a class name needs to be mangled in a C++ mangled
method name, we use a variant of the above scheme, because
we need to distinguish class names from primitive types
and other mangling codes.  To mangle a class name, if it
contains only plain characters, we emit the number of
characters in the name, followed by the characters of the name.
Thus class "Foo" is mangled as "3Foo".  If any of the
characters of the class name are non-plain, we emit
a "U", followed by the number of characters in the
mangling of the class name, followed by the encodings
of the characters, as given above for mangling simple names.
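
The class-name variant might look like this in C.  It is a sketch
under assumptions: mangle_class is a hypothetical name, and its
input is assumed to be the per-character encoding already produced
by the simple-name rules, without the "_UC" prefix (my reading of
"the encodings of the characters"), plus a flag saying whether the
original name had any non-plain character.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: emit ["U"] <length> <encoded> into out.
   encoded is the per-character encoding of the class name (equal to
   the name itself when all characters are plain).
   Returns the mangled length, or -1 if out (capacity cap) is too small. */
int mangle_class(const char *encoded, bool has_non_plain,
                 char *out, size_t cap)
{
    assert(encoded != NULL && out != NULL && cap > 0);
    /* The count is the length of the encoding, not of the source name. */
    int w = snprintf(out, cap, "%s%zu%s",
                     has_non_plain ? "U" : "", strlen(encoded), encoded);
    return (w < 0 || (size_t)w >= cap) ? -1 : w;
}
```

Under that reading, "Foo" gives "3Foo", and a class whose
characters encode to "Caf_E9" gives "U6Caf_E9".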

	--Per Bothner
Cygnus Solutions
