This is the mail archive of the
java-discuss@sources.redhat.com
mailing list for the Java project.
Re: Unicode mangling (was Re: [PATCH] Java: New C++ ABI compatibility changes.)
Alexandre Petit-Bianco <apbianco@cygnus.com> writes:
> > What was [final U] used for?
>
> I honestly don't know. Maybe Per remembers.
To indicate that a method name uses Unicode escapes. A mangled
class name is a number followed by the name. So we can add a
'U' before the number without causing ambiguity. But we can't do
that for a method name, since the mangling of a method name just
goes right into it. Hence the 'U' is tacked to the end.
Here is the description from g++int.info. This description should be
updated for the new ABI. It could of course do so by referring to
some other document.
Mangling of simple names
------------------------
A simple class, package, template, or namespace name is encoded as
the number of characters in the name, followed by the actual
characters. Thus the class `Foo' is encoded as `3Foo'.
If any of the characters in the name are not alphanumeric (i.e., not
one of the standard ASCII letters, digits, or '_'), or the initial
character is a digit, then the name is mangled as a sequence of encoded
Unicode letters. A Unicode encoding starts with a `U' to indicate that
Unicode escapes are used, followed by the number of bytes used by the
Unicode encoding, followed by the bytes representing the encoding.
ASCII letters and non-initial digits are encoded without change.
However, all other characters (including underscore and initial digits)
are translated into a sequence starting with an underscore, followed by
the big-endian 4-hex-digit lower-case encoding of the character.
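The name rules above can be sketched in a few lines of Python. This is
only an illustration of the description, not the actual g++ code; the
function names are made up for the example, and rejecting characters
above U+FFFF is an assumption, since the text only covers 4-hex-digit
escapes.

```python
import string

ASCII_LETTERS = string.ascii_letters
ASCII_DIGITS = string.digits


def needs_escape(name):
    """True if the name cannot use the plain <length><name> form."""
    plain = ASCII_LETTERS + ASCII_DIGITS + "_"
    return name[0] in ASCII_DIGITS or any(ch not in plain for ch in name)


def escape_name(name):
    """Escape every character except ASCII letters and non-initial digits."""
    parts = []
    for i, ch in enumerate(name):
        if ch in ASCII_LETTERS or (i > 0 and ch in ASCII_DIGITS):
            parts.append(ch)
        else:
            code = ord(ch)
            if code > 0xFFFF:
                # Assumption: only 16-bit characters are representable.
                raise ValueError("character does not fit in 4 hex digits")
            parts.append("_%04x" % code)  # '_' + lower-case 4-hex-digit code
    return "".join(parts)


def mangle_simple_name(name):
    """Mangle a class, package, template, or namespace name."""
    if needs_escape(name):
        body = escape_name(name)
        # The escaped body is pure ASCII, so len() is also its byte count.
        return "U%d%s" % (len(body), body)
    return "%d%s" % (len(name), name)


print(mangle_simple_name("Foo"))       # 3Foo
print(mangle_simple_name("X\u0319"))   # U6X_0319
```

Note that an underscore is left alone in the plain form (`foo_bar'
gives `7foo_bar') but is itself escaped as `_005f' once the name needs
the `U' form.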
If a method name contains Unicode-escaped characters, the entire
mangled method name is followed by a `U'.
For example, the method `X\u0319::M\u002B(int)' is encoded as
`M_002b__U6X_0319iU'.
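As a rough check of this example, here is a small Python sketch that
puts the pieces together. It is modelled only on the one example above:
the `__' separator and the position of the argument codes are read off
that example, the argument string (`i' for int) is passed in already
mangled, and all function names are hypothetical. It also assumes a
method name is escaped under the same conditions as a class name.

```python
import string


def needs_escape(name):
    """True if the name has a leading digit or a non-alphanumeric character."""
    plain = string.ascii_letters + string.digits + "_"
    return name[0] in string.digits or any(ch not in plain for ch in name)


def escape_name(name):
    """Keep ASCII letters and non-initial digits; escape everything else."""
    parts = []
    for i, ch in enumerate(name):
        if ch in string.ascii_letters or (i > 0 and ch in string.digits):
            parts.append(ch)
        else:
            parts.append("_%04x" % ord(ch))  # assumes ord(ch) <= 0xFFFF
    return "".join(parts)


def mangle_class(name):
    """Length-prefixed class name, with a leading 'U' if escapes are used."""
    if needs_escape(name):
        body = escape_name(name)
        return "U%d%s" % (len(body), body)
    return "%d%s" % (len(name), name)


def mangle_method(class_name, method_name, mangled_args):
    """Method name, '__', class name, argument codes, optional final 'U'."""
    escaped = needs_escape(method_name)
    # The method name has no length prefix, so there is no room for a
    # leading 'U'; the marker goes at the very end instead.
    method = escape_name(method_name) if escaped else method_name
    suffix = "U" if escaped else ""
    return method + "__" + mangle_class(class_name) + mangled_args + suffix


print(mangle_method("X\u0319", "M\u002B", "i"))  # M_002b__U6X_0319iU
```

The final `U' depends only on the method name: with a plain method name
the same sketch gives `bar__3Fooi' for `Foo::bar(int)'.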
--
--Per Bothner
per@bothner.com http://www.bothner.com/~per/