This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.
Unicode mangling (was Re: [PATCH] Java: New C++ ABI compatibility changes.)
- To: apbianco at cygnus dot com
- Subject: Unicode mangling (was Re: [PATCH] Java: New C++ ABI compatibility changes.)
- From: Jason Merrill <jason at redhat dot com>
- Date: 15 Jan 2001 12:12:33 +0000
- Cc: gcc-patches at gcc dot gnu dot org, java-discuss at sources dot redhat dot com
- References: <200101150758.XAA16134@deliverance.cygnus.com>
It looks like you're still using the same scheme for mangling Unicode
strings. I'd like to reexamine that, since we're stabilizing the ABI.
First, we need to remember that C99 and C++ also need (or will need) this
functionality. The front-end work to support extended characters in
identifiers remains to be done, of course.
As far as I can tell, the current scheme operates on individual identifiers.
If an identifier contains extended characters, you prepend 'U' to its length
and replace each extended character with _NNNN (four hex digits giving its
16-bit UCS-2 value). This scheme has several flaws:
1) It doesn't allow for C-like symbols, which have no length specifier.
This could be fixed by defining some encoding starting with, say, '_U'.
2) It doesn't accommodate 32-bit extended characters in C++/C99
(\UNNNNNNNN). This could be fixed by escaping them with, say, '_L'.
3) _NNNN can also occur literally in an identifier, so a demangler cannot
tell an escaped character from ordinary text. This could be fixed by also
escaping the '_' character in affected names. Hmm... it looks like you
intend to do so in unicode_mangling_length, but don't actually do so in
append_unicode_mangled_name. We could also just use '__'.
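To make flaw 3 concrete, here is a rough Python sketch of the scheme as I understand it, plus a variant with fixes 2 and 3 applied. The function names are hypothetical (they are not the Java front end's), and I'm assuming the length after 'U' counts the characters of the encoded body. Escaping a literal '_' as '__' is just one possible reading of fix 3, and '_L' plus eight hex digits is the escape suggested in fix 2.

```python
# Hypothetical sketch; not the gcc/java implementation.
# Assumption: the length after 'U' counts characters of the encoded body.

def mangle_current(name: str) -> str:
    """Current scheme: extended characters become _NNNN (UCS-2, 4 hex digits)."""
    if all(ord(c) < 0x80 for c in name):
        return f"{len(name)}{name}"
    parts = []
    for c in name:
        cp = ord(c)
        if cp < 0x80:
            parts.append(c)
        elif cp <= 0xFFFF:
            parts.append(f"_{cp:04x}")
        else:
            # Flaw 2: no encoding for characters outside the 16-bit range.
            raise ValueError(f"U+{cp:X} does not fit in UCS-2")
    body = "".join(parts)
    return f"U{len(body)}{body}"


def mangle_escaped(name: str) -> str:
    """Variant: literal '_' becomes '__'; characters above U+FFFF use _L + 8 hex digits."""
    if all(ord(c) < 0x80 for c in name):
        return f"{len(name)}{name}"
    parts = []
    for c in name:
        cp = ord(c)
        if c == "_":
            parts.append("__")
        elif cp < 0x80:
            parts.append(c)
        elif cp <= 0xFFFF:
            parts.append(f"_{cp:04x}")
        else:
            parts.append(f"_L{cp:08x}")
    body = "".join(parts)
    return f"U{len(body)}{body}"


def demangle_escaped_body(body: str) -> str:
    """Invert mangle_escaped's body; the character after each '_' picks the escape."""
    out, i = [], 0
    while i < len(body):
        if body[i] != "_":
            out.append(body[i])
            i += 1
        elif body[i + 1] == "_":
            out.append("_")
            i += 2
        elif body[i + 1] == "L":
            out.append(chr(int(body[i + 2:i + 10], 16)))
            i += 10
        else:
            out.append(chr(int(body[i + 1:i + 5], 16)))
            i += 5
    return "".join(out)


# "é_00e9" and "éé" collide under the current scheme...
assert mangle_current("é_00e9") == mangle_current("éé") == "U10_00e9_00e9"
# ...but not once '_' is escaped.
assert mangle_escaped("é_00e9") == "U11_00e9__00e9"
assert demangle_escaped_body("_00e9__00e9") == "é_00e9"
```

Because the hex digits are lowercase, the character after an escaping '_' is always '_', 'L', or a hex digit, so the demangler never has to guess.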
With these fixes, I think the current scheme is OK. But for targets with
8-bit clean binutils, I think it makes a lot of sense to use the UTF-8
encoding directly in the symbol.
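For comparison, a minimal sketch of the UTF-8 alternative, assuming the usual <length><name> form is kept and the length counts UTF-8 bytes rather than characters (both the byte-count convention and the function names are my assumptions):

```python
# Hypothetical sketch: identifiers are emitted as raw UTF-8 bytes.
# Assumption: the length prefix counts bytes, not characters.

def mangle_utf8(name: str) -> bytes:
    raw = name.encode("utf-8")
    return str(len(raw)).encode("ascii") + raw


def demangle_utf8(sym: bytes) -> str:
    """Read the decimal length prefix, then decode exactly that many bytes."""
    i = 0
    while sym[i:i + 1].isdigit():
        i += 1
    length = int(sym[:i])
    return sym[i:i + length].decode("utf-8")


# ASCII names mangle exactly as before, and no escapes are needed.
assert mangle_utf8("foo") == b"3foo"
assert mangle_utf8("é_00e9") == b"7\xc3\xa9_00e9"
assert demangle_utf8(mangle_utf8("\U0001F600")) == "\U0001F600"
```

Characters above U+FFFF need no special case here, and a literal "_00e9" is just more ASCII, so flaws 2 and 3 both go away.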
Thoughts?
Jason