This is the mail archive of the
java-discuss@sources.redhat.com
mailing list for the Java project.
Re: Unicode mangling (was Re: [PATCH] Java: New C++ ABI compatibility changes.)
Jason Merrill <jason@redhat.com> writes:
> UCS2 values are encoded as '__NNNN'
> UCS4 values are encoded as '__LNNNNNNNN'
> '__' is encoded as '___'.
> '_' followed by anything else is left alone.
Would that conflict with any other use of '__'? I guess most of
these are at the library level, and there '__' becomes '___'.
> UCS2 values are encoded as '__NNNN'
> UCS4 values are encoded as '__LNNNNNNNN'
I might suggest:
UCS2 values are encoded as '__uNNNN'
UCS4 values are encoded as '__UNNNNNNNN'
This makes UCS2 characters encode longer, but there is less chance
of clashes, and it is more readable for humans.
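To illustrate the '__u'/'__U' variant, here is a minimal Java sketch. It
is only an illustration, not the compiler's actual code: the class name
MangleSketch and the method mangle are made up, and it assumes that ASCII
characters pass through unchanged, that hex digits are lower case, and
that underscores are scanned left to right.

```java
class MangleSketch {
    // Hypothetical helper: fixed-width '__uNNNN' / '__UNNNNNNNN' escapes.
    static String mangle(String name) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < name.length()) {
            int cp = name.codePointAt(i);
            if (cp == '_' && i + 1 < name.length() && name.charAt(i + 1) == '_') {
                // '__' is encoded as '___'.
                out.append("___");
                i += 2;
                continue;
            }
            if (cp < 0x80) {
                // Assumption: plain ASCII (including a lone '_') is left alone.
                out.append((char) cp);
            } else if (cp <= 0xFFFF) {
                // UCS2 value: '__u' plus four hex digits.
                out.append(String.format("__u%04x", cp));
            } else {
                // UCS4 value: '__U' plus eight hex digits.
                out.append(String.format("__U%08x", cp));
            }
            i += Character.charCount(cp);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(mangle("caf\u00e9")); // prints caf__u00e9
        System.out.println(mangle("a__b"));      // prints a___b
    }
}
```

The sketch only encodes. It makes no attempt to decode, or to handle a
single '_' that immediately precedes an escaped character.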
Or a variable-length encoding: '__uNNN_'.
Thus Latin-1 characters would be '__uNN_'.
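The variable-length form could be sketched the same way. Again this is
just an illustration under stated assumptions: VarLenMangleSketch, escape
and mangle are made-up names, hex digits are written in lower case without
leading zeros, ASCII passes through unchanged, and the '__' to '___' rule
is left out to keep the example short.

```java
class VarLenMangleSketch {
    // Hypothetical helper: '__u', the hex digits, then a closing '_'.
    static String escape(int codePoint) {
        return "__u" + Integer.toHexString(codePoint) + "_";
    }

    // Escapes every non-ASCII code point; ASCII is copied as-is (assumption).
    static String mangle(String name) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < name.length()) {
            int cp = name.codePointAt(i);
            if (cp < 0x80) {
                out.append((char) cp);
            } else {
                out.append(escape(cp));
            }
            i += Character.charCount(cp);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(mangle("caf\u00e9")); // prints caf__ue9_
        System.out.println(escape(0x20ac));      // prints __u20ac_
    }
}
```

With this form a Latin-1 character above 0x7f always comes out as
'__uNN_', and the closing '_' marks where the hex digits stop.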
--
--Per Bothner
per@bothner.com http://www.bothner.com/~per/