This is the mail archive of the
mailing list for the Java project.
Re: Unicode mangling (was Re: [PATCH] Java: New C++ ABI compatibility changes.)
>>>>> "Per" == Per Bothner <firstname.lastname@example.org> writes:
> Jason Merrill <email@example.com> writes:
>> UCS2 values are encoded as '__NNNN'
>> UCS4 values are encoded as '__LNNNNNNNN'
>> '__' is encoded as '___'.
>> '_' followed by anything else is left alone.
> Would that conflict with any other use of '__'? I guess most of
> these are at the library level and '__' becomes '___'.
Hmm...yes, the library entry points specified by the ABI use __. Hmph.
Well, those uses are always followed by a lower-case letter, as I would
expect of all real names containing __, so perhaps we could just reserve __U.
> Or a variable-length encoding: '__uNNN_'.
> Thus Latin-1 characters would be '__uNN_'.
Seems reasonable. So:
All extended characters are encoded as '__UNNN_' (between 2 and 8 Ns)
'__U' is encoded as '__U55_'
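
For illustration only, the two rules above can be sketched as a small
encoder. This is not code from the patch: the function name mangle is
hypothetical, "extended character" is assumed to mean any code point
above 0x7F, and the Ns are assumed to be lower-case hex digits padded to
at least two. The rule for a literal '__U' is applied exactly as written
above; decoding is not attempted here.

```python
def mangle(name: str) -> str:
    """Encode a name using the '__UNNN_' scheme sketched in this thread.

    Assumptions (not stated in the thread): extended characters are code
    points above 0x7F, and the Ns are lower-case hex, at least two digits.
    """
    out = []
    i = 0
    while i < len(name):
        # A literal '__U' in the input is replaced by '__U55_', as written.
        if name.startswith("__U", i):
            out.append("__U55_")
            i += 3
            continue
        code_point = ord(name[i])
        if code_point > 0x7F:
            # Variable-length escape: '__U' + hex digits + '_'.
            out.append("__U%02x_" % code_point)
        else:
            # Plain ASCII, including '__' followed by a lower-case letter,
            # passes through unchanged.
            out.append(name[i])
        i += 1
    return "".join(out)
```

Under those assumptions, mangle("caf\u00e9") gives 'caf__Ue9_', while a
library-style name such as '__cxa_x' is left alone because its '__' is
not followed by 'U'.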