Unicode mangling (was Re: [PATCH] Java: New C++ ABI compatibility changes.)

Jason Merrill jason@redhat.com
Thu Jan 18 14:37:00 GMT 2001


>>>>> "Per" == Per Bothner <per@bothner.com> writes:

> Jason Merrill <jason@redhat.com> writes:
>> UCS2 values are encoded as '__NNNN'
>> UCS4 values are encoded as '__LNNNNNNNN'
>> '__' is encoded as '___'.
>> '_' followed by anything else is left alone.

> Would that conflict with any other use of '__' ?  I guess most of
> these are at the library level and '__' becomes '___'.

Hmm...yes, the library entry points specified by the ABI use __.  Hmph.
Well, those uses are always followed by a lower-case letter, as I would
expect all real names containing __ to, so perhaps we could just reserve __U.

> Or a variable-length encoding:  '__uNNN_'.
> Thus Latin-1 characters would be '__uNN_'.

Seems reasonable.  So:

All extended characters are encoded as '__UNNN_' (between 2 and 8 Ns)
'__U' is encoded as '__U55_'

Yowza.
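
To make the scheme above concrete, here is a minimal sketch in Python.  It
is an illustration under stated assumptions, not the GCC implementation: the
names mangle_name and demangle_name are hypothetical, "extended" is taken to
mean any code point above 0x7F, hex digits are written in lower case, and
'__U55_' is read as a special escape that stands for the whole literal
sequence '__U'.

```python
import re

# An escape is '__U', 2 to 8 hex digits, then a closing '_'.
_ESCAPE = re.compile(r"__U([0-9A-Fa-f]{2,8})_")


def mangle_name(name: str) -> str:
    """Encode extended characters and literal '__U' (assumed reading)."""
    out = []
    i = 0
    while i < len(name):
        if name.startswith("__U", i):
            # A literal '__U' would look like the start of an escape,
            # so it is replaced by the reserved form '__U55_'.
            out.append("__U55_")
            i += 3
        elif ord(name[i]) > 0x7F:
            # Assumption: "extended" means non-ASCII.  At least two
            # hex digits are emitted, more for larger code points.
            out.append("__U%02x_" % ord(name[i]))
            i += 1
        else:
            out.append(name[i])
            i += 1
    return "".join(out)


def demangle_name(mangled: str) -> str:
    """Invert mangle_name under the same assumptions."""
    out = []
    i = 0
    while i < len(mangled):
        m = _ESCAPE.match(mangled, i)
        if m:
            digits = m.group(1)
            # '__U55_' is the reserved spelling of a literal '__U'.
            out.append("__U" if digits == "55" else chr(int(digits, 16)))
            i = m.end()
        else:
            out.append(mangled[i])
            i += 1
    return "".join(out)
```

Under these assumptions a Latin-1 character needs only two digits, so
"caf\u00e9" becomes "caf__Ue9_", while names that use '__' followed by a
lower-case letter pass through untouched.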

Jason

