This is the mail archive of the java-discuss@sources.redhat.com mailing list for the Java project.



Re: Unicode mangling (was Re: [PATCH] Java: New C++ ABI compatibility changes.)


>>>>> "Per" == Per Bothner <per@bothner.com> writes:

> Jason Merrill <jason@redhat.com> writes:
>> UCS2 values are encoded as '__NNNN'
>> UCS4 values are encoded as '__LNNNNNNNN'
>> '__' is encoded as '___'.
>> '_' followed by anything else is left alone.

> Would that conflict with any other use of '__'?  I guess most of
> these are at the library level and '__' becomes '___'.

Hmm...yes, the library entry points specified by the ABI use __.  Hmph.
Well, in those uses the __ is always followed by a lower-case letter, as I
would expect in all real names containing __, so perhaps we could just
reserve __U.

> Or a variable-length encoding:  '__uNNN_'.
> Thus Latin-1 characters would be '__uNN_'.

Seems reasonable.  So:

All extended characters are encoded as '__UNNN_' (between 2 and 8 hex digits N)
'__U' is encoded as '__U55_'
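
A minimal sketch of the two rules above, in Python rather than in the
compiler's own code, just to check that the scheme round-trips.  Several
things here are assumptions and not part of the proposal: "extended" is
taken to mean any code point above 0x7F, the hex digits are written in
lower case, the names mangle() and demangle() are made up for
illustration, and '__U55_' is read as a reserved marker that stands for a
literal '__U' (0x55 is 'U', which is plain ASCII and so never needs
escaping on its own).

```python
import re

def mangle(name: str) -> str:
    """Escape a source-level name using the '__UNNN_' proposal."""
    out = []
    i = 0
    while i < len(name):
        # A literal '__U' would look like the start of an escape,
        # so it is replaced by the reserved marker '__U55_'.
        if name.startswith("__U", i):
            out.append("__U55_")
            i += 3
            continue
        cp = ord(name[i])
        if cp > 0x7F:
            # Assumption: "extended" means non-ASCII. At least two hex
            # digits; Unicode code points never need more than six.
            out.append("__U%02x_" % cp)
        else:
            out.append(name[i])
        i += 1
    return "".join(out)

_ESCAPE = re.compile(r"__U([0-9a-fA-F]{2,8})_")

def demangle(mangled: str) -> str:
    """Invert mangle() under the same assumptions."""
    def replace(match):
        cp = int(match.group(1), 16)
        # 0x55 is the reserved marker for a literal '__U'.
        return "__U" if cp == 0x55 else chr(cp)
    return _ESCAPE.sub(replace, mangled)

print(mangle("caf\u00e9"))      # caf__Ue9_
print(mangle("__cxa_throw"))    # __cxa_throw
print(mangle("__Ux"))           # __U55_x
```

Names that use __ followed by a lower-case letter, such as the ABI's
library entry points, pass through unchanged, which is the point of
reserving only __U.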

Yowza.

Jason
