This is the mail archive of the
mailing list for the Java project.
Re: Unicode mangling (was Re: [PATCH] Java: New C++ ABI compatibility changes.)
>>>>> "Alexandre" == Alexandre Petit-Bianco <email@example.com> writes:
> Jason Merrill writes:
>> 1) It doesn't allow for C-like symbols, which have no length specifier.
>> This could be fixed by defining some encoding starting with, say, '_U'.
>> 2) It doesn't accommodate 32-bit extended characters in C++/C99
>> (\UNNNNNNNN). This could be fixed by escaping them with, say, '_L'.
>> 3) _NNNN is a valid component of an identifier, complicating the
>> demangler intelligence. This could be fixed by also escaping the '_'
>> character in affected names. Hmm...it looks like you intend to do
>> so in unicode_mangling_length, but don't actually do so in
>> append_unicode_mangled_name. We could also just use '__'.
> So you basically suggest that __UNNNN be emitted for every Unicode
> character that we encounter. __LNNNNNNNN would be emitted for 32-bit
> extended characters (Java doesn't have to worry about it.)
I meant _NNNN and _LNNNNNNNN, actually, with a literal _ encoded as __.
Only the last would actually require a change in the Java frontend.
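
To make the proposal concrete, here is a rough sketch of one possible reading
of that escaping, written in Python for brevity rather than as frontend code.
The names mangle and demangle are hypothetical, and the choices of lowercase
hex digits and of passing all ASCII characters through unchanged are
assumptions. The '_U' prefix for C-like symbols from point 1 is not covered.

```python
def mangle(name: str) -> str:
    """Escape an identifier: '_' -> '__', BMP non-ASCII -> '_NNNN',
    anything above U+FFFF -> '_LNNNNNNNN' (hex digits, assumed lowercase)."""
    out = []
    for ch in name:
        cp = ord(ch)
        if ch == "_":
            out.append("__")            # literal underscore is doubled
        elif cp < 0x80:
            out.append(ch)              # plain ASCII passes through
        elif cp <= 0xFFFF:
            out.append(f"_{cp:04x}")    # 16-bit character: _NNNN
        else:
            out.append(f"_L{cp:08x}")   # 32-bit character: _LNNNNNNNN
    return "".join(out)


def demangle(sym: str) -> str:
    """Inverse of mangle(); assumes sym is well formed."""
    out = []
    i = 0
    while i < len(sym):
        if sym[i] != "_":
            out.append(sym[i])
            i += 1
        elif sym.startswith("__", i):
            out.append("_")
            i += 2
        elif sym.startswith("_L", i):
            out.append(chr(int(sym[i + 2:i + 10], 16)))
            i += 10
        else:
            out.append(chr(int(sym[i + 1:i + 5], 16)))
            i += 5
    return "".join(out)
```

Because a literal '_' is always doubled, a single '_' in the mangled name can
only start an escape, so a name such as x_1234 stays distinguishable from x
followed by U+1234.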
> And Java would be dropping the `U' at the end of the symbol too.
What was that used for?
>> With these fixes, I think the current scheme is OK. But for targets
>> with 8-bit clean binutils, I think it makes a lot of sense to just
>> use the UTF8 encoding in the symbol.
> That's fine too, but requires coordinated changes in binutils.
Does it? Having output filters on nm and such to convert from UTF8 to the
current locale's encoding would be good, but not strictly necessary.
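
Such a filter could be quite small. A rough sketch in Python, assuming the
symbol name arrives as raw UTF-8 bytes; to_locale is a hypothetical helper,
not an existing binutils feature, and the fallback of printing unrepresentable
characters as backslash escapes is just one possible policy.

```python
import locale


def to_locale(raw, encoding=None):
    """Convert a UTF-8 encoded symbol name to the given (or current
    locale's) encoding, escaping characters that encoding lacks."""
    if encoding is None:
        # Ask the C library which encoding the current locale prefers.
        encoding = locale.getpreferredencoding(False)
    # Malformed UTF-8 becomes U+FFFD instead of raising an error.
    text = raw.decode("utf-8", errors="replace")
    # Characters the target encoding cannot represent become \x.., \u....
    return text.encode(encoding, errors="backslashreplace")
```

Without the filter the tools still work; the names just show up as raw UTF-8
bytes on terminals that use a different encoding.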