This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.



Re: Query on UTF-32 encodings for letters


Geoffrey Keating wrote:
> Robert Dewar <dewar@adacore.com> writes:
>
>> Ada 2005 requires full support for all planes of UTF-32
>> encoding, including the use of letters in identifiers,
>> including also proper upper lower case equivalence.
>
> You might consider glibc, or possibly simply use iswalpha() and towlower().

Well, I really don't understand the implementation of iswalpha. For example, it yields false for "FEMININE ORDINAL INDICATOR" (16#AA#) even though the definition in the Unicode database is:

00AA;FEMININE ORDINAL INDICATOR;Ll;0;L;<super> 0061;;;;N;;;;;

Here the Ll shows that this is a lower case letter, at least that's
the way I understand the database, and thus it is allowed in Ada
identifiers. MICRO SIGN is a similar example.
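
A minimal sketch of how the behavior described above could be checked. The helper names is_wide_letter and probe_code_points are hypothetical, and the sketch assumes that wchar_t can hold a full UTF-32 code point (as with glibc) and that the environment selects a UTF-8 locale. The results depend on the C library version and the locale, so no expected output is shown.

```c
#include <locale.h>
#include <stdio.h>
#include <wchar.h>
#include <wctype.h>

/* Returns 1 if the C library classifies cp as alphabetic in the
   current locale, 0 otherwise. Assumes wchar_t holds a full UTF-32
   code point. */
int is_wide_letter(wint_t cp)
{
    return iswalpha(cp) != 0;
}

/* Hypothetical helper: print how the library classifies the code
   points mentioned in this message. */
void probe_code_points(void)
{
    static const wint_t samples[] = { 0x00AA, 0x00B5, 0x05F0 };
    size_t i;

    /* Use the locale from the environment; a UTF-8 locale is assumed. */
    setlocale(LC_ALL, "");

    for (i = 0; i < sizeof samples / sizeof samples[0]; i++)
        printf("U+%04X iswalpha=%d towlower=U+%04X\n",
               (unsigned) samples[i],
               is_wide_letter(samples[i]),
               (unsigned) towlower(samples[i]));
}
```

Calling probe_code_points() from main prints one line per code point, which can then be compared by hand against the general category in the database.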

At first, it looked to me like it was just testing for LETTER in the
name of the symbol, but that is disproved by:

HEBREW LIGATURE YIDDISH DOUBLE VAV (16#05F0#)
where the database entry is

05F0;HEBREW LIGATURE YIDDISH DOUBLE VAV;Lo;0;R;;;;;N;HEBREW LETTER DOUBLE VAV;;;;

The Lo here indicates "Letter, other", so this should also be considered
a letter, and iswalpha does return True in this case, although LETTER
does not appear in the character's name.
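
The reading of the database entries used above (third semicolon-separated field is the general category, and a category starting with L means a letter) can be sketched as follows. The function names general_category and is_letter_entry are hypothetical, and the sketch does not check whether "starts with L" is exactly the set Ada 2005 allows in identifiers.

```c
#include <stddef.h>
#include <string.h>

/* Copy the general category (third semicolon-separated field) of one
   database line into out. Returns 0 on success, or -1 if the line has
   too few fields or the field does not fit in out. */
int general_category(const char *line, char *out, size_t out_size)
{
    const char *start = line;
    const char *end;
    size_t len;
    int field;

    /* Skip the code point field and the name field. */
    for (field = 0; field < 2; field++) {
        start = strchr(start, ';');
        if (start == NULL)
            return -1;
        start++;
    }

    /* The category runs up to the next semicolon. */
    end = strchr(start, ';');
    if (end == NULL)
        return -1;
    len = (size_t) (end - start);
    if (len + 1 > out_size)
        return -1;

    memcpy(out, start, len);
    out[len] = '\0';
    return 0;
}

/* Treat an entry as a letter if its category begins with 'L'
   (for example Ll or Lo). Returns 1 for a letter, 0 otherwise. */
int is_letter_entry(const char *line)
{
    char cat[4];

    if (general_category(line, cat, sizeof cat) != 0)
        return 0;
    return cat[0] == 'L';
}
```

Applied to the two entries quoted in this message, is_letter_entry returns 1 for both, which is the classification argued for above.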

Looks to me like I have to spin my own here :-(

or I could just use these functions and decide that discrepancies are not
that critical in these obscure cases :-)
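
For the first option, one possible shape of a home-grown test is a sorted table of code point ranges searched by binary search. This is only a sketch: the names letter_range, letter_ranges and is_letter_in_table are hypothetical, and the table holds just a handful of illustrative entries (the code points mentioned in this message plus the ASCII letters). A usable table would have to be generated from the complete database.

```c
#include <stddef.h>

/* One inclusive range of code points classified as letters. */
struct letter_range {
    unsigned long first;
    unsigned long last;
};

/* Illustrative excerpt only, sorted by code point. A real table
   would be generated from every entry in the database. */
static const struct letter_range letter_ranges[] = {
    { 0x0041, 0x005A },  /* LATIN CAPITAL LETTER A..Z */
    { 0x0061, 0x007A },  /* LATIN SMALL LETTER A..Z */
    { 0x00AA, 0x00AA },  /* FEMININE ORDINAL INDICATOR */
    { 0x00B5, 0x00B5 },  /* MICRO SIGN */
    { 0x05F0, 0x05F0 },  /* HEBREW LIGATURE YIDDISH DOUBLE VAV */
};

/* Binary search over the sorted ranges. Returns 1 if cp falls
   inside one of them, 0 otherwise. */
int is_letter_in_table(unsigned long cp)
{
    size_t lo = 0;
    size_t hi = sizeof letter_ranges / sizeof letter_ranges[0];

    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;

        if (cp < letter_ranges[mid].first)
            hi = mid;
        else if (cp > letter_ranges[mid].last)
            lo = mid + 1;
        else
            return 1;
    }
    return 0;
}
```

The lookup does not depend on the locale or on the C library's tables, which is the point of doing it by hand, at the cost of having to regenerate the table whenever the database changes.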

