This is the mail archive of the
gcc@gcc.gnu.org
mailing list for the GCC project.
Re: Query on UTF-32 encodings for letters
- From: Robert Dewar <dewar at adacore dot com>
- To: Geoffrey Keating <geoffk at geoffk dot org>
- Cc: gcc at gcc dot gnu dot org
- Date: Sat, 15 Jan 2005 00:26:28 -0500
- Subject: Re: Query on UTF-32 encodings for letters
- References: <41E3E28D.6050506@adacore.com> <m2brbuyx36.fsf@greed.local>
Geoffrey Keating wrote:
> Robert Dewar <dewar@adacore.com> writes:
>
>> Ada 2005 requires full support for all planes of UTF-32
>> encoding, including the use of letters in identifiers,
>> including also proper upper lower case equivalence.
>
> You might consider glibc, or possibly simply use iswalpha() and towlower().
Well I really don't understand the implementation of iswalpha. For
example, it yields false for "FEMININE ORDINAL INDICATOR" (16#AA#)
even though the definition in the database is:
00AA;FEMININE ORDINAL INDICATOR;Ll;0;L;<super> 0061;;;;N;;;;;
Here the Ll shows that this is a lower case letter, at least
that's the way I understand the database, and thus it is allowed
in Ada identifiers. MICRO SIGN is a similar example.
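For illustration only, here is a minimal C sketch of reading the category out of a database record like the one above. The function names are hypothetical, and the rule "any category starting with L counts as a letter" is a simplifying assumption, not the exact Ada 2005 rule.

```c
#include <assert.h>
#include <string.h>

/* Copy the general category (third ';'-separated field) of a
   UnicodeData.txt record into out, which must hold at least 3 bytes.
   Returns 1 on success, 0 if the record is malformed. */
static int general_category(const char *record, char out[3])
{
    assert(record != NULL);
    const char *p = strchr(record, ';');   /* end of the code point field */
    if (p == NULL) return 0;
    p = strchr(p + 1, ';');                /* end of the name field */
    if (p == NULL) return 0;
    p++;                                   /* start of the category field */
    const char *end = strchr(p, ';');
    if (end == NULL || end - p != 2) return 0;
    out[0] = p[0];
    out[1] = p[1];
    out[2] = '\0';
    return 1;
}

/* Assumption for this sketch: every category starting with 'L'
   (Lu, Ll, Lt, Lm, Lo) is treated as a letter. */
static int record_is_letter(const char *record)
{
    char cat[3];
    return general_category(record, cat) && cat[0] == 'L';
}
```

Passing the 00AA record quoted above to general_category() fills the buffer with "Ll", so record_is_letter() accepts it under the stated assumption.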
At first, it looked to me like it was just testing for LETTER in the
name of the symbol, but that is disproved by
HEBREW LIGATURE YIDDISH DOUBLE VAV (16#05F0#),
where the database entry is
05F0;HEBREW LIGATURE YIDDISH DOUBLE VAV;Lo;0;R;;;;;N;HEBREW LETTER DOUBLE VAV;;;;
The Lo here indicates "Letter, other", so this should also be considered
a letter, and iswalpha does return True in this case, even though the
character name itself does not contain LETTER.
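A rough sketch of how one might probe iswalpha() for individual code points follows. The helper names are hypothetical, and it assumes a platform such as glibc where wchar_t holds a full UTF-32 value. The answer depends on the LC_CTYPE locale in effect and on the C library version, so no expected results are given for 16#AA# or 16#05F0#.

```c
#include <assert.h>
#include <locale.h>
#include <wchar.h>
#include <wctype.h>

/* Select the locale used for character classification.
   Returns 1 if the named locale is available, 0 otherwise. */
static int use_ctype_locale(const char *name)
{
    assert(name != NULL);   /* a NULL name would only query the locale */
    return setlocale(LC_CTYPE, name) != NULL;
}

/* Returns 1 if the current locale's iswalpha() accepts the code point.
   Assumes wchar_t values are UTF-32 code points. */
static int libc_says_letter(unsigned long cp)
{
    return iswalpha((wint_t)cp) != 0;
}
```

One would call use_ctype_locale() with the name of an installed UTF-8 locale (for example "en_US.UTF-8", if present on the system), then compare libc_says_letter(0x00AA) and libc_says_letter(0x05F0) against the categories in the database.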
Looks to me like I have to spin my own here :-(
or I could just use these functions and decide that discrepancies are not
that critical in these obscure cases :-)