This is the mail archive of the
gcc@gcc.gnu.org
mailing list for the GCC project.
Re: Query on UTF-32 encodings for letters
Robert Dewar wrote:
> Paul Koning wrote:
>
>> Then take i, which upcases to I with dot. Turkish has i with and
>> without dot, and the dot is preserved when you change case (in either
>> direction).
>
> Yes, and that's fine, both lower case i with dot and lower case i
> without dot fold upper case to capital I (without dot), and so all three
> are equivalent in identifiers.
>
> There is no upper case I with dot, so I have no idea what you mean by
> saying the dot is preserved. The three characters in question are:
>
> 0049;LATIN CAPITAL LETTER I;Lu;0;L;;;;;N;;;;0069;
> 0069;LATIN SMALL LETTER I;Ll;0;L;;;;;N;;;0049;;0049
> 0131;LATIN SMALL LETTER DOTLESS I;Ll;0;L;;;;;N;;;0049;;
Many others have already corrected you on the existence of İ, so I won't
bother. But I will point out the existence of a very handy program (it's
packaged as native in Debian, so presumably there's no other upstream
source), available from
http://ftp.debian.org/debian/pool/main/u/unicode/unicode_0.4.6.tar.gz
It does all kinds of simple property lookups and has saved me a lot of time when
dealing with Unicode issues.
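
For anyone without that program to hand, the same sort of lookup can be roughly sketched with Python's standard unicodedata module. This is only an illustration and not how the Debian tool works; the describe() helper is a made-up name. Note that Python applies the default, locale-independent case mappings, so Turkish tailoring is not reflected.

```python
import unicodedata


def describe(ch):
    """Return a few basic properties of a single character (hypothetical helper)."""
    return {
        "codepoint": f"U+{ord(ch):04X}",
        "name": unicodedata.name(ch),
        "category": unicodedata.category(ch),
        # str.upper()/str.lower() use the default Unicode mappings and may
        # return more than one code point, so list every one of them.
        "upper": [f"U+{ord(c):04X}" for c in ch.upper()],
        "lower": [f"U+{ord(c):04X}" for c in ch.lower()],
    }


# The three characters quoted above, plus capital I with dot above (U+0130).
for ch in "\u0049\u0069\u0130\u0131":
    print(describe(ch))
```

With the default mappings, dotless i (U+0131) upcases to plain I (U+0049), while capital I with dot (U+0130) lowercases to i followed by a combining dot above, which is one way of seeing that the dot is kept.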