This is the mail archive of the
mailing list for the GCC project.
Re: Query on UTF-32 encodings for letters
Robert Dewar wrote:
> Paul Koning wrote:
>> Then take i, which upcases to I with dot. Turkish has i with and
>> without dot, and the dot is preserved when you change case (in either
> Yes, and that's fine, both lower case i with dot and lower case i
> without dot fold upper case to capital I (without dot), and so all three
> are equivalent in identifiers.
> There is no upper case I with dot, so I have no idea what you mean by
> saying the dot is preserved. The three characters in question are:
> 0049;LATIN CAPITAL LETTER I;Lu;0;L;;;;;N;;;;0069;
> 0069;LATIN SMALL LETTER I;Ll;0;L;;;;;N;;;0049;;0049
> 0131;LATIN SMALL LETTER DOTLESS I;Ll;0;L;;;;;N;;;0049;;
Many others have already corrected you on the existence of İ (U+0130,
LATIN CAPITAL LETTER I WITH DOT ABOVE), so I won't bother. But I will point
out the existence of a very handy program (it's packaged as native in
Debian, so presumably there's no other upstream
source), available from
It does all kinds of simple property lookups and has saved me a lot of time
when dealing with Unicode issues...
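
For anyone without that program to hand, the lookups relevant to this thread
can be roughly sketched with Python's standard unicodedata module. This is
not the tool mentioned above, only an illustration; describe() is a made-up
helper name. Note that Python's upper() and lower() apply the default,
locale-independent Unicode mappings, not the Turkish tailoring.

```python
import unicodedata


def describe(ch):
    """Return (code point, name, general category, uppercase, lowercase).

    describe() is a hypothetical helper for illustration only.
    The case mappings are lists because one character can map to several.
    """
    return (
        f"U+{ord(ch):04X}",
        unicodedata.name(ch),
        unicodedata.category(ch),
        [f"U+{ord(c):04X}" for c in ch.upper()],
        [f"U+{ord(c):04X}" for c in ch.lower()],
    )


# The three characters quoted above, plus capital I with dot above (U+0130).
for ch in "\u0049\u0069\u0131\u0130":
    print(describe(ch))
```

Under the default mappings, both U+0069 and U+0131 uppercase to U+0049, while
U+0130 lowercases to the two-code-point sequence U+0069 U+0307.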