This is the mail archive of the
mailing list for the GCC project.
Re: Query on UTF-32 encodings for letters
>>>>> "Robert" == Robert Dewar <email@example.com> writes:
Robert> Paul Koning wrote:
>> Then take i, which upcases to I with dot. Turkish has i with and
>> without dot, and the dot is preserved when you change case (in
>> either direction).
Robert> Yes, and that's fine: both lower case i with dot and lower
Robert> case i without dot fold to upper case capital I (without
Robert> dot), and so all three are equivalent in identifiers.
That's wrong for Turkish.
Robert> There is no upper case I with dot, so I have no idea what you
Robert> mean by saying the dot is preserved. The three characters in
Robert> question are:
Robert> 0049;LATIN CAPITAL LETTER I;Lu;0;L;;;;;N;;;;0069;
Robert> 0069;LATIN SMALL LETTER I;Ll;0;L;;;;;N;;;0049;;0049
Robert> 0131;LATIN SMALL LETTER DOTLESS I;Ll;0;L;;;;;N;;;0049;;
There certainly is such a thing as uppercase I with a dot; it is a
standard part of Turkish. For an example, see
section ("Ingiltere Vizeleri:" but with a dot on the first letter).
I see in the character list this entry:
I WITH DOT ABOVE, LATIN CAPITAL LETTER 0130
That sounds like the one.
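
To make the two positions concrete, here is a minimal Python sketch of the four code points involved. It assumes Python 3 and its bundled Unicode database. The helper names turkish_upper and turkish_lower are hypothetical, written only for this illustration; they are not part of any library and not anything GCC does. The default, locale-independent mappings send both i and dotless i to plain I, which is the folding Robert describes, while the Turkish tailoring keeps the dotted and dotless pairs apart.

```python
import unicodedata

# The four code points under discussion.
CAPITAL_I = "\u0049"        # LATIN CAPITAL LETTER I
SMALL_I = "\u0069"          # LATIN SMALL LETTER I
CAPITAL_I_DOT = "\u0130"    # LATIN CAPITAL LETTER I WITH DOT ABOVE
SMALL_DOTLESS_I = "\u0131"  # LATIN SMALL LETTER DOTLESS I

# Hypothetical helpers (names invented for this sketch): apply the
# Turkish pairing i <-> I-with-dot and dotless-i <-> I first, then
# fall back to the default mapping for every other character.
_TR_UPPER = str.maketrans({SMALL_I: CAPITAL_I_DOT, SMALL_DOTLESS_I: CAPITAL_I})
_TR_LOWER = str.maketrans({CAPITAL_I: SMALL_DOTLESS_I, CAPITAL_I_DOT: SMALL_I})


def turkish_upper(text):
    """Upper-case text, preserving the dot as Turkish does."""
    return text.translate(_TR_UPPER).upper()


def turkish_lower(text):
    """Lower-case text, preserving the dot (or its absence)."""
    return text.translate(_TR_LOWER).lower()


if __name__ == "__main__":
    for ch in (CAPITAL_I, SMALL_I, CAPITAL_I_DOT, SMALL_DOTLESS_I):
        print("U+%04X %s" % (ord(ch), unicodedata.name(ch)))
    # Default mapping: both lower-case letters collapse to plain I.
    print(SMALL_I.upper() == SMALL_DOTLESS_I.upper())
    # Turkish tailoring: the two lower-case letters stay distinct.
    print(turkish_upper(SMALL_I) == turkish_upper(SMALL_DOTLESS_I))
```

Whether identifier equivalence should follow the default mapping or a language-specific one is exactly the question in this thread; the sketch only shows that the two choices give different answers for these letters.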