This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.



Re: Query on UTF-32 encodings for letters


>>>>> "Robert" == Robert Dewar <dewar@adacore.com> writes:

 Robert> Paul Koning wrote:
 >> Then take i, which upcases to I with dot.  Turkish has i with and
 >> without dot, and the dot is preserved when you change case (in
 >> either direction).

 Robert> Yes, and that's fine, both lower case i with dot and lower
 Robert> case i without dot fold upper case to capital I (without
 Robert> dot), and so all three are equivalent in identifiers.

That's wrong for Turkish.

 Robert> There is no upper case I with dot, so I have no idea what you
 Robert> mean by saying the dot is preserved. The three characters in
 Robert> question are:

 Robert> 0049;LATIN CAPITAL LETTER I;Lu;0;L;;;;;N;;;;0069;
 Robert> 0069;LATIN SMALL LETTER I;Ll;0;L;;;;;N;;;0049;;0049
 Robert> 0131;LATIN SMALL LETTER DOTLESS I;Ll;0;L;;;;;N;;;0049;;

There certainly is such a thing as an uppercase I with a dot; it is a
standard part of Turkish.  For an example, see
http://www.turkishembassy.org/consularservices/duyurular.htm, second
section ("Ingiltere Vizeleri:" but with a dot on the first letter).

I see in the character list this entry:

  I WITH DOT ABOVE, LATIN CAPITAL LETTER	0130

That sounds like the one.
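
To make the difference concrete, here is a rough sketch in Python (not
anything GCC does) that prints the four code points involved and
contrasts the default, locale-independent case mapping with a
Turkish-style one.  The helpers turkish_upper and turkish_lower are
hypothetical names made up for this illustration, and they handle only
the dotted/dotless i pairs before deferring to the default mapping.

```python
import unicodedata

# The four code points under discussion: I, i, I with dot above, dotless i.
for ch in ("I", "i", "\u0130", "\u0131"):
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")

# Python's built-in mapping is locale-independent: both lowercase forms
# uppercase to plain U+0049, which is the folding Robert describes.
default_upper_i = "i".upper()
default_upper_dotless = "\u0131".upper()

# Hypothetical Turkish-style mapping (an assumption for illustration):
# the dot is preserved in both directions.
_TR_UPPER = {ord("i"): "\u0130", ord("\u0131"): "I"}
_TR_LOWER = {ord("I"): "\u0131", ord("\u0130"): "i"}


def turkish_upper(s: str) -> str:
    """Uppercase s, mapping i -> U+0130 and U+0131 -> I first."""
    return s.translate(_TR_UPPER).upper()


def turkish_lower(s: str) -> str:
    """Lowercase s, mapping I -> U+0131 and U+0130 -> i first."""
    return s.translate(_TR_LOWER).lower()
```

Under the default mapping the two lowercase letters collapse to one
capital; under the Turkish-style mapping they stay distinct, which is
the point about the dot being preserved.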

       paul

