This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]

Re: Query on UTF-32 encodings for letters


Robert Dewar wrote:

> Paul Koning wrote:
> 
>> Then take i, which upcases to I with dot.  Turkish has i with and
>> without dot, and the dot is preserved when you change case (in either
>> direction).
> 
> Yes, and that's fine, both lower case i with dot and lower case i
> without dot fold upper case to capital I (without dot), and so all three
> are equivalent in identifiers.
> 
> There is no upper case I with dot, so I have no idea what you mean by
> saying the dot is preserved. The three characters in question are:
> 
> 0049;LATIN CAPITAL LETTER I;Lu;0;L;;;;;N;;;;0069;
> 0069;LATIN SMALL LETTER I;Ll;0;L;;;;;N;;;0049;;0049
> 0131;LATIN SMALL LETTER DOTLESS I;Ll;0;L;;;;;N;;;0049;;

Many others have already corrected you on the existence of İ (U+0130,
LATIN CAPITAL LETTER I WITH DOT ABOVE), so I won't bother. But I will
point out the existence of a very handy program (it's packaged as native
in Debian, so presumably there's no other upstream source), available from
http://ftp.debian.org/debian/pool/main/u/unicode/unicode_0.4.6.tar.gz

It does all kinds of simple property lookups and has saved me a lot of time
when dealing with Unicode issues.
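
For anyone without that program at hand, the same kind of property lookup
can be roughly sketched with Python's standard unicodedata module. This is
only an illustration, not how the unicode tool itself works; the names CHARS
and describe are made up for this example.

```python
import unicodedata

# The four "i" characters involved in the Turkish case discussion
# (an illustrative selection, not taken from the unicode tool).
CHARS = ["\u0049", "\u0069", "\u0130", "\u0131"]


def describe(ch):
    """Return (code point, character name, general category) for one character."""
    return (f"{ord(ch):04X}", unicodedata.name(ch), unicodedata.category(ch))


if __name__ == "__main__":
    for ch in CHARS:
        code, name, category = describe(ch)
        # Print in a layout loosely resembling the first UnicodeData.txt fields.
        print(f"{code};{name};{category}")
```

The entry for U+0130 comes back with category Lu, which is the upper case
I with dot that the quoted message says does not exist.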

