This is the mail archive of the mailing list for the GCC project.

Re: Query on UTF-32 encodings for letters

Robert Dewar wrote:

> Paul Koning wrote:
>> Then take i, which upcases to I with dot.  Turkish has i with and
>> without dot, and the dot is preserved when you change case (in either
>> direction).
> Yes, and that's fine, both lower case i with dot and lower case i
> without dot fold upper case to capital I (without dot), and so all three
> are equivalent in identifiers.
> There is no upper case I with dot, so I have no idea what you mean by
> saying the dot is preserved. The three characters in question are:
> 0049;LATIN CAPITAL LETTER I;Lu;0;L;;;;;N;;;;0069;
> 0069;LATIN SMALL LETTER I;Ll;0;L;;;;;N;;;0049;;0049
> 0131;LATIN SMALL LETTER DOTLESS I;Ll;0;L;;;;;N;;;0049;;

Many others have already corrected you on the existence of İ, so I won't
bother. But I will point out the existence of a very handy program (it's
packaged as native in Debian, so presumably there's no other upstream
source), available from

It does all kinds of simple property lookups and has saved me more time when
dealing with Unicode issues...
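
For readers without that program to hand, here is a rough sketch of the same kind of property lookup. It is not the program mentioned above, which this message does not name. It assumes Python 3 and uses only the standard unicodedata module; the describe helper is a made-up name for illustration.

```python
import unicodedata


def describe(ch):
    """Return code point, name, category and case mappings for one character.

    Hypothetical helper, for illustration only.
    """
    return {
        "code": f"{ord(ch):04X}",
        "name": unicodedata.name(ch, "<unnamed>"),
        "category": unicodedata.category(ch),
        # str.upper()/str.lower() may return more than one code point,
        # so each mapping is reported as a list of code points.
        "upper": [f"{ord(c):04X}" for c in ch.upper()],
        "lower": [f"{ord(c):04X}" for c in ch.lower()],
    }


if __name__ == "__main__":
    # The two characters from the quoted UnicodeData lines, plus U+0130 and U+0131.
    for ch in "Ii\u0130\u0131":
        print(describe(ch))
```

Note that Python's case mappings are locale-independent, so the Turkish-specific pairing of i with İ and of ı with I is not applied here; the sketch only reports the default mappings.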
