This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.
Re: Query on UTF-32 encodings for letters
>>>>> "Robert" == Robert Dewar <dewar@adacore.com> writes:
>> Uppercase letters aren't accented in France
Robert> That is a (very commonly held) myth.
Interesting. Learn something new every day.
>> An example that affects folding to lowercase: I folds to
>> i-without-dot in Turkish. Those aren't in Latin-1, but they are
>> in the Latin section of 10646.
Robert> Yes, but for Ada, we can consider identifier matching to be
Robert> only in the mode of folding to upper case, which takes care
Robert> of the dotless i since this folds to upper case I.
Then take i, which in Turkish upcases to I with dot. Turkish has i
both with and without dot, and the dot is preserved when you change
case (in either direction).
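
To make the collision concrete, here is a minimal Python sketch (not
anything from GCC or GNAT; the helper names and the mapping table are
made up for illustration). It assumes Python 3's built-in,
locale-independent str.upper() stands in for "fold to upper case", and
compares it with a hand-written Turkish rule:

```python
# Hypothetical helpers for illustration only; not part of any compiler.
# Turkish upper-casing keeps the dot: i -> U+0130, U+0131 -> I.
TURKISH_UPPER = {"i": "\u0130", "\u0131": "I"}


def upper_default(s):
    """Locale-independent upper-casing as done by Python's str.upper()."""
    return s.upper()


def upper_turkish(s):
    """Upper-case with the Turkish dotted/dotless rule applied first."""
    return "".join(TURKISH_UPPER.get(ch, ch.upper()) for ch in s)


dotted, dotless = "i", "\u0131"

# Default folding sends both letters to plain I, so they match.
collides_default = upper_default(dotted) == upper_default(dotless)

# Under the Turkish rule the two letters stay distinct.
collides_turkish = upper_turkish(dotted) == upper_turkish(dotless)
```

So folding to upper case handles the dotless i only by making it equal
to the dotted one, which is the wrong answer for a Turkish reader.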
Would you map eszett (in German) to ss? Or to sz? Or neither? Modern
usage does the former; 1930-ish usage the latter.
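
A small Python sketch of why the choice matters for identifier
matching. The function names are hypothetical, and the two mappings
are just the two conventions mentioned above, not a statement of what
any compiler does:

```python
ESZETT = "\u00df"  # German sharp s


def fold_ss(s):
    """Fold to upper case, mapping eszett to ss (the modern convention)."""
    return s.replace(ESZETT, "ss").upper()


def fold_sz(s):
    """Fold to upper case, mapping eszett to sz (the older convention)."""
    return s.replace(ESZETT, "sz").upper()


a = "Ma" + ESZETT + "e"  # spelled with eszett
b = "Masse"              # spelled with ss

# With the ss mapping the two spellings become the same identifier.
same_under_ss = fold_ss(a) == fold_ss(b)

# With the sz mapping they stay different.
same_under_sz = fold_sz(a) == fold_sz(b)

# Python's own str.upper() uses the ss convention.
builtin_matches_ss = a.upper() == fold_ss(a)
```

Either way, two identifiers that a German reader may see as distinct
words can end up matching or not depending on which rule is picked.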
paul