This is the mail archive of the
mailing list for the GCC project.
Re: Query on UTF-32 encodings for letters
>>>>> "Robert" == Robert Dewar <firstname.lastname@example.org> writes:
Robert> Paul Koning wrote:
>> But that is nowhere near sufficient. The issue is that case
>> folding rules are different for different languages/locales that
>> use the SAME character set. For example, there are a whole bunch
>> of different folding rules for Latin-1.
Robert> Well in practice the folding rules for Latin-1 have been part
Robert> of the standard for ten years, so they are not about to change.
Robert> It would be interesting to know an example of what you state
Uppercase letters aren't accented in France, but they are in Quebec.
(That doesn't affect folding to lowercase, of course, but it does
affect case-insensitive equality.)
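
To make the equality point concrete, here is a minimal Python sketch. The two helper names are hypothetical and only stand in for the two conventions described above; they do not correspond to any real locale API. Under the "uppercase is unaccented" convention, "ETE" and "été" should compare equal ignoring case; under the "uppercase keeps its accents" convention they should not.

```python
import unicodedata

def strip_accents(s):
    # Decompose each character (NFD) and drop the combining marks.
    decomposed = unicodedata.normalize("NFD", s)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

def equal_keeping_accents(a, b):
    # Hypothetical stand-in for the convention where capitals keep accents:
    # only case is ignored, accents remain significant.
    return a.casefold() == b.casefold()

def equal_ignoring_accents(a, b):
    # Hypothetical stand-in for the convention where capitals are written
    # without accents: accents must be ignored along with case.
    return strip_accents(a).casefold() == strip_accents(b).casefold()

lower = "\u00e9t\u00e9"  # "été"
print(equal_keeping_accents(lower, "ETE"))
print(equal_ignoring_accents(lower, "ETE"))
```

The same pair of strings gives different answers depending on which convention is assumed, even though folding either string to lowercase is unaffected.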
An example that affects folding to lowercase: in Turkish, I folds to
i-without-dot. The Turkish dotless i and dotted capital I aren't in
Latin-1, but they are in the Latin section of ISO 10646.
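
As a rough illustration of the Turkish case, the Python sketch below contrasts the default, locale-independent lowercasing with a hand-written Turkish-style rule. The function lower_turkish is a hypothetical helper written for this example, not a library call. It handles only the two Turkish-specific capitals: I (U+0049) goes to dotless i (U+0131), and dotted capital I (U+0130) goes to plain i.

```python
def lower_default(s):
    # Python's str.lower() applies the locale-independent Unicode mapping.
    return s.lower()

def lower_turkish(s):
    # Hypothetical helper: handle the two Turkish-specific capitals first,
    # then fall back to the default mapping for everything else.
    s = s.replace("\u0130", "i")   # dotted capital I -> i
    s = s.replace("I", "\u0131")   # plain capital I  -> dotless i
    return s.lower()

word = "TITLE"
print(lower_default(word))
print(lower_turkish(word))
```

The same source text folds to two different strings depending on the rule chosen, which is the locale dependence at issue here.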
Robert> The decision in Ada is that you do not want the meaning of a
Robert> program or its legality to change in a locale dependent
Robert> way. This is really a fundamental starting point. Note that
Robert> this is a radically different issue from folding at run-time
Robert> in a manner that makes sense to an application program.
Ok, fair enough, I was thinking more of the runtime case in my