This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.
Re: Query on UTF-32 encodings for letters
>>>>> "Robert" == Robert Dewar <dewar@adacore.com> writes:
Robert> Paul Koning wrote:
>> But that is nowhere near sufficient. The issue is that case
>> folding rules are different for different languages/locales that
>> use the SAME character set. For example, there are a whole bunch
>> of different folding rules for Latin-1.
Robert> Well in practice the folding rules for Latin-1 have been part
Robert> of the standard for ten years, so they are not about to
Robert> change.
Robert> It would be interesting to know an example of what you state
Robert> above.
By convention, uppercase letters aren't accented in France, but they
are in Quebec. (That doesn't affect folding to lowercase, of course,
but it does affect case-insensitive equality.)
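
A rough sketch of that distinction in Python, assuming a simplified
model in which one convention ignores accents when comparing and the
other keeps them. The names strip_accents and equal_ignoring_case are
made up for illustration and are not from any library:

```python
import unicodedata

def strip_accents(text):
    # Decompose into base letters plus combining marks, then drop the marks.
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

def equal_ignoring_case(a, b, accents_significant=True):
    # Hypothetical helper: case-insensitive equality under two conventions.
    # accents_significant=True  models "uppercase keeps its accents".
    # accents_significant=False models "uppercase may drop its accents"
    # (an assumption: it ignores accents everywhere, which is cruder than
    # any real locale rule).
    a = unicodedata.normalize("NFC", a)
    b = unicodedata.normalize("NFC", b)
    if not accents_significant:
        a, b = strip_accents(a), strip_accents(b)
    return a.casefold() == b.casefold()

# Same pair of strings, different answer depending on the convention.
print(equal_ignoring_case("ECOLE", "école"))
print(equal_ignoring_case("ECOLE", "école", accents_significant=False))
```

The point is only that the same two strings compare equal under one
convention and unequal under the other, even though both conventions
use the same character set.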
An example that does affect folding to lowercase: in Turkish, I folds
to dotless i (U+0131), and dotted capital I (U+0130) folds to i. Those
two letters aren't in Latin-1, but they are in the Latin section of
ISO 10646.
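
To make the Turkish case concrete, here is a minimal Python sketch.
Python's built-in str.lower() is not locale-sensitive, so the Turkish
tailoring is modelled here with an explicit mapping. TURKISH_LOWER and
fold_lower are hypothetical names, not part of any library:

```python
# Turkish tailoring applied before the default lowercase mapping:
#   I (U+0049)              -> dotless i (U+0131)
#   dotted capital I (U+0130) -> i (U+0069)
TURKISH_LOWER = {ord("I"): "\u0131", ord("\u0130"): "i"}

def fold_lower(text, turkish=False):
    # Hypothetical helper: lowercase text, optionally with Turkish rules.
    if turkish:
        text = text.translate(TURKISH_LOWER)
    return text.lower()

# The same input folds to two different strings.
print(fold_lower("TITLE"))
print(fold_lower("TITLE", turkish=True))
```

With the default rules the result contains an ordinary i; with the
Turkish tailoring it contains U+0131 instead, so two strings that are
equal after folding in one locale need not be equal in the other.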
Robert> The decision in Ada is that you do not want the meaning of a
Robert> program or its legality to change in a locale dependent
Robert> way. This is really a fundamental starting point. Note that
Robert> this is a radically different issue from folding at run-time
Robert> in a manner that makes sense to an application program.
OK, fair enough. I was thinking more of the run-time case in my
comments.
paul