This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.
Re: Query on UTF-32 encodings for letters
>>>>> "Robert" == Robert Dewar <dewar@adacore.com> writes:
Robert> Paul Koning wrote: Joseph S. Myers wrote:
>> >> Proper case folding and caseless matching are locale-dependent.
>>
Robert> That's not true for the Ada 2005 rules, which are locale
Robert> independent and driven only by the 10646 database.
>> Then that simply means that Ada has either created a locale of its
>> own, or adopted one specific locale to be the one it uses.
>> Anglocentrism at work, perhaps?
Robert> I don't think that is the case. With the full 10646 database,
Robert> every character in the database is properly categorized, and
Robert> the whole point of Wide_Wide_Character in Ada is to match the
Robert> 10646 standard exactly. That is what ISO mandates, so it is
Robert> hardly a matter of Anglocentrism (note that any reference to
Robert> Unicode as a standard *is* Anglocentric :-) We are driven by
Robert> ISO 10646, not Unicode. Luckily these are essentially
Robert> completely aligned at this stage.
Robert> Note that in 10646, there is a lot of distinction between
Robert> different national characters. For instance, the Greek upper
Robert> case alpha is typographically identical to Latin upper case
Robert> A, but they occupy distinct code positions. That means that
Robert> the folding rule for every character is part of the
Robert> non-locale dependent database.
But that is nowhere near sufficient. The issue is that case folding
rules are different for different languages/locales that use the SAME
character set. For example, there are a whole bunch of different
folding rules for Latin-1.
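
To make the disagreement concrete, here is a minimal Python sketch. It is an illustration only and is not taken from either poster. It assumes Python's built-in str.lower() and str.upper(), which apply the locale-independent Unicode default mappings, and it adds a hypothetical helper, turkish_lower(), to stand in for a language-specific tailoring. Note that Turkish is a Latin-script example rather than a Latin-1 one; the German sharp s in the last lines is within Latin-1.

```python
import unicodedata

# Greek capital alpha and Latin capital A look alike but are distinct
# code points, so each has its own locale-independent lowercase mapping.
LATIN_A = "\u0041"
GREEK_ALPHA = "\u0391"


def default_lower(s: str) -> str:
    """Lowercase using the locale-independent Unicode default mappings."""
    return s.lower()


def turkish_lower(s: str) -> str:
    """Hypothetical Turkish tailoring (an assumption for illustration):
    capital I maps to dotless i (U+0131), and capital dotted I (U+0130)
    maps to plain i, before the default mappings handle everything else."""
    return s.replace("\u0130", "i").replace("I", "\u0131").lower()


if __name__ == "__main__":
    for ch in (LATIN_A, GREEK_ALPHA):
        print(unicodedata.name(ch), "->", unicodedata.name(default_lower(ch)))

    # Same characters, different expected result depending on the language.
    word = "DIYARBAKIR"
    print(default_lower(word))
    print(turkish_lower(word))

    # Within Latin-1: uppercasing sharp s expands to two letters, so the
    # round trip through upper() and lower() does not restore the original.
    print("\u00df".upper().lower() == "\u00df")
```

The first loop supports Robert's point that the database alone tells Greek alpha and Latin A apart. The Turkish case supports the objection: the input characters are identical, yet the correct result depends on which language the text is in, and no per-character table can know that.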
If 10646 defines a single set of rules, then it's part of the problem,
not part of the solution.
paul