This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.



Re: Query on UTF-32 encodings for letters


>>>>> "Robert" == Robert Dewar <dewar@adacore.com> writes:

 Robert> Paul Koning wrote: Joseph S. Myers wrote:
 >> >> Proper case folding and caseless matching are locale-dependent.
 >> 
 Robert> That's not true for the Ada 2005 rules, which are locale
 Robert> independent and driven only by the 10646 database.
 >> Then that simply means that Ada has either created a locale of its
 >> own, or adopted one specific locale to be the one it uses.
 >> Anglocentrism at work, perhaps?

 Robert> I don't think that is the case, with the full 10646 database,
 Robert> every character in the database is properly categorized, and
 Robert> the whole point of Wide_Wide_Character in Ada is to match the
 Robert> 10646 standard exactly. That is what ISO mandates, so it is
 Robert> hardly a matter of Anglocentrism (note that any reference to
 Robert> Unicode as a standard *is* Anglocentric :-) We are driven by
 Robert> ISO 10646, not Unicode. Luckily these are essentially
 Robert> completely aligned at this stage.

 Robert> Note that in 10646, there is a lot of distinction between
 Robert> different national characters. For instance, the Greek upper
 Robert> case alpha is typographically identical to latin upper case
 Robert> A, but they occupy distinct code positions. That means that
 Robert> the folding rule for every character is part of the
 Robert> non-locale dependent database.

But that is nowhere near sufficient.  The issue is that case folding
rules are different for different languages/locales that use the SAME
character set.  For example, there are a whole bunch of different
folding rules for Latin-1.
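
To make the point concrete, here is a small sketch in Python. It is an illustration added for this message, not anything taken from the Ada rules or the 10646 database. It uses the commonly cited Turkish dotted/dotless i case, which involves two characters (U+0130 and U+0131) that lie outside Latin-1. The helper name turkish_lower is hypothetical, and it covers only this one rule, not Turkish casing in general.

```python
# Python's str.lower() applies the default, locale-independent Unicode
# case mapping: one answer for every language.
default_lower = "TITLE".lower()


def turkish_lower(text: str) -> str:
    """Hypothetical helper: lowercase using only the Turkish i rule.

    Turkish pairs capital I with dotless i (U+0131), and capital
    dotted I (U+0130) with the ordinary i. Everything else falls
    back to the default mapping.
    """
    text = text.replace("I", "\u0131")    # I -> dotless i
    text = text.replace("\u0130", "i")    # dotted capital I -> i
    return text.lower()


turkish = turkish_lower("TITLE")

# Same input string, same character set, two different "correct" results.
print(default_lower)
print(turkish)
```

The two results differ in the second letter, which is the sense in which a single locale-independent table cannot be right for every language that shares the characters.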

If 10646 defines a single set of rules, then it's part of the problem,
not part of the solution. 

    paul

