This is the mail archive of the
mailing list for the GCC project.
Re: Query on UTF-32 encodings for letters
Joseph S. Myers wrote:
On Tue, 11 Jan 2005, Robert Dewar wrote:
Ada 2005 requires full support for all planes of UTF-32
encoding, including the use of letters in identifiers,
including also proper upper/lower case equivalence.
All this information is obtainable from the 10646 standard,
but it is non-trivial to generate the predicates Is_Letter,
and the function To_Lower.
Proper case folding and caseless matching are locale-dependent.
That's not true for the Ada 2005 rules, which are locale independent
and driven only by the 10646 database.
Case conversion can also depend on context in a word as well as on locale. In
Unicode there is titlecase as well as uppercase and lowercase.
Title case is allowed in Ada 2005 identifiers.
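
To make the titlecase and context points concrete, here is a small sketch in Python (not Ada, and not what any Ada compiler does). It relies only on Python's built-in case mappings, which are an assumption standing in for the 10646 tables; the helper name case_forms is made up for illustration.

```python
import unicodedata

# U+01C4..U+01C6: the DZ-with-caron digraph has distinct
# uppercase, titlecase and lowercase code points.
UPPER, TITLE, LOWER = "\u01c4", "\u01c5", "\u01c6"


def case_forms(ch):
    """Return (upper, title, lower) for a one-character string.

    Hypothetical helper: it just calls Python's built-in mappings.
    """
    return ch.upper(), ch.title(), ch.lower()


forms = case_forms(LOWER)

# General category of the titlecase code point.
title_category = unicodedata.category(TITLE)

# Context dependence: capital sigma lowercases differently depending
# on whether it ends a word.
sigma_medial = "\u03a3\u0391".lower()  # sigma followed by alpha
sigma_final = "\u0391\u03a3".lower()   # alpha followed by sigma
```

The sigma case is the kind of context-dependent mapping mentioned above; a per-character table lookup cannot reproduce it, which is one reason a locale-independent, character-by-character rule is simpler to specify.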
The full documentation for what the Ada 2005 AI requires can be found in
there is in fact a more precise specification, with appropriate normative
references, of what exactly is required and whether there is to be
locale-dependence, at compile time or at runtime.
Indeed, the quoted AI is the precise specification.
Although the Unicode Character Database includes various tables for case
mapping, including context and locale dependent mapping, I'm not sure
whether these are normative or informative; section 4.2 of the Unicode
Standard version 4.0 refers to them as normative, while section 5.18 says
that case itself is normative but the mappings are informative: but the
whole of chapter 5 is not normative.
Well, the Ada rules as stated are indeed normative and are based on the
Unicode categorization. But Ada does not follow all the Unicode
recommendations. In particular, it does not mandate Normalization
Form KC, and instead follows the C# style of only rigorously
defining the effect of programs which are already in this
normalization form. Furthermore, Ada decided not to use
ISO/IEC TR 10176 which would be the assumed approach. The
reasons for this are discussed in the AI.
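
As a rough illustration of what "already in this normalization form" means, the following Python sketch checks whether a string is unchanged by Normalization Form KC. The function name is_in_nfkc is hypothetical, and Python's unicodedata module follows whatever Unicode version the interpreter ships with, which need not match the one the AI refers to.

```python
import unicodedata


def is_in_nfkc(identifier):
    """Return True if the string is already in Normalization Form KC.

    Hypothetical helper: a string is in NFKC exactly when normalizing
    it to NFKC leaves it unchanged.
    """
    return unicodedata.normalize("NFKC", identifier) == identifier


precomposed = "caf\u00e9"   # e with acute as a single code point
decomposed = "cafe\u0301"   # e followed by combining acute accent
ligature = "\ufb01le"       # starts with the "fi" ligature U+FB01

checks = [is_in_nfkc(s) for s in (precomposed, decomposed, ligature)]
```

Under the approach described above, a compiler would not have to normalize the last two spellings; it would simply be outside the rigorously defined cases.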
Anyway, it seems not too hard to write specific Is_Letter and
Fold_To_Upper_Case following the rules in this AI.
At this stage, I have pretty much concluded that I should spin my own
version of these routines to exactly match the Ada spec.
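
For what it is worth, the general shape of such routines can be sketched in a few lines of Python. This is only an approximation under stated assumptions, not the Ada spec: it assumes "letter" means the general categories Lu, Ll, Lt, Lm, Lo and Nl, and it approximates a one-to-one upper case mapping by discarding any mapping that expands to more than one character. The AI remains the authority on the exact rules, and same_identifier is a made-up helper.

```python
import unicodedata

# Assumption: these general categories count as letters.
LETTER_CATEGORIES = frozenset({"Lu", "Ll", "Lt", "Lm", "Lo", "Nl"})


def is_letter(ch):
    """Locale-independent letter test driven only by the general category."""
    return unicodedata.category(ch) in LETTER_CATEGORIES


def fold_to_upper_case(ch):
    """Map one character to upper case, independent of locale.

    Approximation: Python's str.upper() applies full case mappings,
    so a result longer than one character is ignored and the original
    character is returned unchanged.
    """
    upper = ch.upper()
    return upper if len(upper) == 1 else ch


def same_identifier(a, b):
    """Hypothetical helper: compare two identifiers character by character
    after folding each character to upper case."""
    return [fold_to_upper_case(c) for c in a] == [fold_to_upper_case(c) for c in b]
```

A real implementation would more likely generate range tables from the character database ahead of time rather than consult a library at run time, but the predicates would have the same interface.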
Thanks Joseph for your comments!
(this character stuff is a bottomless pit :-)