This is the mail archive of the
gcc@gcc.gnu.org
mailing list for the GCC project.
Re: Query on UTF-32 encodings for letters
- From: "Joseph S. Myers" <joseph at codesourcery dot com>
- To: Robert Dewar <dewar at adacore dot com>
- Cc: gcc at gcc dot gnu dot org
- Date: Sun, 16 Jan 2005 19:52:49 +0000 (UTC)
- Subject: Re: Query on UTF-32 encodings for letters
- References: <41E3E28D.6050506@adacore.com>
On Tue, 11 Jan 2005, Robert Dewar wrote:
> Ada 2005 requires full support for all planes of UTF-32
> encoding, including the use of letters in identifiers,
> including also proper upper lower case equivalence.
>
> All this information is obtainable from the 10646 standard,
> but it is non-trivial to generate the predicates Is_Letter,
> and the function To_Lower.
Proper case folding and caseless matching are locale-dependent. Case
conversion can also depend on a character's context within a word, as well
as on locale. Unicode also has titlecase in addition to uppercase and
lowercase. I presume there is in fact a more precise specification, with
appropriate normative references, of what exactly is required and whether
there is to be locale-dependence, at compile time or at runtime.
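As a rough illustration of why a single To_Lower function is underspecified,
here is a small sketch using Python's built-in string methods, which apply
Unicode's default (locale-independent) full case mappings. The helper
to_lower_tr is hypothetical, written only for this example; it is not part
of any library, and it covers only the dotted/dotless I rule for Turkish.

```python
import unicodedata

# One-to-many mapping: German sharp s uppercases to two letters.
sharp_s_upper = "\u00DF".upper()              # "SS"

# Titlecase is a third case, distinct from upper and lower, for
# digraph letters such as U+01C4 / U+01C5 / U+01C6.
dz_lower = "\u01C6"
dz_title = dz_lower.title()                   # U+01C5 (category Lt)
dz_upper = dz_lower.upper()                   # U+01C4

# Context dependence: capital sigma lowercases to final sigma (U+03C2)
# at the end of a word, but to U+03C3 when it stands alone.
word_lower = "\u039F\u0394\u039F\u03A3".lower()
lone_sigma_lower = "\u03A3".lower()

def to_lower_tr(text: str) -> str:
    """Hypothetical Turkish-style lowercasing (assumption for illustration):
    'I' maps to dotless i (U+0131), and dotted capital I (U+0130) maps to 'i'.
    Everything else falls back to the default mapping."""
    return text.replace("I", "\u0131").replace("\u0130", "i").lower()

default_lower = "ISPARTA".lower()             # locale-independent result
turkish_lower = to_lower_tr("ISPARTA")        # differs in the first letter
```

The same input string lowercases differently under the default and the
Turkish rule, so a specification has to say which one is meant.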
Although the Unicode Character Database includes various tables for case
mapping, including context-dependent and locale-dependent mappings, I'm
not sure whether these are normative or informative. Section 4.2 of the
Unicode Standard version 4.0 refers to them as normative, while section
5.18 says that case itself is normative but the mappings are informative;
the whole of chapter 5, however, is itself not normative.
--
Joseph S. Myers http://www.srcf.ucam.org/~jsm28/gcc/
jsm@polyomino.org.uk (personal mail)
joseph@codesourcery.com (CodeSourcery mail)
jsm28@gcc.gnu.org (Bugzilla assignments and CCs)