This is the mail archive of the java@gcc.gnu.org mailing list for the Java project.
Re: Implementing Universal Character Names in identifiers
- From: "Joseph S. Myers" <jsm28 at cam dot ac dot uk>
- To: Zack Weinberg <zack at codesourcery dot com>
- Cc: Martin v. Löwis <loewis at informatik dot hu-berlin dot de>, <gcc-patches at gcc dot gnu dot org>, <java at gcc dot gnu dot org>
- Date: Mon, 28 Oct 2002 18:53:24 +0000 (GMT)
- Subject: Re: Implementing Universal Character Names in identifiers
On Mon, 28 Oct 2002, Zack Weinberg wrote:
> What you wrote in response to this is interesting but doesn't address
> the issue of Unicode normalization of identifiers. It sounds more
> like an extended discussion of the previous point. I'm talking about
> the process described in UAX 15 (http://www.unicode.org/unicode/reports/tr15/)
> and in particular annex 7 of that document ("Programming Language
> Identifiers").
I don't think there's anything in the language standards to permit
normalization to NFC as described there. (It could be done in "phase 0"
for UTF-8 in the input file, much as we ignore whitespace at end of line, but
not for UCNs. And do we really want to build in the large character
tables required for normalization?)
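
To make the normalization question concrete, here is a minimal sketch of
what comparing identifiers under NFC would mean. It is in Python only
because its standard unicodedata module already ships the tables; the
helper name same_identifier_nfc is hypothetical, and nothing here is
cpplib code or a proposal for how GCC should do it.

```python
import unicodedata

def same_identifier_nfc(a, b):
    """Compare two identifier spellings after normalizing both to NFC.

    Hypothetical helper for illustration only.
    """
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)

# Two spellings of the same visible identifier:
precomposed = "caf\u00e9"    # U+00E9 LATIN SMALL LETTER E WITH ACUTE
decomposed = "cafe\u0301"    # 'e' followed by U+0301 COMBINING ACUTE ACCENT

# Without normalization the two are different code point sequences.
differ_raw = precomposed != decomposed

# With NFC applied to both, they compare equal.
equal_nfc = same_identifier_nfc(precomposed, decomposed)
```

Without a normalization step the two spellings name distinct
identifiers, and that holds whether the characters arrive as UTF-8 or as
UCNs.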
> - In cpplib, provide routines that validate individual identifiers
> against the precise lists in C99 and C++98.
>
> - GCC enforces the precise lists in C99 and C++98 only in -pedantic
> mode.
There's still the typo in the C++98 list that's a recognised Defect that
should be corrected (following existing practice of implementing
resolutions to Defect Reports before they make it into a TC). But
non-pedantic mode should use the current Unicode ranges of identifier
characters for both languages.
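
A rough sketch of the pedantic / non-pedantic split, again in Python
purely for illustration. Everything in it is an assumption: the table
PEDANTIC_RANGES_EXCERPT holds just two Latin ranges as a stand-in for the
full standard lists, the function names are hypothetical, and Python's
own str.isidentifier() stands in for "the current Unicode ranges of
identifier characters".

```python
# Illustrative excerpt only, NOT the complete list from either standard.
PEDANTIC_RANGES_EXCERPT = [(0x00C0, 0x00D6), (0x00D8, 0x00F6)]

def is_basic(ch):
    """True for the basic identifier characters: ASCII letters, digits, '_'."""
    return ch == "_" or (ch.isascii() and ch.isalnum())

def in_ranges(ch, ranges):
    """True if the code point of ch falls inside any (lo, hi) range."""
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in ranges)

def valid_identifier(name, pedantic):
    """Validate an identifier under a strict or a relaxed character set.

    pedantic=True  : extended characters must be in the fixed table.
    pedantic=False : extended characters are checked against the Unicode
                     data of the running Python (a stand-in assumption).
    """
    if not name or (name[0].isascii() and name[0].isdigit()):
        return False
    for i, ch in enumerate(name):
        if is_basic(ch):
            continue
        if pedantic:
            ok = in_ranges(ch, PEDANTIC_RANGES_EXCERPT)
        elif i == 0:
            ok = ch.isidentifier()           # may start an identifier
        else:
            ok = ("a" + ch).isidentifier()   # may continue an identifier
        if not ok:
            return False
    return True
```

With this excerpt, "caf\u00e9" is accepted in both modes, while
"\u03b1x" (Greek alpha) is accepted only when pedantic is False, since
the excerpt table deliberately omits the Greek ranges.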
--
Joseph S. Myers
jsm28@cam.ac.uk