This is the mail archive of the java@gcc.gnu.org mailing list for the Java project.



Re: Implementing Universal Character Names in identifiers


On Mon, 28 Oct 2002, Zack Weinberg wrote:

> What you wrote in response to this is interesting but doesn't address
> the issue of Unicode normalization of identifiers.  It sounds more
> like an extended discussion of the previous point.  I'm talking about
> the process described in UAX 15 (http://www.unicode.org/unicode/reports/tr15/)
> and in particular annex 7 of that document ("Programming Language
> Identifiers").

I don't think there's anything in the language standards that permits
normalization to NFC as described there.  (It could be done in "phase 0"
for UTF-8 in the input file, just as we ignore whitespace at end of line,
but not for UCNs.  And do we really want to build in the large character
tables required for normalization?)
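
To make the normalization question concrete, here is a minimal sketch
(in Python, purely for illustration; it is not GCC or cpplib code) of
two spellings of one visible identifier that differ as code point
sequences and compare equal only after NFC.  The helper name
same_identifier_after_nfc is hypothetical.  Python's unicodedata module
carries the kind of character tables mentioned above.

```python
import unicodedata

# Two spellings of the same visible identifier "café".
precomposed = "caf\u00e9"   # U+00E9 LATIN SMALL LETTER E WITH ACUTE
decomposed = "cafe\u0301"   # "e" followed by U+0301 COMBINING ACUTE ACCENT


def same_identifier_after_nfc(a: str, b: str) -> bool:
    """Hypothetical helper: compare two spellings after normalizing both to NFC."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


# Without normalization the two spellings are different strings.
raw_equal = precomposed == decomposed

# After NFC both collapse to the precomposed form.
nfc_equal = same_identifier_after_nfc(precomposed, decomposed)

print(raw_equal, nfc_equal)
```

Without normalization a compiler would treat these as two distinct
identifiers; with it, they would name the same entity.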

>  - In cpplib, provide routines that validate individual identifiers
>    against the precise lists in C99 and C++98.
> 
>  - GCC enforces the precise lists in C99 and C++98 only in -pedantic
>    mode.

There's still the typo in the C++98 list, which is a recognised Defect
and should be corrected (following the existing practice of implementing
resolutions to Defect Reports before they make it into a TC).  But
non-pedantic mode should use the current Unicode ranges of identifier
characters for both languages.
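
The split between pedantic and non-pedantic checking could be sketched
roughly as follows (Python for illustration only; this is not cpplib
code).  Everything here is an assumption: the range table is a short
excerpt of the Latin entries as I recall them from C99 Annex D and
should be checked against the standard, the function names are
hypothetical, and Python's own identifier rule merely stands in for
"the current Unicode ranges".

```python
from bisect import bisect_right

# Illustrative excerpt only, NOT the full standard list: a few Latin
# ranges of the kind enumerated in C99 Annex D (verify before relying on them).
PEDANTIC_RANGES = [
    (0x00AA, 0x00AA),
    (0x00BA, 0x00BA),
    (0x00C0, 0x00D6),
    (0x00D8, 0x00F6),
    (0x1E00, 0x1E9B),
    (0x1EA0, 0x1EF9),
]
_RANGE_STARTS = [lo for lo, _hi in PEDANTIC_RANGES]


def in_pedantic_ranges(cp: int) -> bool:
    """Binary-search the sorted, non-overlapping range table for a code point."""
    i = bisect_right(_RANGE_STARTS, cp) - 1
    return i >= 0 and cp <= PEDANTIC_RANGES[i][1]


def is_extended_identifier_char(ch: str, pedantic: bool) -> bool:
    """Hypothetical check for one non-ASCII character inside an identifier."""
    if pedantic:
        return in_pedantic_ranges(ord(ch))
    # Stand-in for "current Unicode ranges": Python accepts ch after the
    # first character of an identifier if it has the XID_Continue property.
    return ("a" + ch).isidentifier()


# U+00E9 is covered by the excerpt; U+00D7 (multiplication sign) is not an
# identifier character under either rule; U+1E9E was added to Unicode after
# C99 and falls in the gap between the last two excerpt ranges.
for ch in ("\u00e9", "\u00d7", "\u1e9e"):
    print(hex(ord(ch)),
          is_extended_identifier_char(ch, pedantic=True),
          is_extended_identifier_char(ch, pedantic=False))
```

A fixed list frozen at standardization time rejects characters that
later Unicode versions added, which is the practical argument for
confining the exact lists to -pedantic.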

-- 
Joseph S. Myers
jsm28@cam.ac.uk

