This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: UCNs in identifiers


On Fri, 11 Mar 2005, Geoffrey Keating wrote:

> This patch improves my previous patch.  In particular, it has
> many testcases.

It has a few testcases for a few different types of identifiers.  It 
doesn't seem to test various other cases of identifiers (e.g. label names, 
structure element names, macro parameter names - including that in each 
context duplicates are handled appropriately), or debug information 
generation, or linking multiple files with external identifiers with UCNs 
(whether both C, or C and C++ with extern "C" identifier), or the 
equivalence of $ and \u0024, or the characters accepted for C99 and C++03 
(including digits not starting identifiers in C99), or the greedy lexing 
algorithm for sequences that look like incomplete UCNs, or that valid C90 
code with sequences that look like UCNs is lexed as required by C90 
(similar tests), or that rejection of bad UCNs in identifiers includes 
identifiers that are preprocessing tokens (not just tokens), before 
concatenation or stringizing, or UCNs in preprocessing numbers, or UCNs in 
command-line -D and -U options.  The testcases also don't have comments 
explaining which particular properties of UCNs they are testing, which 
isn't helpful for working out what is covered.

> It doesn't fully implement UCNs-in-IDs.  The missing piece is
> stringification for C99, which as I read the C standard requires you
> to keep track of the original spelling of the token.  It produces an
> error in this case, just as the current compiler does for any use of
> UCNs-in-IDs.

It looks to me like there are many missing pieces, including critical ones 
for having this feature available with -std=c99 / in C++ at all.  For 
example, the default warning for identifiers (as preprocessing tokens at 
any stage) not in NFKC doesn't seem to be implemented - having such a 
warning by default was a critical feature for having any agreement that 
this should be implemented at all.  I don't see any commitment to the 
required due diligence audits of everywhere an identifier of one kind 
(input, internal, assembler, diagnostic) is converted into another kind; 
for example, all the places that use %s with IDENTIFIER_POINTER which is 
no longer valid when the internal format may not be suitable for 
diagnostics.  (The command line is an input source, just as source files 
are, so the same issues apply to it as apply to source files.)

This doesn't look like a good-faith attempt to follow the checklist I 
agreed with Zack in January as a compromise for how this feature could be 
implemented in a way acceptable to go in GCC at all (bug 9449, comment 
21).  I don't consider a patch with an incomplete implementation or an 
implementation with inadequate testing and auditing to be a good 
contribution to GCC's implementation of C.  As has been extensively 
discussed in that bug, an incomplete implementation is liable to be worse 
than no implementation at all.

Note that some changes (which this patch fails to make) do not depend on 
UCNs in identifiers and so can go in separately, and so should go in 
separately.  For example, the places using %s with IDENTIFIER_POINTER for 
diagnostics can be changed to use %E, which when formatting identifiers 
will need to make appropriate output conversions when they are no longer 
pure ASCII, or the C++ rules about converting extended characters 
(including $) to UCNs in phase 1 could be implemented.  In particular the 
changes to use %E are not of a controversial nature and anyone could make 
such changes incrementally (in the C front end, outside front ends that 
format may not be available).

-- 
Joseph S. Myers               http://www.srcf.ucam.org/~jsm28/gcc/
    jsm@polyomino.org.uk (personal mail)
    joseph@codesourcery.com (CodeSourcery mail)
    jsm28@gcc.gnu.org (Bugzilla assignments and CCs)


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]