This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
Re: UCNs in identifiers
- From: "Joseph S. Myers" <joseph at codesourcery dot com>
- To: Geoffrey Keating <gkeating at geoffk5 dot apple dot com>
- Cc: gcc-patches at gcc dot gnu dot org
- Date: Fri, 11 Mar 2005 22:48:49 +0000 (UTC)
- Subject: Re: UCNs in identifiers
- References: <20050311082927.4E0E1188FDA@geoffk5.apple.com>
On Fri, 11 Mar 2005, Geoffrey Keating wrote:
> This patch improves my previous patch. In particular, it has
> many testcases.
It has a few testcases for a few different types of identifiers. It
doesn't seem to test various other cases of identifiers (e.g. label names,
structure element names, macro parameter names - including that in each
context duplicates are handled appropriately), or debug information
generation, or linking multiple files with external identifiers with UCNs
(whether both C, or C and C++ with extern "C" identifier), or the
equivalence of $ and \u0024, or the characters accepted for C99 and C++03
(including digits not starting identifiers in C99), or the greedy lexing
algorithm for sequences that look like incomplete UCNs, or that valid C90
code with sequences that look like UCNs is lexed as required by C90
(similar tests), or that rejection of bad UCNs in identifiers includes
identifiers that are preprocessing tokens (not just tokens), before
concatenation or stringizing, or UCNs in preprocessing numbers, or UCNs in
command-line -D and -U options. The testcases also don't have comments
explaining which particular properties of UCNs they are testing, which
isn't helpful for working out what is covered.
> It doesn't fully implement UCNs-in-IDs. The missing piece is
> stringification for C99, which as I read the C standard requires you
> to keep track of the original spelling of the token. It produces an
> error in this case, just as the current compiler does for any use of
> UCNs-in-IDs.
It looks to me like there are many missing pieces, including critical ones
for having this feature available with -std=c99 / in C++ at all. For
example, the default warning for identifiers (as preprocessing tokens at
any stage) not in NFKC doesn't seem to be implemented - having such a
warning by default was a critical feature for having any agreement that
this should be implemented at all. I don't see any commitment to the
required due diligence audits of everywhere an identifier of one kind
(input, internal, assembler, diagnostic) is converted into another kind;
for example, all the places that use %s with IDENTIFIER_POINTER which is
no longer valid when the internal format may not be suitable for
diagnostics. (The command line is an input source, just as source files
are, so the same issues apply to it as apply to source files.)
This doesn't look like a good-faith attempt to follow the checklist I
agreed with Zack in January as a compromise for how this feature could be
implemented in a way acceptable to go in GCC at all (bug 9449, comment
21). I don't consider a patch with an incomplete implementation or an
implementation with inadequate testing and auditing to be a good
contribution to GCC's implementation of C. As has been extensively
discussed in that bug, an incomplete implementation is liable to be worse
than no implementation at all.
Note that some changes (which this patch fails to make) do not depend on
UCNs in identifiers and so can go in separately, and so should go in
separately. For example, the places using %s with IDENTIFIER_POINTER for
diagnostics can be changed to use %E, which when formatting identifiers
will need to make appropriate output conversions when they are no longer
pure ASCII, or the C++ rules about converting extended characters
(including $) to UCNs in phase 1 could be implemented. In particular the
changes to use %E are not of a controversial nature and anyone could make
such changes incrementally (in the C front end, outside front ends that
format may not be available).
--
Joseph S. Myers http://www.srcf.ucam.org/~jsm28/gcc/
jsm@polyomino.org.uk (personal mail)
joseph@codesourcery.com (CodeSourcery mail)
jsm28@gcc.gnu.org (Bugzilla assignments and CCs)