This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
Re: UCNs-in-IDs patch
On Thu, 17 Mar 2005, Zack Weinberg wrote:
> I do not have a strong opinion on how these "spelling" properties
> should work, but would like to point out that preserving the exact
> spelling of identifiers is going to be a huge implementation headache
> and I don't see that it's worth the trouble.
I still consider preserving spelling to be a simple matter of correctness
and quality of implementation; UCNs allow different spellings of the same
identifier just as digraphs allow different spellings of # and ##. We
have a bug in that regard (bug 20078); we now have such a bug with
identifiers. Some bugs are harder to fix than others, and some are less
important than others, but we still have a bug (subject to any comments
from WG14) which should be a dependency of bug 16989 (and any C++03
tracker bug, but I don't think we have one) until fixed (likewise every
other unimplemented checklist item). Spelling is relevant for
preprocessed output - which is beyond the scope of the standard but is
certainly not beyond the scope of GCC - and it seems obvious that
preprocessed output should not change one UCN into another. Yes,
preserving the spelling of UCNs in identifiers breaks the invariant that
identifiers can be distinguished by their hash table entries, just as UCNs
break invariants elsewhere in the compiler about what can appear in
identifiers. Implementing spelling preservation would therefore need its
own audit. One possible approach is second-class hash table entries for
the other spellings, so that different tokens for the same identifier can
have different spellings.
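
As a rough illustration of the idea of one identifier entry with several
recorded spellings, here is a minimal sketch in Python. It is not cpplib
code: the names canonical, IdentifierTable, same_identifier and
preprocessed_text are all hypothetical, and it assumes only the \uXXXX
and \UXXXXXXXX forms of UCN.

```python
import re

# Matches the two UCN forms: \uXXXX and \UXXXXXXXX (hex digits in either case).
_UCN = re.compile(r"\\u([0-9A-Fa-f]{4})|\\U([0-9A-Fa-f]{8})")


def canonical(spelling):
    """Replace each UCN in a source spelling with the character it names."""
    return _UCN.sub(lambda m: chr(int(m.group(1) or m.group(2), 16)), spelling)


class IdentifierTable:
    """Hypothetical intern table: one node per identifier, many spellings."""

    def __init__(self):
        self._nodes = {}

    def lookup(self, spelling):
        # The canonical form is the hash key, so every spelling of the
        # same identifier reaches the same node.
        key = canonical(spelling)
        node = self._nodes.setdefault(key, {"name": key, "spellings": set()})
        # Record the alternative spelling on the node (the "second-class"
        # information), and hand back a token that remembers its own.
        node["spellings"].add(spelling)
        return (node, spelling)


def same_identifier(tok_a, tok_b):
    """Identifier comparison stays a node identity test."""
    return tok_a[0] is tok_b[0]


def preprocessed_text(tok):
    """Preprocessed output uses the token's spelling, not the node's name."""
    return tok[1]
```

With this shape, comparing two tokens is still an identity test on the
shared node, while output written from a token reproduces the UCN exactly
as it appeared in the source. Any code that reads the name from the node
rather than from the token would lose the spelling, which is why such a
change would need auditing of its own.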
> More importantly, the conversation is presently debating minutiae at
> the expense of larger concerns. My primary concern with Geoff's
(That is because minutiae are the easiest way to demonstrate that any given
bug is in fact a regression if you want it to be one, something I'd more
normally do to justify a bug-fix going on a release branch when the
original bug it aimed to fix wasn't obviously a regression. The need for
an audit is less clearly demonstrable to be a regression without doing the
audit oneself.)
> patches as they stand, is that he has not audited the compiler for
> places where code assumes that identifiers consist exclusively of
> characters from the set A-Za-z0-9_$. Of particular concern are:
> Symbol mangling in backends; diagnostic output; debugging output;
> C++ name mangling.
I believe that "every use of IDENTIFIER_POINTER in the compiler (except
non-C-family front ends)" is a good indication of what needs auditing
here.
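
A first pass at listing the places to audit could be mechanical. The
following is a minimal sketch in Python, not an existing GCC tool: the
function names are hypothetical, the choice of .c and .h files is an
assumption, and the directories to skip (for the non-C-family front ends)
are left for the caller to supply. The breaks_assumption helper simply
tests a name against the A-Za-z0-9_$ set mentioned in the quoted text.

```python
import re
from pathlib import Path

# A use of the macro: the name followed by an opening parenthesis.
_USE = re.compile(r"\bIDENTIFIER_POINTER\s*\(")
# The character set that existing code is suspected of assuming.
_ASSUMED = re.compile(r"[A-Za-z0-9_$]*")


def find_uses(source_text):
    """Return (line_number, stripped_line) for each line using the macro."""
    return [(n, line.strip())
            for n, line in enumerate(source_text.splitlines(), start=1)
            if _USE.search(line)]


def audit_tree(root, skip_dirs=()):
    """Map each .c/.h file under root to its uses, skipping named dirs."""
    root = Path(root)
    report = {}
    for path in sorted(root.rglob("*")):
        if not path.is_file() or path.suffix not in {".c", ".h"}:
            continue
        # Compare directory names relative to root only.
        if any(part in skip_dirs for part in path.relative_to(root).parts):
            continue
        hits = find_uses(path.read_text(errors="replace"))
        if hits:
            report[str(path)] = hits
    return report


def breaks_assumption(identifier):
    """True if the name has any character outside A-Za-z0-9_$."""
    return _ASSUMED.fullmatch(identifier) is None
```

Such a list only says where to look. Each hit still needs a person to
decide whether the surrounding code copes with identifiers outside the
assumed character set.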
As regards debugging output, I noted that some of the testcases were a
useful start but should go in gcc.dg/debug so that they exercise debugging
output as well. That provides a useful sanity check but does not
substitute for the full audit.
--
Joseph S. Myers http://www.srcf.ucam.org/~jsm28/gcc/
jsm@polyomino.org.uk (personal mail)
joseph@codesourcery.com (CodeSourcery mail)
jsm28@gcc.gnu.org (Bugzilla assignments and CCs)