This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.
Re: UCNs-in-IDs patch
On Tue, 15 Mar 2005, Geoffrey Keating wrote:
> Could you quote a part of the standard which says that \u00c1 and
> \u00C1 count as a "different expansion" (or, in standardese, that they
> have "different spelling")? I couldn't find any definition of the
> word 'spelling' at all, but maybe I missed it.
>
> Google's dictionary says that "spelling" means "the forming of words
> with letters in an accepted order". I would not consider \ to be a
> letter, but "\u00c1" is a (string containing a) letter.
I consider it obvious that spelling in the standard refers to the sequence
of source characters. The fact that %: and # are different spellings
(explicitly stated in 6.4.6) but otherwise equivalent is exactly the same
as the fact that there are different spellings of what is otherwise the
same identifier: multiple sequences of source characters that are
differently stringized and different in macro expansions but otherwise are
the same semantically (once converted from preprocessing tokens to
tokens). If there is a definition in ISO/IEC 2382-1:1993, it would only
be relevant if consistent with the references to spelling of
non-alphanumeric tokens.
If you think there is doubt, such questions are matters to raise with the
WG14 reflector when they arise in the course of implementation, before
committing changes, *not* after doing the work. If there is not a
consensus on the reflector as to the clear meaning of the standard, they
are matters for DRs.
> In any case, translation phase 1 begins with an implementation-defined
> mapping; and such mapping can choose to implement model A or C (but
> the implementation must specify it).
>
> Since users can tell the difference between the three models only in
> obscure corner cases, which the standard tried to make undefined
> anyway, I think it's fine to say that we're doing model C.
I don't consider any model which doesn't allow all valid sequences of
preprocessing token spellings to be a sensible model to choose. Models
which don't permit all C programs could be implemented, but they aren't
what we document and I don't believe they make sense. Changes to the
documented implementation-defined behavior need especially careful
discussion, agreement and design in advance of implementation. In
particular, models are not something to choose in the middle of
implementation.
What the standard "tries" to make undefined behavior seems irrelevant.
If something is undefined, the established and previously discussed cpplib
practice is that it is a hard error. If it is not, it must be handled as
required by the standards. Imperfect implementations defeat the object of
showing up the problems with this feature for future standard versions.
If there is a standard defect, submit DRs. If it is still meaningfully
implementable, implement the standard requirements pending the DR
resolution, and make it clear in comments in the code and the testcases
what the issue is.
I have submitted three DRs to WG14 in preparation for implementing C99 VLA
requirements. Two more are in the pipeline and more may yet arise. I
expect to propose to the GCC lists detailed specifications for how this
interacts with GNU extensions and how the unspecified areas and areas in
the DRs will be handled in implementation, and to prepare accompanying
testcases, before actually writing the code.
> [lex.phases] paragraph 1 says:
>
> An implementation may use any internal encoding, so long as an actual
> extended character encountered in the source file, and the same
> extended character expressed in the source file as a
> universal-character-name (i.e. using the \uXXXX notation), are handled
> equivalently.
>
> I believe this is specifically intended to allow implementations to
> use UTF-8 (or other encoding) as an internal encoding for identifiers,
> and so when [cpp.stringize] says "the original spelling" it means in
> the internal encoding, not as the user wrote it.
We discussed this before - it seems to be a restatement of the as-if rule,
nothing more <http://gcc.gnu.org/ml/gcc-patches/2003-04/msg01528.html>.
I thought you knew
<http://gcc.gnu.org/ml/gcc-patches/2003-04/msg01509.html> that spellings
should be preserved for a good implementation and must be preserved if you
don't use dodgy phase 1 models.
> Alternatively, phase 1 starts with the same mapping as for C, and so the
> comment from the C rationale applies for C++ too.
The comment from the C rationale does not apply for C++. The UCN models
are different and you can't infer the intent of C++ from what the C
rationale says. (In fact I generally consider the C Rationale as fairly
useless as it generally fails to touch on the subtle issues of
interpretation.) Again, certain phase 1 models that don't allow all valid
token spelling sequences are (a) bad models, (b) not what's documented and
(c) not consistent with the longstanding documented intent for handling
character sets (project/cpplib.html, etc.).
Design - public agreed design, see contributewhy.html - should precede
code. So should documentation of the agreed models, and testcases as far
as possible. The impression given here is that the translation model and
the interpretations of grey areas are being made up now rather than being
thought out in advance.
I should point out that there is a mood on IRC that a serious mistake was
made in designing PCH in private, without sufficient public discussion of
design approaches and agreement on the right way; that IMA had similar
problems; and that this is another instance - at least the third - of
exactly the same problem, when the previous cases should have been learnt
from.
[03/07/05 12:04] <akibahara> stevenb: Hey, I never agreed with our PCH.
[03/07/05 12:05] <akibahara> It was disgust at PCH and IMA-like things getting in that made me decide to do my own thing.
We have development processes based around agreement, consensus and being
especially cautious about anything risky or controversial - for a reason:
they are essential to cooperative GCC development instead of anarchy.
Most of us take care to get prior approval from relevant
maintainers if there is any doubt about the merits or approach of a patch
even where we have write access to the relevant parts of the compiler; you
should have seen plenty of instances of global write maintainers posting
patches for comment and revising them until there is agreement, and some
cases where global write maintainers have patches for particular issues
which have gone uncommitted for *years* because of disagreements from the
relevant maintainers of parts of the compiler. Most of us also sometimes
make changes or do testing which we personally think are unnecessary, in
order to get that agreement on the merits of changes. For example, I
certainly did not presume to override other maintainers' objections to my
fix to bug 13801 on the basis that it was a C front end patch; I did the
various GCC and GDB testing and benchmarking requested, in order to show,
for a wide range of code, the absence of the types of problems it was
suggested might occur; and I knew the patch might be controversial, so
did not commit it until the objections had been properly resolved.
--
Joseph S. Myers http://www.srcf.ucam.org/~jsm28/gcc/
jsm@polyomino.org.uk (personal mail)
joseph@codesourcery.com (CodeSourcery mail)
jsm28@gcc.gnu.org (Bugzilla assignments and CCs)