This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.
On Tue, 15 Mar 2005, Geoffrey Keating wrote:
Could you quote a part of the standard which says that \u00c1 and \u00C1 count as a "different expansion" (or, in standardese, that they have "different spelling")? I couldn't find any definition of the word 'spelling' at all, but maybe I missed it.
Google's dictionary says that "spelling" means "the forming of words with letters in an accepted order". I would not consider \ to be a letter, but "\u00c1" is a (string containing a) letter.
I consider it obvious that spelling in the standard refers to the sequence
of source characters.
    #define foo ba\
    r
    #define foo bar

    #define foo ??'
    #define foo ^
The fact that %: and # are different spellings
(explicitly stated in 6.4.6) but otherwise equivalent is exactly the same
as the fact that there are different spellings of what is otherwise the
same identifier: multiple sequences of source characters that are
differently stringized and different in macro expansions but otherwise are
the same semantically (once converted from preprocessing tokens to
tokens). If there is a definition in ISO/IEC 2382-1:1993, it would only
be relevant if consistent with the references to spelling of
non-alphanumeric tokens.
Such questions are matters to raise with the WG14 reflector when found in
the course of implementation before committing changes, *not* after doing
the work, if you think there is doubt. If there is not a consensus on the
reflector as to the clear meaning of the standard, they are matters for
DRs.
In any case, translation phase 1 begins with an implementation-defined
mapping; and such mapping can choose to implement model A or C (but
the implementation must specify it).
Since users can tell the difference between the three models only in obscure corner cases, which the standard tried to make undefined anyway, I think it's fine to say that we're doing model C.
I don't consider any model which doesn't allow all valid sequences of
preprocessing token spellings to be a sensible model to choose. Models
which don't permit all C programs could be done, but they aren't what we
document and I don't believe they make sense.
Changes to the documented
implementation-defined behavior need especially careful discussion,
agreement and design in advance of implementation. Especially, models are
not something to choose in the middle of implementation.
What the standard "tries" to make undefined behavior seems irrelevant.
If something is undefined, the established and previously discussed cpplib
practice is that it is a hard error. If it is not, it must be handled as
required by the standards. Imperfect implementations defeat the object of
showing up the problems with this feature for future standard versions.
[lex.phases] paragraph 1 says:
An implementation may use any internal encoding, so long as an actual
extended character encountered in the source file, and the same
extended character expressed in the source file as a
universal-character-name (i.e. using the \uXXXX notation), are handled
equivalently.
I believe this is specifically intended to allow implementations to use UTF-8 (or another encoding) as an internal encoding for identifiers, and so when [cpp.stringize] says "the original spelling" it means in the internal encoding, not as the user wrote it.
We discussed this before - it seems to be a restatement of the as-if rule,
nothing more <http://gcc.gnu.org/ml/gcc-patches/2003-04/msg01528.html>.
I thought you knew
<http://gcc.gnu.org/ml/gcc-patches/2003-04/msg01509.html> that spellings
should be preserved for a good implementation and must be preserved if you
don't use dodgy phase 1 models.
Alternatively, phase 1 starts with the same mapping as for C, and so the
comment from the C rationale applies for C++ too.
The comment from the C rationale does not apply for C++.
I should point out that there is a mood on IRC that a serious mistake was made in designing PCH in private, without sufficient public discussion of design approaches and agreement on the right one; that IMA had similar problems; and that this is another instance, at least the third, of exactly the same problem, when the lessons of the previous cases should have been learnt.