This is the mail archive of the gcc-bugs@gcc.gnu.org mailing list for the GCC project.
[Bug preprocessor/9449] UCNs not recognized in identifiers (c++/c99)
- From: "geoffk at geoffk dot org" <gcc-bugzilla at gcc dot gnu dot org>
- To: gcc-bugs at gcc dot gnu dot org
- Date: 16 Sep 2005 00:02:03 -0000
- Subject: [Bug preprocessor/9449] UCNs not recognized in identifiers (c++/c99)
- References: <20030127145600.9449.rearnsha@arm.com>
- Reply-to: gcc-bugzilla at gcc dot gnu dot org
------- Additional Comments From geoffk at geoffk dot org 2005-09-16 00:01 -------
Subject: Re: UCNs not recognized in identifiers (c++/c99)
On 15/09/2005, at 3:53 PM, joseph at codesourcery dot com wrote:
> Yes, "spelling" is meant in terms of the source code characters.
> The idea is to permit simple strcmp-like checking by the
> preprocessor.
Good, so that answers that question.
You raise a good point about GCC not having documentation for phase
1. I don't have time to write all of it, but I think I can write the
last part, about UCNs, so maybe together we can get it all done. My
proposed wording is:
@cite{The mapping between physical source file multibyte characters
and the source character set in translation phase 1 (C90 and C99
5.1.1.2).}
[CR/NL/CR-NL are turned into EOL markers, spaces between a backslash
and the end of a line are deleted, and the input is converted to UTF-8
using iconv based on -finput-charset. What else?]
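For illustration only, the two line-level steps in that bracketed note (EOL normalization, and deleting blanks between a backslash and the end of a line) could be sketched in C as below. The name normalize_phase1 is hypothetical and is not GCC's code; treating tabs as well as spaces as deletable blanks is an assumption; the iconv/-finput-charset conversion is left out entirely.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not GCC's code).  Copies IN to OUT, which must
   have room for strlen (IN) + 1 bytes, mapping CR, NL and CR-NL to a
   single '\n' and dropping any spaces or tabs (an assumption) that sit
   between a backslash and the end of the line.  Returns strlen (OUT).  */
static size_t normalize_phase1(const char *in, char *out)
{
    size_t n = 0;

    assert(in != NULL && out != NULL);
    for (size_t i = 0; in[i] != '\0'; i++) {
        char c = in[i];

        if (c == '\r') {
            if (in[i + 1] == '\n')
                i++;            /* CR-NL is one end of line, not two */
            c = '\n';
        }
        if (c == '\n') {
            /* Back up over trailing blanks; keep the cut only if a
               backslash precedes them.  */
            size_t j = n;
            while (j > 0 && (out[j - 1] == ' ' || out[j - 1] == '\t'))
                j--;
            if (j > 0 && out[j - 1] == '\\')
                n = j;
        }
        out[n++] = c;
    }
    out[n] = '\0';
    return n;
}
```

The output is never longer than the input, which is why a buffer of strlen(in) + 1 bytes is enough. Trailing blanks on a line without a backslash are left alone.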
Then, any character sequence which would form a UCN in an identifier
in phase 3 of translation is converted into the corresponding UTF-8
sequence. Any backslash-newline combinations in the UCN are
preserved and placed after the UTF-8 sequence.
[note that there's no way for a user to tell whether a backslash-
newline combination is placed before, in the middle of, or after, the
UTF-8 sequence.]
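A rough C sketch of the proposed UCN-to-UTF-8 rule, including moving any backslash-newline pairs found inside the UCN to after the UTF-8 bytes. All function names here are hypothetical. The sketch assumes line endings have already been normalized to '\n', rejects surrogates and values above 0x10FFFF, and does not check the C99 Annex D list of characters permitted in identifiers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Encode CP as UTF-8 into OUT; return the byte count (1-4), or 0 for
   a surrogate or a value above 0x10FFFF.  */
static size_t utf8_encode(uint32_t cp, unsigned char *out)
{
    if (cp < 0x80) {
        out[0] = (unsigned char) cp;
        return 1;
    }
    if (cp < 0x800) {
        out[0] = (unsigned char) (0xC0 | (cp >> 6));
        out[1] = (unsigned char) (0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp >= 0xD800 && cp <= 0xDFFF)
        return 0;
    if (cp < 0x10000) {
        out[0] = (unsigned char) (0xE0 | (cp >> 12));
        out[1] = (unsigned char) (0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char) (0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {
        out[0] = (unsigned char) (0xF0 | (cp >> 18));
        out[1] = (unsigned char) (0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char) (0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char) (0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}

static int hex_value(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/* Skip backslash-newline pairs starting at P[I], counting them.  */
static size_t skip_splices(const char *p, size_t i, size_t *splices)
{
    while (p[i] == '\\' && p[i + 1] == '\n') {
        i += 2;
        (*splices)++;
    }
    return i;
}

/* P points at a backslash.  If it begins \uXXXX or \UXXXXXXXX (with
   backslash-newline pairs allowed anywhere inside), write the UTF-8
   bytes to OUT followed by one backslash-newline per pair skipped, set
   *CONSUMED to the number of source bytes used and return the number
   of bytes written.  OUT needs room for 4 bytes plus 2 per pair.
   Return 0 if P does not start a valid UCN.  */
static size_t ucn_to_utf8(const char *p, char *out, size_t *consumed)
{
    size_t i, n, splices = 0;
    int ndigits;
    uint32_t cp = 0;

    if (p[0] != '\\')
        return 0;
    i = skip_splices(p, 1, &splices);
    if (p[i] == 'u')
        ndigits = 4;
    else if (p[i] == 'U')
        ndigits = 8;
    else
        return 0;
    i++;
    for (int k = 0; k < ndigits; k++) {
        int v;
        i = skip_splices(p, i, &splices);
        v = hex_value(p[i]);
        if (v < 0)
            return 0;
        cp = (cp << 4) | (uint32_t) v;
        i++;
    }
    n = utf8_encode(cp, (unsigned char *) out);
    if (n == 0)
        return 0;
    /* The pairs go after the UTF-8 sequence, as the draft proposes.  */
    for (size_t s = 0; s < splices; s++) {
        out[n++] = '\\';
        out[n++] = '\n';
    }
    *consumed = i;
    return n;
}
```

For example, the source bytes \u00 followed by backslash-newline and then e9 would come out as C3 A9 followed by the backslash-newline. Since phase 2 deletes the pair anyway, a user cannot observe where it was put.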
...
@cite{Which additional multibyte characters may appear in identifiers
and their correspondence to universal character names (C99 6.4.2).}
UTF-8 character sequences may appear in identifiers, and they
correspond to the UCN that specifies that character. A UTF-8
sequence may appear only if the UCN that it corresponds to would be
permitted in the identifier at that point. At present, only those
UTF-8 sequences which were produced by the mapping from UCNs to UTF-8
sequences in translation phase 1 are permitted, but this is likely to
change in the future.
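Going the other way, the correspondence between a UTF-8 sequence in an identifier and "the UCN that specifies that character" can be illustrated by decoding the sequence and spelling the code point in \u or \U form. This is a sketch only: utf8_decode and ucn_spelling are hypothetical names, rejecting overlong forms and surrogates is an assumption about what counts as a valid sequence, and writing the hex digits in upper case is an arbitrary choice.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Decode one UTF-8 sequence at S (LEN bytes available).  On success
   store the code point in *CP and return the sequence length (1-4);
   return 0 for malformed, overlong, surrogate or out-of-range input.  */
static size_t utf8_decode(const unsigned char *s, size_t len, uint32_t *cp)
{
    size_t n;
    uint32_t c, min;

    if (len == 0)
        return 0;
    if (s[0] < 0x80) {
        *cp = s[0];
        return 1;
    } else if ((s[0] & 0xE0) == 0xC0) {
        n = 2; c = s[0] & 0x1F; min = 0x80;
    } else if ((s[0] & 0xF0) == 0xE0) {
        n = 3; c = s[0] & 0x0F; min = 0x800;
    } else if ((s[0] & 0xF8) == 0xF0) {
        n = 4; c = s[0] & 0x07; min = 0x10000;
    } else {
        return 0;
    }
    if (len < n)
        return 0;
    for (size_t i = 1; i < n; i++) {
        if ((s[i] & 0xC0) != 0x80)
            return 0;           /* not a continuation byte */
        c = (c << 6) | (uint32_t) (s[i] & 0x3F);
    }
    if (c < min || c > 0x10FFFF || (c >= 0xD800 && c <= 0xDFFF))
        return 0;
    *cp = c;
    return n;
}

/* Spell CP as a UCN: \uXXXX if it fits in four hex digits, otherwise
   \UXXXXXXXX.  */
static void ucn_spelling(uint32_t cp, char *buf, size_t size)
{
    if (cp <= 0xFFFF)
        snprintf(buf, size, "\\u%04lX", (unsigned long) cp);
    else
        snprintf(buf, size, "\\U%08lX", (unsigned long) cp);
}
```

With this, the two bytes C3 A9 decode to U+00E9 and are spelled \u00E9. The check of whether that UCN is allowed at that point in the identifier is not modelled here.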
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=9449