This is the mail archive of the gcc-bugs@gcc.gnu.org mailing list for the GCC project.

[Bug preprocessor/9449] UCNs not recognized in identifiers (c++/c99)

From: "geoffk at geoffk dot org" <gcc-bugzilla at gcc dot gnu dot org>
To: gcc-bugs at gcc dot gnu dot org
Date: 16 Sep 2005 00:02:03 -0000
Subject: [Bug preprocessor/9449] UCNs not recognized in identifiers (c++/c99)
References: <20030127145600.9449.rearnsha@arm.com>
Reply-to: gcc-bugzilla at gcc dot gnu dot org

------- Additional Comments From geoffk at geoffk dot org  2005-09-16 00:01 -------
Subject: Re:  UCNs not recognized in identifiers (c++/c99)

On 15/09/2005, at 3:53 PM, joseph at codesourcery dot com wrote:

>   Yes, "spelling" is meant in terms of the source code characters.
>   The idea is to permit simple strcmp-like checking by the  
> preprocessor.

Good, so that answers that question.

You raise a good point about GCC not having documentation for phase  
1.  I don't have time to write all of it, but I think I can write the  
last part, about UCNs, so maybe together we can get it all done.  My  
proposed wording is:

@cite{The mapping between physical source file multibyte characters
and the source character set in translation phase 1 (C90 and C99  
5.1.1.2).}

[CR, NL, and CR-NL are turned into end-of-line markers, spaces between
a backslash and the end of a line are deleted, and the input is
converted to UTF-8 using iconv based on -finput-charset. What else?]

Then, any character sequence which would form a UCN in an identifier  
in phase 3 of translation is converted into the corresponding UTF-8  
sequence.  Any backslash-newline combinations in the UCN are  
preserved and placed after the UTF-8 sequence.

[Note that there's no way for a user to tell whether a
backslash-newline combination is placed before, in the middle of, or
after the UTF-8 sequence.]
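
To make the proposed UCN-to-UTF-8 mapping concrete, here is a minimal sketch in C. It is not GCC's cpplib code; the names parse_ucn, encode_utf8, and ucn_to_utf8 are hypothetical, and the sketch assumes a NUL-terminated input with no backslash-newline inside the UCN. It also does not check whether the resulting character is permitted in an identifier.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Encode one code point as UTF-8.  Returns the number of bytes written
   (1 to 4), or 0 if cp is a surrogate or lies above U+10FFFF.  */
static size_t encode_utf8(uint32_t cp, unsigned char out[4])
{
    assert(out != NULL);
    if (cp < 0x80) {
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {
        if (cp >= 0xD800 && cp <= 0xDFFF)
            return 0;                      /* surrogates are not characters */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}

/* Parse \uXXXX or \UXXXXXXXX at s.  On success, store the code point in
   *cp and return the number of source characters consumed (6 or 10);
   otherwise return 0.  */
static size_t parse_ucn(const char *s, uint32_t *cp)
{
    size_t digits, i;
    uint32_t value = 0;

    if (s[0] != '\\')
        return 0;
    if (s[1] == 'u')
        digits = 4;
    else if (s[1] == 'U')
        digits = 8;
    else
        return 0;

    for (i = 0; i < digits; i++) {
        char c = s[2 + i];
        uint32_t d;
        if (c >= '0' && c <= '9')
            d = (uint32_t)(c - '0');
        else if (c >= 'a' && c <= 'f')
            d = (uint32_t)(c - 'a' + 10);
        else if (c >= 'A' && c <= 'F')
            d = (uint32_t)(c - 'A' + 10);
        else
            return 0;                      /* also stops at the terminating NUL */
        value = (value << 4) | d;
    }
    *cp = value;
    return 2 + digits;
}

/* Convert the UCN at s into its UTF-8 sequence.  Returns the number of
   UTF-8 bytes written to out, or 0 on failure; *consumed receives the
   number of source characters the UCN occupied.  */
static size_t ucn_to_utf8(const char *s, unsigned char out[4], size_t *consumed)
{
    uint32_t cp;
    size_t used = parse_ucn(s, &cp);
    if (used == 0)
        return 0;
    *consumed = used;
    return encode_utf8(cp, out);
}
```

For example, calling ucn_to_utf8 on the six source characters of \u00e9 yields the two bytes 0xC3 0xA9.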

...

@cite{Which additional multibyte characters may appear in identifiers
and their correspondence to universal character names (C99 6.4.2).}

UTF-8 character sequences may appear in identifiers, and they  
correspond to the UCN that specifies that character.  A UTF-8  
sequence may appear only if the UCN that it corresponds to would be  
permitted in the identifier at that point.  At present, only those  
UTF-8 sequences which were produced by the mapping from UCNs to UTF-8  
sequences in translation phase 1 are permitted, but this is likely to  
change in the future.
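
The correspondence in the other direction, from a UTF-8 sequence back to a UCN, could be sketched as follows. Again this is only an illustration under assumptions: decode_utf8 and utf8_to_ucn are hypothetical names, and the choice of uppercase hex digits and of \u for code points up to U+FFFF is arbitrary. Because "spelling" is compared character by character, \u00e9 and \u00E9 name the same character but are spelled differently, so any real implementation would have to decide how to treat that.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Decode one UTF-8 sequence at s (NUL-terminated).  Returns the number
   of bytes consumed, or 0 if the sequence is malformed, overlong,
   a surrogate, or above U+10FFFF.  */
static size_t decode_utf8(const unsigned char *s, uint32_t *cp)
{
    uint32_t value, min;
    size_t len, i;

    if (s[0] < 0x80) {
        *cp = s[0];
        return 1;
    } else if ((s[0] & 0xE0) == 0xC0) {
        len = 2; value = s[0] & 0x1F; min = 0x80;
    } else if ((s[0] & 0xF0) == 0xE0) {
        len = 3; value = s[0] & 0x0F; min = 0x800;
    } else if ((s[0] & 0xF8) == 0xF0) {
        len = 4; value = s[0] & 0x07; min = 0x10000;
    } else {
        return 0;                          /* stray continuation or invalid lead byte */
    }

    for (i = 1; i < len; i++) {
        if ((s[i] & 0xC0) != 0x80)
            return 0;                      /* also stops at the terminating NUL */
        value = (value << 6) | (s[i] & 0x3F);
    }
    if (value < min || value > 0x10FFFF
        || (value >= 0xD800 && value <= 0xDFFF))
        return 0;
    *cp = value;
    return len;
}

/* Write one possible UCN spelling of the UTF-8 sequence at s into out.
   Returns the number of UTF-8 bytes consumed, or 0 on failure.  */
static size_t utf8_to_ucn(const char *s, char out[11])
{
    uint32_t cp;
    size_t used = decode_utf8((const unsigned char *)s, &cp);
    if (used == 0)
        return 0;
    if (cp <= 0xFFFF)
        snprintf(out, 11, "\\u%04lX", (unsigned long)cp);
    else
        snprintf(out, 11, "\\U%08lX", (unsigned long)cp);
    return used;
}
```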

-- 

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=9449
