This is the mail archive of the gcc-bugs@gcc.gnu.org mailing list for the GCC project.

[Bug preprocessor/9449] UCNs not recognized in identifiers (c++/c99)

From: "joseph at codesourcery dot com" <gcc-bugzilla at gcc dot gnu dot org>
To: gcc-bugs at gcc dot gnu dot org
Date: 7 Jan 2005 10:27:55 -0000
Subject: [Bug preprocessor/9449] UCNs not recognized in identifiers (c++/c99)
References: <20030127145600.9449.rearnsha@arm.com>
Reply-to: gcc-bugzilla at gcc dot gnu dot org

------- Additional Comments From joseph at codesourcery dot com  2005-01-07 10:27 -------
Subject: Re:  UCNs not recognized in identifiers
 (c++/c99)

On Fri, 7 Jan 2005, zack at gcc dot gnu dot org wrote:

> An obvious rebuttal to this is that the compiler used in step 4 is broken.  As
> you say, the C standard references ISO10646 not Unicode and the concept of
> normalization does not exist in ISO10646, and this could be taken to imply that
> no normalization shall occur.  However, there is no unambiguous statement to
> that effect in the standard, and there is strong quality-of-implementation

I think the relevant text is that which treats identifiers as sequences of 
characters, with each UCN denoting a single character.

I've had no on-list response yet to the query about this I sent to the 
WG14 reflector on Tuesday (reflector message 10698), with the HEBREW 
LETTER SHIN WITH DAGESH AND SHIN DOT examples.
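The reflector message is not reproduced here, so the following is only a 
sketch of why that character makes an awkward test case, not a restatement 
of the query.  It uses Python's unicodedata module (an assumption of this 
illustration, nothing to do with GCC) to show the code points that Unicode 
normalization associates with the single UCN \uFB2C.  The helper name 
code_points is hypothetical.

```python
import unicodedata

# U+FB2C HEBREW LETTER SHIN WITH DAGESH AND SHIN DOT, written in C as \uFB2C.
SHIN_DAGESH_SHIN_DOT = "\uFB2C"


def code_points(s):
    """Render a string as a space-separated list of U+XXXX code points."""
    return " ".join(f"U+{ord(c):04X}" for c in s)


# Canonical decomposition (NFD) and canonical composition (NFC) of the
# single precomposed character.
nfd = unicodedata.normalize("NFD", SHIN_DAGESH_SHIN_DOT)
nfc = unicodedata.normalize("NFC", SHIN_DAGESH_SHIN_DOT)

print(unicodedata.name(SHIN_DAGESH_SHIN_DOT))
print("as written:", code_points(SHIN_DAGESH_SHIN_DOT))
print("NFD:       ", code_points(nfd))
print("NFC:       ", code_points(nfc))
```

If identifiers are plain sequences of characters, the one-character 
spelling and the decomposed spelling are different identifiers; under 
normalization they would be the same one.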

> pressure in the opposite direction.  Put aside the standard for a moment: are
> users going to like a compiler that insists that "Å" (U+00C5) and "Å" (U+212B)
> are not the same character?  [It happens that on my screen those are ever so
> slightly different, but that's just luck - and X11 will only let me type U+00C5;
> I resorted to hex-editing to get the other.]

The question of appearance is the same as that for U+0041 LATIN CAPITAL 
LETTER A, U+0391 GREEK CAPITAL LETTER ALPHA, U+0410 CYRILLIC CAPITAL 
LETTER A.  Will users like such a compiler less than one which doesn't 
allow them to use their native language in identifiers at all?
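As a rough illustration of the difference between the two cases, here is a 
minimal sketch using Python's unicodedata module (again an assumption of 
the illustration, not anything GCC does).  The helper same_after_nfc is a 
hypothetical name.  It checks whether two spellings become identical under 
NFC: the two A-with-ring characters do, while the three look-alike capital 
A letters stay distinct, so normalization alone would not address the 
appearance question.

```python
import unicodedata


def same_after_nfc(a, b):
    """Return True if two strings are identical after NFC normalization."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


A_RING = "\u00C5"         # LATIN CAPITAL LETTER A WITH RING ABOVE
ANGSTROM = "\u212B"       # ANGSTROM SIGN
LATIN_A = "\u0041"        # LATIN CAPITAL LETTER A
GREEK_ALPHA = "\u0391"    # GREEK CAPITAL LETTER ALPHA
CYRILLIC_A = "\u0410"     # CYRILLIC CAPITAL LETTER A

pairs = [
    (A_RING, ANGSTROM),
    (LATIN_A, GREEK_ALPHA),
    (LATIN_A, CYRILLIC_A),
    (GREEK_ALPHA, CYRILLIC_A),
]

for a, b in pairs:
    print(f"U+{ord(a):04X} vs U+{ord(b):04X}: "
          f"same after NFC = {same_after_nfc(a, b)}")
```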

> normalization, as a defensive measure against such external changes.  
> You could argue that this is just another way for C programmers to shoot 
> themselves in the foot, but I don't think the myriad ways that already 
> exist are a reason to add more.

(It's WG14 and WG21 that added the new way, not us.  And it may be that if 
they are to become convinced there is any mistake then they must see real 
world problems arising with real implementations of the existing 
standards, rather than hypothetical problems.  C99 made the mistake of 
adding features in general without adequate implementation experience; 
changing them without experience showing what is a genuine problem could 
be seen as another such mistake to avoid.)

I could believe there is a case for requiring -fextended-identifiers to 
enable UCNs in identifiers until there is more experience, with 
documentation along the lines of that formerly associated with -pedantic: 
"This option is not intended to be useful; ...".

-- 

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=9449
