This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.
Re: revised proposal for GCC and non-Ascii source files
- To: Paul Eggert <eggert at twinsun dot com>
- Subject: Re: revised proposal for GCC and non-Ascii source files
- From: Zack Weinberg <zack at rabi dot columbia dot edu>
- Date: Wed, 30 Dec 1998 17:58:17 -0500
- cc: rms at gnu dot org, bothner at cygnus dot com, amylaar at cygnus dot co dot uk, martin at mira dot isdn dot cs dot tu-berlin dot de, gcc2 at gnu dot org, egcs at cygnus dot com, zack at rabi dot columbia dot edu
I have only a couple of comments:
- C9x CD2 unambiguously says \u and \U escapes are to be treated as Unicode.
It also disallows these escapes for certain ranges of Unicode which
encompass all of 7-bit ASCII. That being so, I propose always encoding \u
and \U escapes as UTF-8, which can be done whether or not translation
libraries are available. Since the permitted escapes all lie outside ASCII,
every byte of their UTF-8 form has the high bit set. Assuming cpp and cc1
will accept any character with the high bit set in an identifier, we need
only add parsing support to cpp to make this work.
The only open issue is unifying a \u escape for character X with the same
character written natively in the input encoding, so that both forms are
treated as the same. I'm not sure what the right way to deal with that is.
- #pragma charset is easily implementable in cpplib (not sure about cccp)
provided we accept constraints on #pragma/_Pragma(). I posted some lengthy
discussion of this last week, but to summarize: pragmas affecting the preprocessor
(which this is) cannot be expressed with _Pragma() at all, and neither
#pragma nor _Pragma() can appear in a position that is inconvenient to the
parser -- I think that will translate to "must look like a C
statement-or-declaration".
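
To make the first point concrete, here is a minimal sketch of the encoding
step, not actual cpp or cpplib code. The function name ucn_to_utf8 is
hypothetical, and the limit of four bytes (values up to U+10FFFF, surrogates
rejected) is an assumption of this sketch rather than anything C9x CD2
requires. It takes the value already parsed from a \u or \U escape and
writes its UTF-8 bytes:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not from cpp/cpplib): encode one code point,
   as parsed from a \u or \U escape, into UTF-8.  Writes up to 4 bytes
   into buf and returns the number written, or 0 if the value is a
   surrogate or lies above U+10FFFF (limits assumed by this sketch). */
size_t ucn_to_utf8(uint32_t cp, unsigned char *buf)
{
    if (cp < 0x80) {                      /* 7-bit ASCII: one byte */
        buf[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {                     /* two bytes */
        buf[0] = (unsigned char)(0xC0 | (cp >> 6));
        buf[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp >= 0xD800 && cp <= 0xDFFF)     /* surrogates are not characters */
        return 0;
    if (cp < 0x10000) {                   /* three bytes */
        buf[0] = (unsigned char)(0xE0 | (cp >> 12));
        buf[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        buf[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {                 /* four bytes */
        buf[0] = (unsigned char)(0xF0 | (cp >> 18));
        buf[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        buf[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        buf[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;                             /* out of range */
}
```

For any value of 0x80 or above, every byte written has the high bit set,
which is what lets the result pass through an identifier scanner that
accepts high-bit characters. The one-byte branch is kept only for
completeness, since the escapes in question never denote ASCII.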
zw