This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.
Re: The integrated preprocessor
Zack Weinberg <zack@wolery.cumb.org> writes:
> I see that the C++ standard spells out which Unicode characters are
> acceptable in identifiers; the C standard (as I understand it) left
> that to the implementation, and that was the major reason why I didn't
> like the idea.
I don't see how the C standard leaves anything special to the
implementation.
s. 6.4.2.1 paragraph 3 says:

    Each universal character name in an identifier shall designate a
    character whose encoding in ISO/IEC 10646 falls into one of the
    ranges specified in annex D. ... An implementation may allow
    multibyte characters ... which characters and their correspondence
    to universal character names is implementation-defined.
Note the 'shall'. Annex D is also marked as 'normative'.
All this means is that the compiler can interpret its input as being
in whatever character set it likes, which is true for both the C and
C++ standards. I believe the intent was that the C compiler, if
written in C, could/should use the usual LC_CTYPE locale information
including the wchar-based I/O functions.
It is true that `portable' code should be written in the basic
character set, that is, it should use \u escapes, but only because
there's no guarantee that any particular implementation will have
support for any extra characters. In fact, truly portable code should
use all the trigraphs, too, so it can run on EBCDIC machines and
suchlike, but IMHO that's taking portability too far.
--
- Geoffrey Keating <geoffk@cygnus.com>