This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Re: The integrated preprocessor



Zack Weinberg <zack@wolery.cumb.org> writes:

> I see that the C++ standard spells out which Unicode characters are
> acceptable in identifiers; the C standard (as I understand it) left
> that to the implementation, and that was the major reason why I didn't
> like the idea.

I don't see how the C standard leaves anything special to the
implementation.

Section 6.4.2.1, paragraph 3, says:

  Each universal character name in an identifier shall designate a
  character whose encoding in ISO/IEC 10646 falls into one of the
  ranges specified in annex D.  ... An implementation may allow
  multibyte characters ... which characters and their correspondence
  to universal character names is implementation-defined.

Note the 'shall'.  Annex D is also marked as 'normative'.
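
For illustration, here's a minimal sketch of what that wording permits
(assuming a compiler that implements C99 extended identifiers at all;
how far any particular compiler gets is another matter):

  /* \u00E9 is LATIN SMALL LETTER E WITH ACUTE, which falls in the
     ranges listed in annex D, so it may appear in an identifier.  */
  int caf\u00E9 = 1;

  int main(void)
  {
      return caf\u00E9 - 1;  /* the same identifier, spelled with a UCN */
  }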

All this means is that the compiler can interpret its input as being
in whatever character set it likes, which is true for both the C and
C++ standards.  I believe the intent was that the C compiler, if
written in C, could/should use the usual LC_CTYPE locale information
including the wchar-based I/O functions.
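
As a rough sketch of that intent (my reading of what the committee had
in mind, not a description of any real compiler), a compiler written in
C could decode its input with the standard multibyte functions and let
LC_CTYPE pick the character set:

  #include <locale.h>
  #include <stdio.h>
  #include <string.h>
  #include <wchar.h>

  int main(void)
  {
      setlocale(LC_CTYPE, "");          /* character set comes from the locale */

      const char *src = "caf\xc3\xa9";  /* e.g. UTF-8 bytes in a UTF-8 locale */
      mbstate_t state;
      memset(&state, 0, sizeof state);

      size_t pos = 0, len = strlen(src);
      while (pos < len) {
          wchar_t wc;
          size_t n = mbrtowc(&wc, src + pos, len - pos, &state);
          if (n == 0 || n == (size_t)-1 || n == (size_t)-2)
              break;                    /* null byte, or invalid/incomplete sequence */
          /* On implementations that define __STDC_ISO_10646__, wc is
             the ISO 10646 (Unicode) code point of the character.  */
          printf("U+%04lX\n", (unsigned long)wc);
          pos += n;
      }
      return 0;
  }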

It is true that `portable' code should be written in the basic source
character set, that is, it should use \u escapes, but only because
there's no guarantee that any particular implementation will support
any extra characters.  In fact, truly portable code should use all the
trigraphs too, so it can run on EBCDIC machines and suchlike, but IMHO
that's taking portability too far.
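
Code written to that extreme would look something like this (a
contrived sketch; note that GCC replaces trigraphs only when asked to,
via -trigraphs or a strict -std mode, and this again assumes a compiler
that accepts C99 extended identifiers):

  /* Trigraphs spell the characters some ISO 646 variants lack:
     ??< ??> are braces, ??( ??) are brackets.  The identifier uses
     only \u escapes for its non-basic characters.  */
  int main(void)
  ??<
      int arr??(1??) = ??< 42 ??>;
      int gr\u00FC\u00DFe = arr??(0??);
      return gr\u00FC\u00DFe - 42;
  ??>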

-- 
- Geoffrey Keating <geoffk@cygnus.com>
