This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.
Re: revised proposal for GCC and non-Ascii source files
- To: zack at rabi dot columbia dot edu
- Subject: Re: revised proposal for GCC and non-Ascii source files
- From: Paul Eggert <eggert at twinsun dot com>
- Date: Fri, 1 Jan 1999 17:15:32 -0800 (PST)
- CC: martin at mira dot isdn dot cs dot tu-berlin dot de, rms at gnu dot org, bothner at cygnus dot com, amylaar at cygnus dot co dot uk, gcc2 at gnu dot org, egcs at cygnus dot com
- References: <199901011904.OAA07334@rabi.phys.columbia.edu>
Date: Fri, 01 Jan 1999 14:04:14 -0500
From: Zack Weinberg <zack@rabi.columbia.edu>
- Can we forbid UCNs and native extended characters as the first character
of an identifier?
We could forbid native extended characters (since they're an
implementation extension), but we can't forbid the UCNs allowed by
Annex I, since the standard requires that we support them. I don't
think it's wise to forbid the native extended characters, since this
will be a real hardship for people who want to use non-Ascii
identifiers.
- C9x says UCNs (and presumably extended characters) may not designate any
character in the required source character set. This means that there is no
problem with recognizing comments or strings even when we don't know the
source charset;
This is true only if we forbid native extended characters that can be
confused with the end of a string or comment. This restriction would be
unreasonable for e.g. Shift-JIS, which has multibyte characters that
contain `\' and `*' bytes. This is the motivation for requiring that
#ctype be at the start of the file (possibly preceded by ``safe''
comments).
- What do you do with an identifier that has 'V' in it already?
I leave it alone, unless the identifier also has a non-Ascii character in it.
- I don't like allowing arbitrary ASCII non-required-charset symbols, such
as '@', in identifiers (this seems to be implicitly permitted in your
proposal).
I was using `@' to denote MICRO SIGN (Unicode `00B5'); I didn't want
to put that character in email since it might have gotten munged. I
agree that only non-Ascii characters should be allowed in identifiers
(other than the Ascii characters already allowed); I'll add the
following clarifying note to (3):
Each non-Ascii character is allowed in an identifier, string, or comment.