This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.



Re: UCNs-in-IDs patch


On Thu, 17 Mar 2005, Per Bothner wrote:

> To go further: it may be acceptable that foo.E always
> be in UTF-8, even when that doesn't match the input locale.
> Or always use ASCII with \U-escapes.  Or better: if the
> current locale uses UTF-8, emit UTF-8; otherwise emit
> ASCII with \U-escapes.  Code that reads pre-processed input
> should assume the input is UTF-8 which might contain \U-escapes,
> rather than the current locale.
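
As a rough illustration of the last option quoted above (UTF-8 when the
locale uses it, otherwise ASCII with \U-escapes), here is a minimal Python
sketch.  The function name emit_preprocessed is hypothetical, and using
Python codec names to stand in for the locale's character set is an
assumption made for illustration; this is not GCC code.

```python
import codecs


def emit_preprocessed(text: str, locale_charset: str) -> bytes:
    """Encode preprocessed text according to the quoted proposal.

    Hypothetical helper: UTF-8 output if the locale charset is UTF-8,
    otherwise pure ASCII with non-ASCII characters written as \\U escapes.
    """
    # Normalize the name so that "utf8", "UTF-8", etc. compare equal.
    if codecs.lookup(locale_charset).name == "utf-8":
        return text.encode("utf-8")
    # Fallback: keep ASCII as-is, spell everything else as \UXXXXXXXX.
    escaped = "".join(
        ch if ord(ch) < 0x80 else "\\U%08X" % ord(ch) for ch in text
    )
    return escaped.encode("ascii")


print(emit_preprocessed("caf\u00e9", "UTF-8"))       # b'caf\xc3\xa9'
print(emit_preprocessed("caf\u00e9", "ISO-8859-1"))  # b'caf\\U000000E9'
```

A reader of such output would, as the quoted text says, have to accept
UTF-8 that may itself contain \U-escapes, whatever its own locale is.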

I'd say that for C++ the preprocessed output should contain UCNs, because 
of the C++ phase 1 mapping.  For C it should use the original token 
spellings: that is a quality-of-implementation issue, and also a 
correctness issue insofar as we say that the preprocessed output is the 
token sequence resulting from the preprocessing phases.

I don't like the implementation-defined phase 1 mapping being any more 
complicated than it needs to be, i.e. anything beyond converting the input 
character set, specified in the documented way, to Unicode using iconv.  
I think we can just barely justify, on the grounds of user confusion, how 
we ignore whitespace between backslash and newline (unfortunately not 
documented in the documentation of implementation-defined behavior).  On 
similar grounds we could justify ignoring byte sequences not in the input 
character set within comments, with a warning and only for 
ASCII-compatible character sets; we don't yet do that, but it might allow 
us to start using the locale character set as the default input character 
set.  But a UCN conversion not needed by the standard (i.e. any other than 
the standard C++ one) doesn't seem justified that way.
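
To make the C++ case concrete, the following Python sketch shows the two 
steps under discussion: converting from the input character set to 
Unicode, then the C++ phase 1 replacement of extended characters by UCNs.  
The function names are hypothetical, Python's bytes.decode stands in for 
iconv, and letting every ASCII character pass through unchanged is a 
simplification; none of this is cpplib's actual implementation.

```python
def decode_source(raw: bytes, input_charset: str) -> str:
    """Stand-in for the iconv step: input character set -> Unicode."""
    return raw.decode(input_charset)


def to_ucns(text: str) -> str:
    """Replace each non-ASCII character with a \\uXXXX or \\UXXXXXXXX UCN.

    Simplification (assumption): all ASCII characters are kept as-is.
    """
    out = []
    for ch in text:
        cp = ord(ch)
        if cp < 0x80:
            out.append(ch)
        elif cp <= 0xFFFF:
            out.append("\\u%04X" % cp)   # four hex digits suffice
        else:
            out.append("\\U%08X" % cp)   # beyond the BMP: eight hex digits
    return "".join(out)


def cxx_phase1(raw: bytes, input_charset: str) -> str:
    """Hypothetical C++-style phase 1: decode, then map to UCNs."""
    return to_ucns(decode_source(raw, input_charset))


print(cxx_phase1(b"int caf\xe9;", "latin-1"))  # int caf\u00E9;
```

For C, the point above is that the second step would not be applied to 
the output: the token would keep its original spelling.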

-- 
Joseph S. Myers               http://www.srcf.ucam.org/~jsm28/gcc/
    jsm@polyomino.org.uk (personal mail)
    joseph@codesourcery.com (CodeSourcery mail)
    jsm28@gcc.gnu.org (Bugzilla assignments and CCs)

