This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.


Re: UTF-8 quotation marks in diagnostics


On Wed, 21 Oct 2015, D. Hugh Redelmeier wrote:

> 	The LC_CTYPE environment variable specifies character
> 	classification.  GCC uses it to determine the character
> 	boundaries in a string; this is needed for some multibyte
> 	encodings that contain quote and escape characters that are
> 	otherwise interpreted as a string end or escape.

That's inaccurate.  The default source encoding is always UTF-8; it is not
derived from LC_CTYPE.  See this comment in libcpp/charset.c:

  /* We disable this because the default codeset is 7-bit ASCII on
     most platforms, and this causes conversion failures on every
     file in GCC that happens to have one of the upper 128 characters
     in it -- most likely, as part of the name of a contributor.
     We should definitely recognize in-band markers of file encoding,
     like:
     - the appropriate Unicode byte-order mark (FE FF) to recognize
       UTF16 and UCS4 (in both big-endian and little-endian flavors)
       and UTF8
     - a "#i", "#d", "/ *", "//", " #p" or "#p" (for #pragma) to
       distinguish ASCII and EBCDIC.
     - now we can parse something like "#pragma GCC encoding <xyz>
       on the first line, or even Emacs/VIM's mode line tags (there's
       a problem here in that VIM uses the last line, and Emacs has
       its more elaborate "local variables" convention).
     - investigate whether Java has another common convention, which
       would be friendly to support.
     (Zack Weinberg and Paolo Bonzini, May 20th 2004)  */
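
The byte-order-mark check proposed in the first item of that comment could
look roughly like the following.  This is a minimal sketch, not libcpp
code: the names sniff_bom and enum bom_kind are hypothetical, it assumes
the first bytes of the file are already in a buffer, and it does not
attempt the ASCII/EBCDIC heuristic from the second item.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical result type; libcpp has no such enum.  */
enum bom_kind {
  BOM_NONE,
  BOM_UTF8,
  BOM_UTF16BE,
  BOM_UTF16LE,
  BOM_UTF32BE,
  BOM_UTF32LE
};

/* Classify the byte-order mark at the start of BUF (LEN bytes) and
   store in *BOM_LEN the number of mark bytes to skip (0 if none).
   The 4-byte UTF-32LE mark FF FE 00 00 is tested before the 2-byte
   UTF-16LE mark FF FE, because the latter is a prefix of the former.  */
static enum bom_kind
sniff_bom (const unsigned char *buf, size_t len, size_t *bom_len)
{
  static const struct {
    unsigned char bytes[4];
    size_t n;
    enum bom_kind kind;
  } marks[] = {
    { { 0x00, 0x00, 0xFE, 0xFF }, 4, BOM_UTF32BE },
    { { 0xFF, 0xFE, 0x00, 0x00 }, 4, BOM_UTF32LE },
    { { 0xEF, 0xBB, 0xBF }, 3, BOM_UTF8 },
    { { 0xFE, 0xFF }, 2, BOM_UTF16BE },
    { { 0xFF, 0xFE }, 2, BOM_UTF16LE },
  };

  assert (bom_len != NULL);

  for (size_t i = 0; i < sizeof marks / sizeof marks[0]; i++)
    if (len >= marks[i].n && memcmp (buf, marks[i].bytes, marks[i].n) == 0)
      {
        *bom_len = marks[i].n;
        return marks[i].kind;
      }

  /* No mark found: the caller would keep the UTF-8 default.  */
  *bom_len = 0;
  return BOM_NONE;
}
```

Testing the longer marks first is the usual way to resolve the FF FE
overlap.  The cost is that a UTF-16LE file whose first character after the
mark is U+0000 would be misread as UTF-32LE, which is rare in practice for
source files.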

I haven't checked whether the documentation (and the matching
documentation for -finput-charset) was once accurate in this regard, i.e.
whether it dates from a time when LC_CTYPE did determine the source
character set.

-- 
Joseph S. Myers
joseph@codesourcery.com
