This is the mail archive of the mailing list for the GCC project.

Re: UTF-8 quotation marks in diagnostics

On 2015-10-22 20:11:15 +0000, Joseph Myers wrote:
> LC_CTYPE should affect the interpretation of multibyte character sequences 
> as characters, including on output.  That's the standard semantics.

That is only the recommended default behavior. In many contexts,
different charset information is provided explicitly.

Where no such information is provided, LC_CTYPE is assumed to
correspond to the charset of the terminal.

> That's what all C library functions involving interpretation of multibyte 
> character sequences do.  Straightforward use of POSIX library interfaces 
> does not support producing output in a character set other than that 
> specified with LC_CTYPE; e.g. printf expects a format string (possibly 
> resulting from a message catalog) in the LC_CTYPE character set, and does 
> not convert the bytes to another character set.
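
To illustrate the point about printf not converting bytes, here is a minimal sketch. It assumes the message text is already UTF-8 and relies on the behavior of common C libraries, where bytes in the format string and in a %s argument (without precision) are copied through unchanged. The helper name quote_name and the two macros are made up for this example and come from neither GCC nor the C library.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* UTF-8 encodings of U+2018 and U+2019 (left/right single quotation marks). */
#define LQUOTE "\xE2\x80\x98"
#define RQUOTE "\xE2\x80\x99"

/* Hypothetical helper: formats NAME between UTF-8 quotation marks into BUF.
   Returns the number of bytes written (excluding the terminating NUL),
   or -1 if formatting failed or the result did not fit.
   No character set conversion happens here: the bytes of the format
   string and of NAME are written as they are. */
int quote_name(char *buf, size_t size, const char *name)
{
    assert(buf != NULL && name != NULL);
    int n = snprintf(buf, size, LQUOTE "%s" RQUOTE, name);
    return (n < 0 || (size_t)n >= size) ? -1 : n;
}
```

If the terminal does not use UTF-8, these bytes are displayed as something else, since nothing in this path looks at the charset of the terminal.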

Only when setlocale() is called with appropriate arguments.
A C program is free to use locales other than those declared by
the LC_* environment variables when this makes sense.
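
A small sketch of this, using only standard C: a program starts in the "C" locale whatever the environment says, and it is the program that decides whether to adopt the environment's LC_CTYPE or a locale of its own choice. The three function names are made up for the illustration.

```c
#include <assert.h>
#include <locale.h>
#include <string.h>

/* Returns 1 if the current LC_CTYPE locale is the "C" locale.
   A NULL second argument only queries; it does not change anything. */
int ctype_is_c_locale(void)
{
    const char *name = setlocale(LC_CTYPE, NULL);
    return name != NULL && strcmp(name, "C") == 0;
}

/* Adopts LC_CTYPE from the environment (LC_ALL, LC_CTYPE, LANG).
   Returns the resulting locale name, or NULL if it is not supported. */
const char *adopt_environment_ctype(void)
{
    return setlocale(LC_CTYPE, "");
}

/* Selects a locale chosen by the program itself, ignoring the environment.
   Returns the resulting locale name, or NULL if it is not supported. */
const char *use_fixed_ctype(const char *name)
{
    assert(name != NULL);   /* NULL would only query, not set */
    return setlocale(LC_CTYPE, name);
}
```

Until one of the last two functions is called, ctype_is_c_locale() returns 1 even if LC_CTYPE is set in the environment.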

> Again, LC_CTYPE does *not* affect source file interpretation.

> You could write your "c99" program wrapper to add a -finput-charset= 
> option based on the locale's character set if you so wish (it also needs 
> to do things such as option reordering and handling -O with separate 
> argument - the "gcc" driver deliberately processes -D and -U options in 
> the order they appear on the command line, not following the POSIX rule 
> that -U options take precedence over -D - so you should not expect the 
> "gcc" driver to be usable as "c99" without such adaptation for deliberate 
> differences).
> 
> I think we should clearly update the documentation to reflect reality 
> regarding source file encoding, and leave it strictly for wrappers such as 
> "c99" to specify -finput-charset= options rather than leaving open the 
> possibility that GCC's own default might change in future.
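
As a rough sketch of the wrapper idea quoted above, the following shows how a "c99"-style wrapper might derive a -finput-charset= option from the locale, using the POSIX nl_langinfo(CODESET) interface. The function names are made up, the option reordering and -O handling mentioned above are left out, and whether the codeset name reported by the C library is one that GCC's conversion library accepts is an assumption that would have to be checked per system.

```c
#include <assert.h>
#include <langinfo.h>
#include <locale.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: writes "-finput-charset=<codeset>" into BUF.
   Returns 0 on success, -1 if CODESET is empty or the result does not fit. */
int build_input_charset_option(char *buf, size_t size, const char *codeset)
{
    assert(buf != NULL && codeset != NULL);
    if (strlen(codeset) == 0)
        return -1;
    int n = snprintf(buf, size, "-finput-charset=%s", codeset);
    return (n < 0 || (size_t)n >= size) ? -1 : 0;
}

/* Same, but takes the codeset from the locale named by the environment.
   Note that this changes the LC_CTYPE locale of the calling program. */
int input_charset_option_from_locale(char *buf, size_t size)
{
    if (setlocale(LC_CTYPE, "") == NULL)
        return -1;          /* the environment names an unsupported locale */
    return build_input_charset_option(buf, size, nl_langinfo(CODESET));
}
```

The wrapper would then add the resulting string to the argument list it passes to the "gcc" driver, leaving GCC's own default untouched.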

The documentation should also say whether LC_CTYPE affects the
command-line arguments (e.g. macro values via -D) and in what way
it affects the output (e.g. messages and output of "gcc -E").

Vincent Lefèvre
Work: CR INRIA - computer arithmetic / AriC project (LIP, ENS-Lyon)
