This is the mail archive of the gcc at gcc dot gnu dot org
mailing list for the GCC project.
Re: UTF-8 quotation marks in diagnostics
- From: Vincent Lefevre <vincent+gcc at vinc17 dot org>
- To: gcc at gcc dot gnu dot org
- Date: Sun, 1 Nov 2015 03:23:27 +0100
- Subject: Re: UTF-8 quotation marks in diagnostics
On 2015-10-22 20:11:15 +0000, Joseph Myers wrote:
> LC_CTYPE should affect the interpretation of multibyte character sequences
> as characters, including on output. That's the standard semantics.
That's only for the recommended default behavior. There are many
contexts where different charset information is provided.
Other than that, LC_CTYPE is assumed to correspond to the charset
of the terminal.
> That's what all C library functions involving interpretation of multibyte
> character sequences do. Straightforward use of POSIX library interfaces
> does not support producing output in a character set other than that
> specified with LC_CTYPE; e.g. printf expects a format string (possibly
> resulting from a message catalog) in the LC_CTYPE character set, and does
> not convert the bytes to another character set.
Only when setlocale() is called with appropriate arguments.
A C program is free to use locales other than those declared by
the LC_* environment variables when this makes sense.
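To illustrate the point about setlocale(), here is a minimal sketch in C. The helper names (ctype_locale_is_c, adopt_environment_ctype, force_ctype) are hypothetical and only serve the illustration. It relies on the ISO C rule that a program starts in the "C" locale, and it assumes that the implementation reports this locale under the name "C" (glibc does).

```c
#include <locale.h>
#include <string.h>

/* Hypothetical helper: returns 1 if the current LC_CTYPE locale is
   reported as "C". Passing NULL only queries; it changes nothing. */
int ctype_locale_is_c(void)
{
    const char *cur = setlocale(LC_CTYPE, NULL);
    return cur != NULL && strcmp(cur, "C") == 0;
}

/* Hypothetical helper: adopt whatever LC_ALL, LC_CTYPE or LANG say.
   Returns NULL if the environment names an unavailable locale. */
const char *adopt_environment_ctype(void)
{
    return setlocale(LC_CTYPE, "");
}

/* Hypothetical helper: select an explicitly named locale and ignore
   the environment variables entirely. */
const char *force_ctype(const char *name)
{
    return setlocale(LC_CTYPE, name);
}
```

Until adopt_environment_ctype() is called, the LC_* environment variables have no effect on the program, and force_ctype() shows that the program may choose a different locale when that makes sense.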
> Again, LC_CTYPE does *not* affect source file interpretation.
> You could write your "c99" program wrapper to add a -finput-charset=
> option based on the locale's character set if you so wish (it also needs
> to do things such as option reordering and handling -O with separate
> argument - the "gcc" driver deliberately processes -D and -U options in
> the order they appear on the command line, not following the POSIX rule
> that -U options take precedence over -D - so you should not expect the
> "gcc" driver to be usable as "c99" without such adaptation for deliberate
> I think we should clearly update the documentation to reflect reality
> regarding source file encoding, and leave it strictly for wrappers such as
> "c99" to specify -finput-charset= options rather than leaving open the
> possibility that GCC's own default might change in future.
The documentation should also say whether LC_CTYPE affects the
command-line arguments (e.g. macro values via -D) and in what way
it affects the output (e.g. messages and output of "gcc -E").
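The wrapper idea quoted above could start from something like the following sketch in C, which derives a -finput-charset= option from the locale's character set. The function names are hypothetical. It assumes a POSIX system providing nl_langinfo(CODESET), and that the codeset name it returns is one that GCC's character set conversion accepts, which is not guaranteed on every platform. Option reordering and the other "c99" adaptations are not covered.

```c
#include <langinfo.h>
#include <locale.h>
#include <stdio.h>

/* Hypothetical helper: write "-finput-charset=<codeset>" into buf.
   Returns 1 on success, 0 if the result did not fit. */
int format_input_charset_option(char *buf, size_t size, const char *codeset)
{
    int n = snprintf(buf, size, "-finput-charset=%s", codeset);
    return n >= 0 && (size_t)n < size;
}

/* Hypothetical helper: build the option from the LC_CTYPE locale
   declared by the environment. Returns 0 if that locale is not
   available or if the buffer is too small. */
int input_charset_option_from_locale(char *buf, size_t size)
{
    if (setlocale(LC_CTYPE, "") == NULL)
        return 0;
    return format_input_charset_option(buf, size, nl_langinfo(CODESET));
}
```

A wrapper would add the resulting string to the argument vector it passes to the "gcc" driver, so that GCC's own default does not have to depend on the locale.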
Vincent Lefèvre <firstname.lastname@example.org> - Web: <https://www.vinc17.net/>
100% accessible validated (X)HTML - Blog: <https://www.vinc17.net/blog/>
Work: CR INRIA - computer arithmetic / AriC project (LIP, ENS-Lyon)