Compile the following ISO-8859-1 file with -finput-charset=iso-8859-1. In the
output the string has been converted to UTF-8, as documented. Now compile it
with -finput-charset=iso-8859-1 -save-temps. In the output the string has been
converted to UTF-8, then that UTF-8 has been treated as ISO-8859-1 and converted
to UTF-8 again, which is incorrect. If preprocessed output is always in UTF-8,
then preprocessed input must always be treated as UTF-8, regardless of
-finput-charset.
const char s[] = "§";
A compile-time test that requires neither execution nor scanning the assembler
output:
int a = L'§';
compiled with -finput-charset=iso-8859-1 -save-temps gives a bogus "warning:
character constant too long for its type".
I looked at this a bit tonight.
It is straightforward to remove -finput-charset from the second
invocation of cc1 (needed in several places -- gcc.c but also the
c++ and objc lang-specs).
I think this approach fails if -combine is used; this is easily
fixed if it is ok to assume that .i files are encoded in UTF-8.
FWIW, my initial attempt here, namely inserting %<finput-charset*
into specs in various places, fails if multiple files are given
to gcc when -combine is not used (my patch handles the -combine
case). This is because the first cc1 invocation removes -finput-charset
from the command line.
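For readers unfamiliar with the spec language: in gcc's spec strings, %&lt;S deletes every occurrence of -S from the command line passed to subsequent invocations, and a trailing * makes it match any switch beginning with S. So the attempt above amounted to adding a fragment of roughly this shape to the specs (illustrative only, not the actual patch):

```
%{save-temps:%<finput-charset*}
```

Because the deletion takes effect for all later invocations, the first cc1 run strips the option before the remaining input files are compiled, which is exactly the multiple-file failure described above.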
So, some other approach must be found. Perhaps we can wedge in
another -finput-charset option, or perhaps we can add a special
case of some sort.
Subject: Bug number PR preprocessor/21521
A patch for this bug has been added to the patch tracker.
The mailing list url for the patch is http://gcc.gnu.org/ml/gcc-patches/2007-01/msg00027.html
The patch could probably do with pinging by now....
*** Bug 47549 has been marked as a duplicate of this bug. ***