Bug 21521 - -finput-charset -save-temps converts characters twice
Alias: None
Product: gcc
Classification: Unclassified
Component: preprocessor
Version: 4.1.0
Importance: P2 normal
Target Milestone: ---
Assignee: Tom Tromey
Duplicates: 47549
Depends on:
Reported: 2005-05-11 22:00 UTC by Joseph S. Myers
Modified: 2015-11-02 14:11 UTC (History)
Last reconfirmed: 2007-01-08 01:32:53

Description Joseph S. Myers 2005-05-11 22:00:40 UTC
Compile the following iso-8859-1 file with -finput-charset=iso-8859-1.  In the
output the string has been converted to UTF-8 as documented.  Compile it with
-finput-charset=iso-8859-1 -save-temps.  In the output the string has been
converted to UTF-8, then that UTF-8 treated as ISO-8859-1 and converted to UTF-8
again, which is incorrect.  If preprocessed output is always in UTF-8 then
preprocessed input must always be treated as in UTF-8 regardless of -finput-charset.

const char s[] = "§";

Compile-time test that requires neither execution nor scanning the assembler output:

int a = L'§';

compiled with -finput-charset=iso-8859-1 -save-temps gives a bogus "warning:
character constant too long for its type".
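
For illustration only (not part of the original report): a minimal C sketch of
the byte-level effect described above, assuming "§" is stored as the single
byte 0xA7 in ISO-8859-1. The helper names latin1_to_utf8 and utf8_code_points
are hypothetical; they are not GCC or libcpp functions and only mimic the
conversion step.

```c
#include <stddef.h>

/* Hypothetical helper: convert ISO-8859-1 bytes to UTF-8.
   Each Latin-1 byte maps to the code point of the same value, so bytes
   >= 0x80 become two-byte sequences.  `out` must have room for 2 * n
   bytes.  Returns the number of bytes written.  */
size_t latin1_to_utf8(const unsigned char *in, size_t n, unsigned char *out)
{
    size_t j = 0;
    for (size_t i = 0; i < n; i++) {
        if (in[i] < 0x80) {
            out[j++] = in[i];
        } else {
            out[j++] = (unsigned char)(0xC0 | (in[i] >> 6));
            out[j++] = (unsigned char)(0x80 | (in[i] & 0x3F));
        }
    }
    return j;
}

/* Hypothetical helper: count code points in well-formed UTF-8 by counting
   the bytes that are not continuation bytes (10xxxxxx).  */
size_t utf8_code_points(const unsigned char *s, size_t n)
{
    size_t count = 0;
    for (size_t i = 0; i < n; i++)
        if ((s[i] & 0xC0) != 0x80)
            count++;
    return count;
}
```

Passing the byte 0xA7 through latin1_to_utf8 once yields C2 A7, the UTF-8
encoding of "§". Passing that result through a second time yields C3 82 C2 A7,
which decodes as two code points (U+00C2 followed by U+00A7). A wide character
constant holding two characters would be consistent with the "character
constant too long for its type" warning.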
Comment 1 Andrew Pinski 2005-08-27 22:23:16 UTC
Comment 2 Tom Tromey 2006-12-29 07:43:22 UTC
I looked at this a bit tonight.

It is straightforward to remove -finput-charset from the second
invocation of cc1 (this is needed in several places: gcc.c, but also
the c++ and objc lang-specs).

I think this approach fails if -combine is used; this is easily
fixed if it is ok to assume that .i files are encoded in the
"internal" charset. 
Comment 3 Tom Tromey 2006-12-30 23:11:03 UTC
FWIW my initial attempt here, namely inserting %<finput-charset*
into specs in various places, fails if multiple files are given
to gcc when -combine is not used (my patch handles the -combine
case).  This is because the first cc1 invocation removes -finput-charset
from the command line.
So, some other approach must be found.  Perhaps we can wedge in
another -finput-charset option, or perhaps we can add a special
case of some sort.
Comment 4 patchapp@dberlin.org 2007-01-01 22:55:28 UTC
Subject: Bug number PR preprocessor/21521

A patch for this bug has been added to the patch tracker.
The mailing list url for the patch is http://gcc.gnu.org/ml/gcc-patches/2007-01/msg00027.html
Comment 5 Joseph S. Myers 2009-03-29 22:15:37 UTC
The patch could probably do with pinging by now....
Comment 6 Andrew Pinski 2012-01-06 00:22:50 UTC
*** Bug 47549 has been marked as a duplicate of this bug. ***