[Bug preprocessor/92987] New: -finput-charset is only usable with encodings that are supersets of ASCII

Wed Dec 18 15:45:00 GMT 2019

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92987

            Bug ID: 92987
           Summary: -finput-charset is only usable with encodings that are
                    supersets of ASCII
           Product: gcc
           Version: unknown
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: preprocessor
          Assignee: unassigned at gcc dot gnu.org
          Reporter: lhyatt at gmail dot com
  Target Milestone: ---

-finput-charset accepts any encoding supported by iconv; in addition, UTF-32 and
UTF-16 are converted directly by routines in libcpp/charset.c. However,
-finput-charset does not seem to be usable in practice unless the chosen
encoding is a superset of ASCII, because it also applies to every header file
included from the source. Even an empty source file implicitly includes
/usr/include/stdc-predef.h, so nothing at all can be compiled with, say,
-finput-charset=UTF-32LE:

$ echo -n > t.c
$ gcc -S -finput-charset=UTF-32LE t.c
cc1: error: failure to convert UTF-32LE to UTF-8

The error comes while processing stdc-predef.h.
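To see why stdc-predef.h cannot survive this conversion, consider what a strict
UTF-32LE decoder does with plain ASCII bytes. The C sketch below is illustrative
only and is not libcpp's code; the names check_utf32le and enum utf32le_status
are hypothetical. It reads each group of four bytes as one little-endian code
unit and rejects values above U+10FFFF, UTF-16 surrogates, and a trailing
partial unit. The exact checks in libcpp/charset.c may differ in detail.

```c
#include <stddef.h>
#include <stdint.h>

/* Result of trying to decode a buffer as UTF-32LE (hypothetical names). */
enum utf32le_status {
  UTF32LE_OK,        /* every 4-byte unit is a Unicode scalar value */
  UTF32LE_TRUNCATED, /* length is not a multiple of 4 */
  UTF32LE_INVALID    /* some unit is > 0x10FFFF or a surrogate */
};

/* Assemble one little-endian 32-bit code unit from 4 bytes. */
static uint32_t read_u32le(const unsigned char *p)
{
  return (uint32_t)p[0]
         | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16)
         | ((uint32_t)p[3] << 24);
}

/* Strictly validate BUF (LEN bytes) as UTF-32LE.  On failure, *BAD_OFFSET
   receives the byte offset of the first offending unit; on success it is
   left unchanged.  This is a sketch, not libcpp's conversion routine. */
enum utf32le_status
check_utf32le(const unsigned char *buf, size_t len, size_t *bad_offset)
{
  size_t i;
  for (i = 0; i + 4 <= len; i += 4)
    {
      uint32_t c = read_u32le(buf + i);
      /* Reject values outside Unicode and the surrogate range. */
      if (c > 0x10FFFF || (c >= 0xD800 && c <= 0xDFFF))
        {
          *bad_offset = i;
          return UTF32LE_INVALID;
        }
    }
  /* Leftover bytes that do not form a whole 4-byte unit. */
  if (i != len)
    {
      *bad_offset = i;
      return UTF32LE_TRUNCATED;
    }
  return UTF32LE_OK;
}
```

For example, the four ASCII bytes of "/* C" (0x2F 0x2A 0x20 0x43) assemble into
the unit 0x43202A2F, far beyond U+10FFFF, so an ordinary ASCII header is
rejected at its very first character. An empty buffer, by contrast, passes
trivially, which is consistent with the error being raised for stdc-predef.h
rather than for the empty t.c.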

I was about to work on adding support for -finput-charset to the diagnostics
infrastructure (which currently ignores it), but it seems this issue should be
dealt with first, since it may entail introducing the notion that different
source files can have different input encodings. I am not sure what the desired
way to address it would be. Are there use cases where it is desirable for
-finput-charset to apply to the #includes too (I suppose systems could exist
where the system headers are not ASCII)? Would it make sense to add a new option
that changes the charset only for the source file, and not for the #includes?
Or should it perhaps apply to "..." includes but not to <...> includes, or
something along those lines?
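The alternatives raised in the questions above amount to choosing, per file,
which charset to decode it with. The following C sketch is purely hypothetical:
enum charset_scope, enum file_origin and charset_for_file are invented names,
not GCC interfaces, and default_charset stands for whatever encoding the
compiler would use without the option. It puts the three scopes side by side:
the current behaviour, the main file only, and "..." includes but not <...>
includes (with implicit includes such as stdc-predef.h treated like <...>).

```c
/* Hypothetical policies for which files an input-charset option covers. */
enum charset_scope {
  SCOPE_ALL_FILES,     /* current behaviour: every file, headers included */
  SCOPE_MAIN_FILE,     /* only the primary source file */
  SCOPE_QUOTE_INCLUDES /* primary file and "..." includes, not <...> */
};

/* How a file entered the translation unit (hypothetical). */
enum file_origin {
  ORIGIN_MAIN,          /* the file named on the command line */
  ORIGIN_QUOTE_INCLUDE, /* #include "..." */
  ORIGIN_ANGLE_INCLUDE, /* #include <...> */
  ORIGIN_PREINCLUDE     /* implicit include such as stdc-predef.h */
};

/* Return the charset a file should be decoded with: USER_CHARSET if the
   file falls inside SCOPE, otherwise DEFAULT_CHARSET. */
const char *
charset_for_file(enum charset_scope scope, enum file_origin origin,
                 const char *user_charset, const char *default_charset)
{
  switch (scope)
    {
    case SCOPE_ALL_FILES:
      return user_charset;
    case SCOPE_MAIN_FILE:
      return origin == ORIGIN_MAIN ? user_charset : default_charset;
    case SCOPE_QUOTE_INCLUDES:
      return (origin == ORIGIN_MAIN || origin == ORIGIN_QUOTE_INCLUDE)
             ? user_charset : default_charset;
    }
  return default_charset;
}
```

Keeping the decision in one per-file function would also record which encoding
each file was read with, which is presumably the information the diagnostics
infrastructure would need in order to display source lines correctly.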

-Lewis