[Bug other/28315] gcc doesn't use locale for default input charset

lacos at caesar dot elte.hu gcc-bugzilla@gcc.gnu.org
Fri Mar 29 13:17:00 GMT 2013


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=28315

Laszlo Ersek <lacos at caesar dot elte.hu> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |bonzini at gnu dot org,
                   |                            |lacos at caesar dot elte.hu

--- Comment #1 from Laszlo Ersek <lacos at caesar dot elte.hu> 2013-03-29 13:17:21 UTC ---
gcc has defaulted to UTF-8 rather than the locale's codeset in
_cpp_default_encoding() [libcpp/charset.c] since the following 2004 hunk,
which disabled the locale lookup with an "&& 0":

    http://gcc.gnu.org/git/?p=gcc.git;a=commitdiff;h=d856c8a6#patch25

(
  The default encoding is selected for both "input_charset" (overridable
  with -finput-charset) and "narrow_charset" (overridable with
  -fexec-charset):

    cpp_create_reader() [libcpp/init.c]
      ~ narrow_charset = _cpp_default_encoding()
      ~ input_charset = _cpp_default_encoding()

  The "overrides" are implemented in c_common_handle_option()
  [gcc/c-family/c-opts.c].
)
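
A minimal sketch of the choice being discussed, assuming a POSIX system with
<langinfo.h>. It is not the libcpp code: default_source_encoding(),
locale_lookup_changes_default() and FALLBACK_CHARSET are hypothetical names,
and the consult_locale flag stands in for the compile-time "&& 0".

```c
#include <assert.h>
#include <langinfo.h>
#include <locale.h>
#include <stddef.h>
#include <string.h>

/* Assumption: stands in for the hardwired default described above. */
#define FALLBACK_CHARSET "UTF-8"

/* Hypothetical helper. With consult_locale == 0 it mimics the current
   behaviour (locale ignored); with consult_locale != 0 it mimics what
   re-enabling the locale lookup would do. */
const char *default_source_encoding(int consult_locale)
{
    const char *encoding = NULL;

    if (consult_locale) {
        /* Adopt the user's LC_CTYPE and ask for its codeset name. */
        setlocale(LC_CTYPE, "");
        encoding = nl_langinfo(CODESET);
    }

    /* Fall back when the locale was not consulted or gave no answer. */
    if (encoding == NULL || *encoding == '\0')
        encoding = FALLBACK_CHARSET;

    assert(*encoding != '\0');
    return encoding;
}

/* Nonzero if consulting the locale would pick a different default than
   the hardwired one. Exact string comparison is a simplification:
   codeset spellings vary between platforms. */
int locale_lookup_changes_default(void)
{
    return strcmp(default_source_encoding(1),
                  default_source_encoding(0)) != 0;
}
```

In a UTF-8 locale locale_lookup_changes_default() would be expected to return
0 (provided the platform spells the codeset "UTF-8"); in a 7-bit ASCII or
ISO-8859-2 locale it would return nonzero.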

Considering the encodings of source files "in the wild" that gcc has been
used to compile in the last 8+ years (i.e. while the "&& 0" has been in
place):

- UTF-8 (of which 7-bit ASCII is a subset) worked.

- Any non-UTF-8 encoding that utilized the MSB (e.g. ISO-8859-2) required the
  -finput-charset option.

  People who would have originally wanted gcc to take that codeset from the
  locale were probably *developing* the source code in question, hence they
  could easily add -finput-charset to their makefiles.

Much of the world must have migrated to UTF-8-encoded locales by now.
Reverting the "&& 0" would:

- not affect people with such a distro-default locale who build UTF-8 /
  ASCII sources: their locale codeset matches the current hardwired default,

- not affect people building sources with non-UTF-8 8-bit codesets (e.g.
  ISO-8859-2), since those projects already have to use the -finput-charset
  option in their makefiles,

- affect people who have stuck to 7-bit ASCII or non-UTF-8 8-bit codesets
  in their locales, and who compile real UTF-8 sources.

People in the last group (which includes me :)) would be forced either (a)
to modify their locale when building such sources as end-users, or (b) to
find out about -finput-charset=UTF-8 and pass it via (b1) Makefile hacking
or (b2) ./configure settings (env vars, or command line options).

I think that's unreasonable; building random projects from the tubes would
break for this small but existent group of users.

Therefore I suggest keeping the logic as-is and updating the docs instead
("gcc/doc/cppopts.texi"): "-finput-charset" should not refer to the locale.


