This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.



Re: Input charsets - What's going on?


>>>>> "Zack" == Zack Weinberg <zack@codesourcery.com> writes:

Zack> Paolo Bonzini <bonzini@gnu.org> writes:
>> cppcharset.c's _cpp_default_encoding uses nl_langinfo (CODESET) to find
>> the default input charset.  This usage is guarded by a configure
>> script symbol, HAVE_LANGINFO_CODESET.  Too bad, in the current GCC
>> sources, HAVE_LANGINFO_CODESET is never defined and is not even in the
>> auto-host.h template.  This means that GCC never attempts any input
>> charset conversion except if -finput-charset is given.

Zack> Oops.  I had thought that was there, because Java uses the same
Zack> construct...

This worked at one point.  I did a little research and found a patch
that moved AM_LANGINFO_CODESET to config/gettext.m4.  I'm sure
AM_LANGINFO_CODESET did the appropriate check, but it probably got
removed in some later reorganization (which I didn't look for).
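
For reference, the guarded construct under discussion looks roughly like
the sketch below.  This is not the actual cppcharset.c code:
default_input_charset is a hypothetical name, and the "UTF-8" fallback
is an assumption made for illustration.  The point is that when
configure never defines HAVE_LANGINFO_CODESET, the nl_langinfo call is
compiled out and only the fallback remains.

```c
#include <string.h>

/* Assumption: a working configure check (such as AM_LANGINFO_CODESET)
   would define this in the generated config header.  It is left
   undefined here to mirror the situation described above.  */
/* #define HAVE_LANGINFO_CODESET 1 */

#ifdef HAVE_LANGINFO_CODESET
#include <langinfo.h>
#endif

/* Hypothetical helper: return the name of the charset to assume for
   input files when no explicit -finput-charset is given.  */
static const char *
default_input_charset (void)
{
  const char *cs = NULL;

#ifdef HAVE_LANGINFO_CODESET
  /* Reflects the user's environment only if the program has already
     called setlocale (LC_CTYPE, "").  */
  cs = nl_langinfo (CODESET);
#endif

  /* Fallback when the check is missing or the locale reports nothing.
     "UTF-8" is an assumption for this sketch.  */
  if (cs == NULL || *cs == '\0')
    cs = "UTF-8";

  return cs;
}
```

With the macro undefined, as in the sketch, the function always returns
the fallback, so no locale-driven conversion is ever attempted.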

We don't have a test for this functionality AFAIK, which explains how
it would go unnoticed.  It's probably a pain to write one, since
encoding names aren't standardized across platforms (and since iconv
isn't available everywhere, we would need to explain that to the test
suite as well).

Zack> - There may be another common convention over in Java land, which
Zack>   would be friendly to support.

Nope, in Java sources the standard is that everything compiled by a
single invocation of the compiler has the same encoding.  Java doesn't
have the header file problem, since already-compiled code comes in a
portable binary format whose internal encodings are predefined.

Tom

