GCC Bugzilla – Bug 15575
HAVE_LANGINFO_CODESET never defined
Last modified: 2004-11-06 15:49:32 UTC
See this note and its enclosing thread: http://gcc.gnu.org/ml/gcc/2004-05/msg01090.html Apparently, HAVE_LANGINFO_CODESET is never defined by configure, meaning that the user's locale's encoding will never be detected by gcj. I believe this is a regression against some earlier version of gcj. I haven't verified the facts of the report personally.
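To illustrate why an undefined HAVE_LANGINFO_CODESET matters, here is a minimal sketch of the usual guard pattern. It is not gcj's actual code: the helper name `default_source_encoding` is hypothetical, and the "UTF-8" fallback is an assumption chosen only to mirror the behaviour described later in this thread.

```c
#include <locale.h>
#include <stddef.h>

#ifdef HAVE_LANGINFO_CODESET
# include <langinfo.h>
#endif

/* Hypothetical helper, not taken from the GCC sources.  Returns the
   name of the encoding to assume for source files.  */
const char *
default_source_encoding (void)
{
#ifdef HAVE_LANGINFO_CODESET
  /* Only compiled when configure defined the macro.  */
  setlocale (LC_CTYPE, "");
  const char *codeset = nl_langinfo (CODESET);
  if (codeset != NULL && codeset[0] != '\0')
    return codeset;
#endif
  /* Assumed fallback: without the macro, the locale is never consulted.  */
  return "UTF-8";
}
```

If configure never defines the macro, the guarded branch is compiled out and the function always returns the fallback, whatever the user's locale is.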
Confirmed.
Patch here: <http://gcc.gnu.org/ml/gcc-patches/2004-05/msg01414.html>.
Please wait until libcpp has been moved to the toplevel before applying this. Otherwise the patch will break libgfortran, which has 8-bit characters in it (and until a few days ago, C and Java did too). For more information, see http://gcc.gnu.org/ml/gcc/2004-05/msg01007.html and the reply at http://gcc.gnu.org/ml/gcc/2004-05/msg01026.html
This can be applied now.
Do we really want to fix this? The "buggy" behaviour actually seems better here because it more closely matches what other Java compilers do and seems to have resulted in less complaints from users since it "broke". I propose we close this as WONTFIX and update the documentation to specify that Utf8 is the default encoding for input files unless specified otherwise with the --encoding flag. Comments?
Subject: Re: HAVE_LANGINFO_CODESET never defined

On Wed, 20 Oct 2004, mckinlay at redhat dot com wrote:

> Do we really want to fix this?
>
> The "buggy" behaviour actually seems better here because it more closely matches
> what other Java compilers do and seems to have resulted in less complaints from
> users since it "broke".
>
> I propose we close this as WONTFIX and update the documentation to specify that
> Utf8 is the default encoding for input files unless specified otherwise with the
> --encoding flag. Comments?

I don't know what is best for Java, but for the C compiler POSIX specifies use of locale to determine the encoding of source files. In addition, if HAVE_LANGINFO_CODESET were set properly then people using UTF-8 locales would get proper quotes in error messages. If particular languages do not want this or don't work with it at present, they need not use the locale for source files, but the configure test should go in for the use of diagnostics if nothing else.

I understand Zack has proposals for changes to cpplib which would mean that for well-behaved locale character sets (supersets of ASCII, roughly) stray invalid characters in comments can be ignored rather than causing an error through not being in the locale character set (and speed up cpplib by not needing to pass most of most files through iconv).
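The point about quotes in diagnostics can be sketched as follows. This is an illustration only, not GCC's implementation: `struct quote_pair` and `quotes_for_codeset` are hypothetical names, the plain apostrophe fallback is an assumption, and the codeset string is assumed to come from something like `nl_langinfo (CODESET)`.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical type: the opening and closing quote strings to print
   around identifiers in a diagnostic.  */
struct quote_pair
{
  const char *open;
  const char *close;
};

/* Hypothetical helper: choose quotes from the locale's codeset name.  */
struct quote_pair
quotes_for_codeset (const char *codeset)
{
  /* Assumed ASCII-safe fallback.  */
  struct quote_pair q = { "'", "'" };

  if (codeset != NULL && strcmp (codeset, "UTF-8") == 0)
    {
      q.open = "\xe2\x80\x98";   /* U+2018 encoded as UTF-8 */
      q.close = "\xe2\x80\x99";  /* U+2019 encoded as UTF-8 */
    }
  return q;
}
```

When the codeset is never detected, a caller of this kind can only ever take the fallback branch, which is why the configure test matters for diagnostics even if a front end ignores the locale for source files.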
My understanding is that other java compilers do use the locale's default encoding. However, unlike the glibc iconv() converter, typically javac treats ASCII as equivalent to Latin 1.
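The difference between a strict ASCII converter and one that treats ASCII as Latin 1 can be shown with a small sketch. Both functions are hypothetical and implement the two policies by hand; neither is the glibc iconv() or javac code.

```c
#include <stddef.h>

/* Strict policy: ASCII covers only 0x00..0x7F, so any byte with the
   high bit set is an error.  Returns 0 on success, -1 on the first
   invalid byte.  OUT must have room for LEN code points.  */
int
decode_ascii_strict (const unsigned char *in, size_t len, unsigned int *out)
{
  for (size_t i = 0; i < len; i++)
    {
      if (in[i] >= 0x80)
        return -1;
      out[i] = in[i];
    }
  return 0;
}

/* Lenient policy: treat the input as Latin 1, where every byte maps
   to the code point with the same value, so no byte is rejected.  */
void
decode_latin1 (const unsigned char *in, size_t len, unsigned int *out)
{
  for (size_t i = 0; i < len; i++)
    out[i] = in[i];
}
```

Under the strict policy, a source file containing a byte such as 0xE9 fails to decode in an ASCII locale, whereas the lenient policy accepts it as U+00E9.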
Forget what I said, Tom is right. I just tested this again, and javac from JDK 1.5 does indeed use the locale setting to determine the default encoding. Furthermore, javac does appear to distinguish between ASCII and Latin1 now. I will re-test the patch and ping it to gcc-patches.
Subject: Bug 15575

CVSROOT: /cvs/gcc
Module name: gcc
Changes by: bryce@gcc.gnu.org 2004-10-20 21:36:48

Modified files:
    gcc : ChangeLog configure.ac aclocal.m4 configure config.in

Log message:
    2004-10-20  Bryce McKinlay  <mckinlay@redhat.com>

    PR java/15575
    * configure.ac: Declare AM_LANGINFO_CODESET.
    * aclocal.m4: Define AM_LANGINFO_CODESET.
    * configure, config.in: Rebuilt.

Patches:
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/ChangeLog.diff?cvsroot=gcc&r1=2.5960&r2=2.5961
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/configure.ac.diff?cvsroot=gcc&r1=2.77&r2=2.78
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/aclocal.m4.diff?cvsroot=gcc&r1=1.98&r2=1.99
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/configure.diff?cvsroot=gcc&r1=1.868&r2=1.869
http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/config.in.diff?cvsroot=gcc&r1=1.199&r2=1.200
Fix checked in.