This is the mail archive of the
gcc-bugs@gcc.gnu.org
mailing list for the GCC project.
[Bug java/14636] New: Problem with UTF-8 in IOConverter/iconv on cygwin
- From: "erwin at klomp dot org" <gcc-bugzilla at gcc dot gnu dot org>
- To: gcc-bugs at gcc dot gnu dot org
- Date: 18 Mar 2004 16:42:47 -0000
- Subject: [Bug java/14636] New: Problem with UTF-8 in IOConverter/iconv on cygwin
- Reply-to: gcc-bugzilla at gcc dot gnu dot org
I'm not sure whether to report this with cygwin or gcc, but my hunch is that the
problem is more generic than just cygwin.
I have a test class, which I'll attach, that shows the problem. When I try to
convert a UTF-8 byte array to a Java String, the byte order in the resulting
Java chars is wrong. (This is on an Intel platform with MS Windows XP.)
However, the field iconv_byte_swap in gnu.gcj.convert.IOConverter is true, as
the test program shows.
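For readers without the attachment, here is a minimal sketch of the kind of
check described above. It is not the attached test class; the class name
Utf8DecodeCheck and its helper methods are made up for illustration. It decodes
a short UTF-8 byte array and prints each resulting char in hex, next to what
the same chars look like if their two bytes are swapped.

```java
import java.io.UnsupportedEncodingException;

public class Utf8DecodeCheck {
    // Decode UTF-8 bytes with the runtime's converter (hypothetical helper).
    static String decodeUtf8(byte[] bytes) {
        try {
            return new String(bytes, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new RuntimeException("UTF-8 converter not available: " + e);
        }
    }

    // Swap the two bytes of a 16-bit Java char.
    static char swapBytes(char c) {
        return (char) (((c & 0xFF) << 8) | ((c >> 8) & 0xFF));
    }

    // Return a copy of s with every char byte-swapped.
    static String swapAll(String s) {
        StringBuffer sb = new StringBuffer();
        for (int i = 0; i < s.length(); i++) {
            sb.append(swapBytes(s.charAt(i)));
        }
        return sb.toString();
    }

    // Format each char as four hex digits, separated by spaces.
    static String hex(String s) {
        StringBuffer sb = new StringBuffer();
        for (int i = 0; i < s.length(); i++) {
            if (i > 0) {
                sb.append(' ');
            }
            String h = Integer.toHexString(s.charAt(i));
            for (int pad = h.length(); pad < 4; pad++) {
                sb.append('0');
            }
            sb.append(h);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // "A" followed by U+00E9, encoded as UTF-8.
        byte[] utf8 = { 0x41, (byte) 0xC3, (byte) 0xA9 };
        String decoded = decodeUtf8(utf8);
        System.out.println("decoded:      " + hex(decoded));
        System.out.println("byte-swapped: " + hex(swapAll(decoded)));
    }
}
```

A correct decode gives the chars 0041 00e9. If the chars come out with their
bytes in the wrong order, the first line would instead match the
"byte-swapped" line.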
An additional complication is that on most platforms iconv isn't used for
UTF-8 at all. On cygwin with statically linked binaries, however, the
Input_UTF8 converter class isn't available because the linker throws it away,
so IOConverter falls back on iconv.
I wonder if the native method gnu::gcj::convert::Input_iconv::read in
natIconv.cc does the byte swapping correctly. It reads characters through a
local variable of type jchar*, swaps the bytes, and then writes them back
through a variable of type char*.
Isn't a char 8 bits wide and a jchar 16 bits wide?
Also, this piece of code hasn't changed between release 3.3.1 and the HEAD.
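To make the width concern concrete, here is a small Java sketch. It is not the
code from natIconv.cc; the class SwapWidthSketch and both methods are
hypothetical, and the "faulty" variant only models the assumption that a
swapped 16-bit value gets stored into an 8-bit slot. It compares that with a
swap that stores the full 16 bits.

```java
public class SwapWidthSketch {
    // Swap both bytes of each 16-bit unit and store all 16 bits back.
    static char[] swap16(char[] in) {
        char[] out = new char[in.length];
        for (int i = 0; i < in.length; i++) {
            out[i] = (char) (((in[i] & 0xFF) << 8) | ((in[i] >> 8) & 0xFF));
        }
        return out;
    }

    // Assumed faulty variant: the swapped value is computed at 16 bits but
    // stored into an 8-bit slot, so only its low byte survives.
    static byte[] swapStoredAs8(char[] in) {
        byte[] out = new byte[in.length];
        for (int i = 0; i < in.length; i++) {
            int swapped = ((in[i] & 0xFF) << 8) | ((in[i] >> 8) & 0xFF);
            out[i] = (byte) swapped; // narrowing keeps the low 8 bits only
        }
        return out;
    }

    public static void main(String[] args) {
        // U+0041 and U+00E9 as they would appear with their bytes swapped.
        char[] raw = { (char) 0x4100, (char) 0xE900 };

        char[] wide = swap16(raw);
        byte[] narrow = swapStoredAs8(raw);

        System.out.println("16-bit store: " + wide.length + " units of 2 bytes");
        System.out.println("8-bit store:  " + narrow.length + " units of 1 byte");
    }
}
```

With a 16-bit store, each output unit holds the whole swapped character. With
an 8-bit store, each write covers only one byte, so n writes fill only half of
a buffer of n 16-bit characters.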
There is a workaround: include a reference to the class that implements the UTF8
converter in Java, to force the linker to include it in the executable.
- Erwin
Full gcj -v information:
Configured with: /GCC/gcc-3.3.1-3/configure --with-gcc --with-gnu-ld
--with-gnu-as --prefix=/usr --exec-prefix=/usr --sysconfdir=/etc
--libdir=/usr/lib --libexecdir=/usr/sbin --mandir=/usr/share/man
--infodir=/usr/share/info --enable-languages=c,ada,c++,f77,pascal,java,objc
--enable-libgcj --enable-threads=posix --with-system-zlib --enable-nls
--without-included-gettext --enable-interpreter --enable-sjlj-exceptions
--disable-version-specific-runtime-libs --enable-shared
--disable-win32-registry --enable-java-gc=boehm
--disable-hash-synchronization --verbose --target=i686-pc-cygwin
--host=i686-pc-cygwin --build=i686-pc-cygwin
Thread model: posix
gcc version 3.3.1 (cygming special)
--
Summary: Problem with UTF-8 in IOConverter/iconv on cygwin
Product: gcc
Version: 3.3.1
Status: UNCONFIRMED
Severity: normal
Priority: P2
Component: java
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: erwin at klomp dot org
CC: gcc-bugs at gcc dot gnu dot org
GCC build triplet: 3.3.1
GCC host triplet: i686-pc-cygwin
GCC target triplet: i686-pc-cygwin
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=14636