Bug 14636 - Problem with UTF-8 in IOConverter/iconv on cygwin
Status: RESOLVED DUPLICATE of bug 13708
Alias: None
Product: gcc
Classification: Unclassified
Component: libgcj
Version: 3.3.1
Importance: P2 normal
Target Milestone: ---
Assignee: Not yet assigned to anyone
 
Reported: 2004-03-18 16:42 UTC by Erwin Bolwidt
Modified: 2005-07-23 22:49 UTC
Host: i686-pc-cygwin
Target: i686-pc-cygwin
Build: 3.3.1


Attachments
This piece of code will trigger the bug with gcj-3.3.1 and cygwin on windows. (430 bytes, text/plain)
2004-03-18 16:44 UTC, Erwin Bolwidt

Description Erwin Bolwidt 2004-03-18 16:42:30 UTC
I'm not sure whether to report this with cygwin or gcc, but my hunch is that the
problem is more generic than just cygwin.

I have a test class that I'll attach that shows the problem. When I try to
convert a UTF-8 byte array to a Java String, the byte order in the Java chars
is wrong. (This is on an Intel platform with MS Windows XP.)

However, the field iconv_byte_swap in gnu.gcj.convert.IOConverter is true, as
the test program shows.

An additional complication is that on most platforms, iconv isn't used for
UTF-8, but on cygwin with statically linked binaries, the Input_UTF8 converter
class isn't used because the linker throws it away, so IOConverter falls back
on iconv.

I wonder if the native method gnu::gcj::convert::Input_iconv::read in
natIconv.cc does the byte swapping correctly. It reads characters from a local
variable of type jchar*, swaps the bytes, and then writes them back through a
variable of type char*.
Isn't a char 8 bits wide and a jchar 16 bits wide?
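
To make the concern concrete, here is a minimal sketch of the two patterns. It is not the code from natIconv.cc. jchar_t, swap_units and swap_units_narrow_store are hypothetical names made up for this illustration, and jchar_t is assumed to be an unsigned 16-bit type like jchar. The first function swaps each 16-bit unit and stores it through the same 16-bit type. The second computes the same swapped value but stores it through a char-wide pointer, so only 8 bits per unit are written, and at the wrong byte offset.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for jchar: an unsigned 16-bit UTF-16 code unit.
typedef std::uint16_t jchar_t;

// Swap the two bytes of every 16-bit unit in place. Reads and writes go
// through the same 16-bit type, so both halves of each unit are stored.
void swap_units(jchar_t *buf, std::size_t count)
{
  assert(buf != nullptr || count == 0);
  for (std::size_t i = 0; i < count; ++i)
    buf[i] = static_cast<jchar_t>((buf[i] << 8) | (buf[i] >> 8));
}

// For contrast only: the swapped value is computed correctly, but the store
// goes through a char-wide pointer indexed by unit. Each store writes just
// the low 8 bits, and to byte offset i instead of 2 * i.
void swap_units_narrow_store(jchar_t *buf, std::size_t count)
{
  assert(buf != nullptr || count == 0);
  unsigned char *out = reinterpret_cast<unsigned char *>(buf);
  for (std::size_t i = 0; i < count; ++i)
    {
      jchar_t swapped = static_cast<jchar_t>((buf[i] << 8) | (buf[i] >> 8));
      out[i] = static_cast<unsigned char>(swapped);
    }
}
```

With the input units 0x4100 and 0xE900 (a byte-swapped 'A' and U+00E9), swap_units yields 0x0041 and 0x00E9. The narrow-store variant leaves the second unit untouched and corrupts the first, with an exact result that depends on the host byte order.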

Also, this piece of code hasn't changed between release 3.3.1 and the HEAD.


There is a workaround: include a reference to the class that implements the UTF8
converter in Java, to force the linker to include it in the executable.

- Erwin



Full gcj -v information:


Configured with: /GCC/gcc-3.3.1-3/configure --with-gcc --with-gnu-ld
--with-gnu-as --prefix=/usr --exec-prefix=/usr --sysconfdir=/etc
--libdir=/usr/lib --libexecdir=/usr/sbin --mandir=/usr/share/man
--infodir=/usr/share/info
--enable-languages=c,ada,c++,f77,pascal,java,objc --enable-libgcj
--enable-threads=posix --with-system-zlib --enable-nls
--without-included-gettext --enable-interpreter
--enable-sjlj-exceptions --disable-version-specific-runtime-libs
--enable-shared --disable-win32-registry --enable-java-gc=boehm
--disable-hash-synchronization --verbose --target=i686-pc-cygwin
--host=i686-pc-cygwin --build=i686-pc-cygwin
Thread model: posix
gcc version 3.3.1 (cygming special)
Comment 1 Erwin Bolwidt 2004-03-18 16:44:10 UTC
Created attachment 5941
This piece of code will trigger the bug with gcj-3.3.1 and cygwin on windows.
Comment 2 Andrew Pinski 2004-03-18 16:56:12 UTC
This is a dup of bug 12908.

*** This bug has been marked as a duplicate of 12908 ***
Comment 3 Erwin Bolwidt 2004-03-18 17:09:47 UTC
Please, read more carefully.

The bug is NOT that Input_UTF8 is missing. Yes, Input_UTF8 is a good workaround
for this bug in the case of UTF8.

But _this_ bug report is about a bug in the IOConverter class and the iconv
interface, and bug 12908 is _not_ about that.

Also, there probably are more converters that are supported by iconv than by
Java implementations, and they probably all exhibit the same problem.
Comment 4 Andrew Pinski 2004-03-18 17:23:42 UTC
This is a dup of either bug 9715 or bug 12908. The problem (9715) might be that
iconv on cygwin is not that good and does not support them. Also, PR 13708 is
about making sure that the UTF8 converter stays in no matter what, so this is a
dup of bug 13708 then.

*** This bug has been marked as a duplicate of 13708 ***