This is the mail archive of the gcc-bugs@gcc.gnu.org mailing list for the GCC project.



[Bug java/14636] New: Problem with UTF-8 in IOConverter/iconv on cygwin


I'm not sure whether to report this with cygwin or gcc, but my hunch is that the
problem is more generic than just cygwin.

I have a test class that I'll attach that shows the problem. When I try to
convert a UTF-8 byte array to a Java String, the byte order in the Java chars
is wrong. (This is on an Intel platform with MS Windows XP.)
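
For illustration, a check along these lines makes the symptom visible. This is
a minimal sketch, not the attached test class: the class name Utf8Check, the
helper names, and the sample bytes are all hypothetical. It decodes a short
UTF-8 sequence and prints each resulting char as a four-digit hex code unit.

```java
import java.io.UnsupportedEncodingException;

public class Utf8Check {
    // Decode UTF-8 bytes and render each resulting char as 4 hex digits.
    static String hexUnits(byte[] utf8) {
        String s;
        try {
            s = new String(utf8, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // Every Java runtime is required to support UTF-8.
            throw new IllegalStateException("UTF-8 not supported");
        }
        StringBuffer sb = new StringBuffer();
        for (int i = 0; i < s.length(); i++) {
            String h = Integer.toHexString(s.charAt(i));
            while (h.length() < 4) {
                h = "0" + h;
            }
            if (i > 0) {
                sb.append(' ');
            }
            sb.append(h);
        }
        return sb.toString();
    }

    // The value a char would have if its two bytes were exchanged.
    static char swapBytes(char c) {
        return (char) (((c & 0xFF) << 8) | ((c >> 8) & 0xFF));
    }

    public static void main(String[] args) {
        // Hypothetical sample: 'A' followed by U+00E9 encoded as C3 A9.
        byte[] utf8 = { 0x41, (byte) 0xC3, (byte) 0xA9 };
        System.out.println(hexUnits(utf8));
    }
}
```

With a correct decoder this prints 0041 00e9. A result with the bytes of each
char exchanged would read 4100 e900 instead, which is what swapBytes produces
for those two chars.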

However, the field iconv_byte_swap in gnu.gcj.convert.IOConverter is true, as the
test program shows.

An additional complication is that on most platforms, iconv isn't used for UTF-8,
but on cygwin with statically linked binaries, the Input_UTF8 converter class
isn't used because the linker throws it away, so IOConverter falls back on iconv.

I wonder whether the native method gnu::gcj::convert::Input_iconv::read in
natIconv.cc does the byte swapping correctly. It reads characters from a local
variable of type jchar*, swaps the bytes, and then writes them back through a
variable of type char*.
Isn't a char 8 bits wide and a jchar 16 bits wide?
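
As a point of comparison, a byte swap over 16-bit units could look like the
sketch below. It is written in Java rather than the C++ of natIconv.cc, it is
not the library's code, and the names SwapSketch, swapUnits, unitAt and
lowByteOnly are hypothetical. The buffer is treated as raw bytes, so each
16-bit unit occupies the two byte positions 2*i and 2*i+1.

```java
public class SwapSketch {
    // Exchange the two bytes of every 16-bit unit in buf[0 .. 2*units).
    static void swapUnits(byte[] buf, int units) {
        for (int i = 0; i < units; i++) {
            byte tmp = buf[2 * i];
            buf[2 * i] = buf[2 * i + 1];
            buf[2 * i + 1] = tmp;
        }
    }

    // Assemble one 16-bit unit from two bytes, high byte first.
    static char unitAt(byte[] buf, int i) {
        return (char) (((buf[2 * i] & 0xFF) << 8) | (buf[2 * i + 1] & 0xFF));
    }

    // Storing a 16-bit value into an 8-bit slot keeps only the low byte.
    static byte lowByteOnly(char c) {
        return (byte) c;
    }

    public static void main(String[] args) {
        // Hypothetical sample: 'A' and U+00E9 stored low byte first.
        byte[] buf = { 0x41, 0x00, (byte) 0xE9, 0x00 };
        swapUnits(buf, 2);
        System.out.println(Integer.toHexString(unitAt(buf, 0)) + " "
                + Integer.toHexString(unitAt(buf, 1)));
    }
}
```

After the swap the units read 41 e9. The lowByteOnly helper shows the width
concern in isolation: narrowing the 16-bit value 0xE900 to 8 bits yields 0,
because the high byte is discarded.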

Also, this piece of code hasn't changed between release 3.3.1 and the HEAD.


There is a workaround: include a reference to the class that implements the UTF8
converter in Java, to force the linker to include it in the executable.

- Erwin



Full gcj -v information:


Configured with: /GCC/gcc-3.3.1-3/configure --with-gcc --with-gnu-ld --with-gnu-as
--prefix=/usr --exec-prefix=/usr --sysconfdir=/etc --libdir=/usr/lib
--libexecdir=/usr/sbin --mandir=/usr/share/man --infodir=/usr/share/info
--enable-languages=c,ada,c++,f77,pascal,java,objc --enable-libgcj
--enable-threads=posix --with-system-zlib --enable-nls
--without-included-gettext --enable-interpreter --enable-sjlj-exceptions
--disable-version-specific-runtime-libs --enable-shared
--disable-win32-registry --enable-java-gc=boehm --disable-hash-synchronization
--verbose --target=i686-pc-cygwin --host=i686-pc-cygwin --build=i686-pc-cygwin
Thread model: posix
gcc version 3.3.1 (cygming special)

-- 
           Summary: Problem with UTF-8 in IOConverter/iconv on cygwin
           Product: gcc
           Version: 3.3.1
            Status: UNCONFIRMED
          Severity: normal
          Priority: P2
         Component: java
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: erwin at klomp dot org
                CC: gcc-bugs at gcc dot gnu dot org
 GCC build triplet: 3.3.1
  GCC host triplet: i686-pc-cygwin
GCC target triplet: i686-pc-cygwin


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=14636

