This program gives different results with Sun's JDK (Debian sun-java5-jdk 1.5.0-08-1) and gcj:

======== Test.java ========
import java.io.*;

public class Test {
    public static void main(String[] args) throws java.io.IOException {
        OutputStreamWriter o = new OutputStreamWriter(System.out, "UTF-16");
        o.write("Hello!");
        o.flush();
    }
}
===========================

According to Sun's API docs

http://java.sun.com/j2se/1.5.0/docs/api/java/nio/charset/Charset.html

the UTF-16 encoding is supposed to default to big-endian. This is also what I get when running with Sun's JVM:

00000000: feff 0048 0065 006c 006c 006f 0021       ...H.e.l.l.o.!

But when I run the same program with gij, I get little-endian output:

00000000: fffe 4800 6500 6c00 6c00 6f00 2100       ..H.e.l.l.o.!.

In both cases I executed the same .class file, compiled with the Sun JDK. The system is Debian i386 testing/unstable.

$ gij --version
java version "1.4.2"
gij (GNU libgcj) version 4.1.2 20060729 (prerelease) (Debian 4.1.1-10)

This bug was also reported in the Debian bug-tracking system:

http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=386443
Does this really matter as UTF-16 BOM is correct?
(In reply to comment #1)
> Does this really matter as UTF-16 BOM is correct?

Well, I have a library for XML processing that comes with a test suite, and it expects the Sun behaviour. Besides it's documented in Sun's API, so it can potentially matter.
(In reply to comment #2)
> Well, I have a library for XML processing that comes with a test suite, and it
> expects the Sun behaviour. Besides it's documented in Sun's API, so it can
> potentially matter.

Really, if the test suite (or the library itself) does not check the BOM, then it is incorrect.

Yes, we should follow the Java documentation.
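To illustrate what "checking the BOM" could look like on the consuming side, here is a minimal sketch. The class and method names (`BomAwareDecoder`, `decodeUtf16`) are hypothetical and are not taken from the XML library mentioned above. It assumes the input is UTF-16 and falls back to big-endian when no BOM is present.

```java
import java.nio.charset.Charset;

// Hypothetical helper, for illustration only.
public class BomAwareDecoder {

    // Decode UTF-16 bytes, honouring a leading byte-order mark if there is one.
    static String decodeUtf16(byte[] data) {
        String charsetName = "UTF-16BE"; // assumed default when there is no BOM
        int offset = 0;
        if (data.length >= 2) {
            int b0 = data[0] & 0xff;
            int b1 = data[1] & 0xff;
            if (b0 == 0xfe && b1 == 0xff) {
                offset = 2;               // big-endian BOM, skip it
            } else if (b0 == 0xff && b1 == 0xfe) {
                charsetName = "UTF-16LE"; // little-endian BOM, skip it
                offset = 2;
            }
        }
        return new String(data, offset, data.length - offset,
                          Charset.forName(charsetName));
    }

    public static void main(String[] args) {
        // The two byte sequences from the hex dumps in the original report.
        byte[] fromSun = {(byte) 0xfe, (byte) 0xff, 0, 0x48, 0, 0x65, 0, 0x6c,
                          0, 0x6c, 0, 0x6f, 0, 0x21};
        byte[] fromGij = {(byte) 0xff, (byte) 0xfe, 0x48, 0, 0x65, 0, 0x6c, 0,
                          0x6c, 0, 0x6f, 0, 0x21, 0};
        System.out.println(decodeUtf16(fromSun));
        System.out.println(decodeUtf16(fromGij));
    }
}
```

With a check like this, both outputs from the report decode to the same text, so only a byte-for-byte comparison of the encoded stream would notice the difference.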
The bug is still in gcj 4.3. The Sun API docs are quite clear about how this should behave:

"When decoding, the UTF-16 charset interprets a byte-order mark to indicate the byte order of the stream but defaults to big-endian if there is no byte-order mark; when encoding, it uses big-endian byte order and writes a big-endian byte-order mark." [1]

~$ gij-4.3 --version
java version "1.5.0"
gij (GNU libgcj) version 4.3.0 20080116 (experimental) [trunk revision 131577]

[1] http://java.sun.com/javase/6/docs/api/java/nio/charset/Charset.html
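The documented encoding behaviour can be checked with a small sketch like the one below. The class and helper names (`Utf16Bom`, `hex`, `encodeHex`) are made up for this example, and it assumes a Java 6 or later runtime for `String.getBytes(Charset)`.

```java
import java.nio.charset.Charset;

// Hypothetical test class, for illustration only.
public class Utf16Bom {

    // Render bytes as space-separated two-digit hex values.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    // Encode text with the named charset and return the hex dump.
    static String encodeHex(String text, String charsetName) {
        return hex(text.getBytes(Charset.forName(charsetName)));
    }

    public static void main(String[] args) {
        System.out.println("UTF-16:   " + encodeHex("Hi", "UTF-16"));
        System.out.println("UTF-16BE: " + encodeHex("Hi", "UTF-16BE"));
        System.out.println("UTF-16LE: " + encodeHex("Hi", "UTF-16LE"));
    }
}
```

On a runtime that follows the quoted documentation, the "UTF-16" line should start with `fe ff` followed by big-endian code units, while the explicit UTF-16BE and UTF-16LE charsets write no byte-order mark at all.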
Closing as won't fix, since libgcj (and the Java front end) have been removed from the trunk.
Seems to have been specific to libgcj:

out.cacao: Big-endian UTF-16 Unicode text, with no line terminators
out.gij: Little-endian UTF-16 Unicode text, with no line terminators
out.icedtea6: Big-endian UTF-16 Unicode text, with no line terminators