Bug 28977 - UTF-16 endianness differs between gcj and Sun JDK
Status: RESOLVED WONTFIX
Alias: None
Product: gcc
Classification: Unclassified
Component: libgcj
Version: 4.1.2
Importance: P3 normal
Target Milestone: ---
Assignee: Not yet assigned to anyone
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2006-09-07 18:26 UTC by Marcus Better
Modified: 2016-10-03 17:12 UTC
CC List: 4 users

See Also:
Host: i486-linux-gnu
Target: i486-linux-gnu
Build:
Known to work:
Known to fail:
Last reconfirmed:


Description Marcus Better 2006-09-07 18:26:30 UTC
This program gives different results with Sun's JDK (Debian sun-java5-jdk
1.5.0-08-1) and gcj:

======== Test.java ========
import java.io.*;

public class Test
{
    public static void main(String[] args) throws java.io.IOException
    {
        OutputStreamWriter o = new OutputStreamWriter(System.out, "UTF-16");
        o.write("Hello!");
        o.flush();
    }
}
===========================

According to Sun's API docs
  http://java.sun.com/j2se/1.5.0/docs/api/java/nio/charset/Charset.html
the UTF-16 encoding is supposed to default to big-endian. This is also
what I get when running with Sun's JVM:

00000000: feff 0048 0065 006c 006c 006f 0021       ...H.e.l.l.o.!

But when I run the same program with gij, I get little-endian output:

00000000: fffe 4800 6500 6c00 6c00 6f00 2100       ..H.e.l.l.o.!.

In both cases I executed the same .class file, compiled with the Sun
JDK.
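
The comparison can also be made without an external hex dump tool. The following is only a sketch, assuming Java 8 or later; the class name Utf16Dump and its helper methods are made up for illustration and are not part of the original test case. It encodes the same string through an OutputStreamWriter into memory and prints the bytes in the same two-byte grouping as the dumps above.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;

// Hypothetical helper, not part of the original report.
class Utf16Dump
{
    // Encode text through an OutputStreamWriter, as Test.java does,
    // but capture the bytes in memory instead of writing to System.out.
    static byte[] encode(String text, String charsetName)
    {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try {
            OutputStreamWriter o = new OutputStreamWriter(buf, charsetName);
            o.write(text);
            o.flush();
        } catch (IOException e) {
            // Also covers an unsupported charset name.
            throw new UncheckedIOException(e);
        }
        return buf.toByteArray();
    }

    // Format bytes as lowercase hex in groups of two bytes.
    static String hex(byte[] bytes)
    {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < bytes.length; i++) {
            if (i > 0 && i % 2 == 0)
                sb.append(' ');
            sb.append(String.format("%02x", bytes[i] & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args)
    {
        System.out.println(hex(encode("Hello!", "UTF-16")));
    }
}
```

On a runtime that follows the documented behaviour, the printed bytes should start with feff; a runtime showing the gij behaviour described here would be expected to start with fffe instead.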

The system is Debian i386 testing/unstable.

$ gij --version
java version "1.4.2"
gij (GNU libgcj) version 4.1.2 20060729 (prerelease) (Debian 4.1.1-10)

This bug was also reported in the Debian bug-tracking system:
  http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=386443
Comment 1 Andrew Pinski 2006-09-07 18:28:59 UTC
Does this really matter as UTF-16 BOM is correct?
Comment 2 Marcus Better 2006-09-07 18:37:37 UTC
(In reply to comment #1)
> Does this really matter as UTF-16 BOM is correct?

Well, I have a library for XML processing that comes with a test suite, and it expects the Sun behaviour. Besides it's documented in Sun's API, so it can potentially matter.
Comment 3 Andrew Pinski 2006-09-07 18:40:09 UTC
(In reply to comment #2)
> Well, I have a library for XML processing that comes with a test suite, and it
> expects the Sun behaviour. Besides it's documented in Sun's API, so it can
> potentially matter.

Really, if the test suite (or the library itself) does not check the BOM, then it is incorrect. Yes, we should follow the Java documentation.
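
For illustration, a consumer that checks the BOM could look roughly like the sketch below. It assumes Java 6 or later; the class Utf16Bom and its methods are hypothetical names, not code from the library or test suite under discussion. It picks the byte order from the first two bytes and falls back to big-endian when no BOM is present.

```java
import java.nio.charset.Charset;

// Hypothetical consumer-side helper, written only for illustration.
class Utf16Bom
{
    // Convenience for writing byte arrays as int literals.
    static byte[] bytes(int... values)
    {
        byte[] out = new byte[values.length];
        for (int i = 0; i < values.length; i++)
            out[i] = (byte) values[i];
        return out;
    }

    // True if the data starts with FE FF or FF FE.
    static boolean hasBom(byte[] data)
    {
        if (data.length < 2)
            return false;
        int b0 = data[0] & 0xff, b1 = data[1] & 0xff;
        return (b0 == 0xfe && b1 == 0xff) || (b0 == 0xff && b1 == 0xfe);
    }

    // Charset implied by the BOM; big-endian when there is no BOM.
    static String byteOrder(byte[] data)
    {
        if (data.length >= 2 && (data[0] & 0xff) == 0xff && (data[1] & 0xff) == 0xfe)
            return "UTF-16LE";
        return "UTF-16BE";
    }

    // Decode the data, skipping the BOM if one is present.
    static String decode(byte[] data)
    {
        int skip = hasBom(data) ? 2 : 0;
        return new String(data, skip, data.length - skip,
                          Charset.forName(byteOrder(data)));
    }

    public static void main(String[] args)
    {
        // The two byte sequences from the hex dumps in the description.
        byte[] big = bytes(0xfe, 0xff, 0x00, 0x48, 0x00, 0x65, 0x00, 0x6c,
                           0x00, 0x6c, 0x00, 0x6f, 0x00, 0x21);
        byte[] little = bytes(0xff, 0xfe, 0x48, 0x00, 0x65, 0x00, 0x6c, 0x00,
                              0x6c, 0x00, 0x6f, 0x00, 0x21, 0x00);
        System.out.println(decode(big));
        System.out.println(decode(little));
    }
}
```

Both dumps from the description decode to the same string this way, which is why a reader that honours the BOM is not affected by the difference in byte order.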
Comment 4 Marcus Better 2008-02-03 20:43:08 UTC
The bug is still in gcj 4.3. The Sun API docs are quite clear about how this should behave:

"When decoding, the UTF-16 charset interprets a byte-order mark to indicate the byte order of the stream but defaults to big-endian if there is no byte-order mark; when encoding, it uses big-endian byte order and writes a big-endian byte-order mark." [1]

~$ gij-4.3 --version
java version "1.5.0"
gij (GNU libgcj) version 4.3.0 20080116 (experimental) [trunk revision 131577]

[1] http://java.sun.com/javase/6/docs/api/java/nio/charset/Charset.html
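
For code that needs identical bytes on every runtime, one possible workaround is to name the byte order explicitly and write the byte-order mark by hand. This is only a sketch and was not proposed or tested anywhere in this report: it assumes that the runtime's "UTF-16BE" encoder writes big-endian code units without adding a BOM of its own (as the Charset documentation describes), which has not been verified against libgcj. The class name Utf16Explicit is hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;

// Hypothetical workaround sketch; behaviour on libgcj is an assumption.
class Utf16Explicit
{
    // U+FEFF, which encodes to the bytes FE FF in big-endian order.
    static final char BOM = 0xFEFF;

    // Encode text as big-endian UTF-16 with an explicit leading BOM.
    static byte[] encodeBigEndianWithBom(String text)
    {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try {
            // "UTF-16BE" fixes the byte order instead of leaving it to
            // the runtime's default for plain "UTF-16".
            OutputStreamWriter o = new OutputStreamWriter(buf, "UTF-16BE");
            o.write(BOM);
            o.write(text);
            o.flush();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buf.toByteArray();
    }

    public static void main(String[] args)
    {
        byte[] out = encodeBigEndianWithBom("Hello!");
        System.out.write(out, 0, out.length);
        System.out.flush();
    }
}
```

This sidesteps the question of what plain "UTF-16" defaults to, at the cost of handling the BOM in application code.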
Comment 5 Andrew Pinski 2016-09-30 22:50:25 UTC
Closing as won't fix, since libgcj (and the Java front end) has been removed from trunk.
Comment 6 Andrew John Hughes 2016-10-03 17:12:32 UTC
Seems to have been specific to libgcj:

out.cacao:    Big-endian UTF-16 Unicode text, with no line terminators
out.gij:      Little-endian UTF-16 Unicode text, with no line terminators
out.icedtea6: Big-endian UTF-16 Unicode text, with no line terminators