28977 – UTF-16 endianness differs between gcj and Sun JDK

Status:	RESOLVED WONTFIX

Alias:	None

Product:	gcc
Classification:	Unclassified
Component:	libgcj
Version:	4.1.2

Importance:	P3 normal
Target Milestone:	---
Assignee:	Not yet assigned to anyone

URL:
Keywords:

Depends on:
Blocks:

Reported:	2006-09-07 18:26 UTC by Marcus Better
Modified:	2016-10-03 17:12 UTC
CC List:	4 users

See Also:
Host:	i486-linux-gnu
Target:	i486-linux-gnu
Build:
Known to work:
Known to fail:
Last reconfirmed:

Description Marcus Better 2006-09-07 18:26:30 UTC

This program gives different results with Sun's JDK (Debian sun-java5-jdk
1.5.0-08-1) and gcj:

======== Test.java ========
import java.io.*;

public class Test
{
    public static void main(String[] args) throws java.io.IOException
    {
        OutputStreamWriter o = new OutputStreamWriter(System.out, "UTF-16");
        o.write("Hello!");
        o.flush();
    }
}
===========================

According to Sun's API docs
  http://java.sun.com/j2se/1.5.0/docs/api/java/nio/charset/Charset.html
the UTF-16 encoding is supposed to default to big-endian. This is also
what I get when running with Sun's JVM:

00000000: feff 0048 0065 006c 006c 006f 0021       ...H.e.l.l.o.!

But when I run the same program with gij, I get little-endian output:

00000000: fffe 4800 6500 6c00 6c00 6f00 2100       ..H.e.l.l.o.!.

In both cases I executed the same .class file, compiled with the Sun
JDK.
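
The comparison above can also be made without an external hex dump tool. The following is a minimal sketch, not part of the original report: the class and method names are hypothetical, and it assumes Java 7 or later for java.nio.charset.StandardCharsets. It encodes through String.getBytes rather than OutputStreamWriter, so it only shows what the runtime's "UTF-16" encoder produces. On a runtime that follows the documented behaviour, the output should start with feff followed by big-endian code units.

```java
import java.nio.charset.StandardCharsets;

class Utf16HexDump
{
    // Formats a byte array as lowercase hex, two digits per byte.
    static String toHex(byte[] bytes)
    {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    // Encodes the text with the runtime's "UTF-16" charset
    // (byte-order mark included) and returns the bytes as hex.
    static String encodeUtf16Hex(String text)
    {
        return toHex(text.getBytes(StandardCharsets.UTF_16));
    }

    public static void main(String[] args)
    {
        System.out.println(encodeUtf16Hex("Hello!"));
    }
}
```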

The system is Debian i386 testing/unstable.

$ gij --version
java version "1.4.2"
gij (GNU libgcj) version 4.1.2 20060729 (prerelease) (Debian 4.1.1-10)

This bug was also reported in the Debian bug-tracking system:
  http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=386443

Comment 1 Andrew Pinski 2006-09-07 18:28:59 UTC

Does this really matter as UTF-16 BOM is correct?

Comment 2 Marcus Better 2006-09-07 18:37:37 UTC

(In reply to comment #1)
> Does this really matter as UTF-16 BOM is correct?

Well, I have a library for XML processing that comes with a test suite, and it expects the Sun behaviour. Besides, it's documented in Sun's API, so it can potentially matter.

Comment 3 Andrew Pinski 2006-09-07 18:40:09 UTC

(In reply to comment #2)
> Well, I have a library for XML processing that comes with a test suite, and it
> expects the Sun behaviour. Besides it's documented in Sun's API, so it can
> potentially matter.

Really, if the testsuite (or the library itself) does not check the BOM, then it is incorrect.  Yes, we should follow the Java documentation.
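
As an illustration of what checking the BOM could look like on the consumer side, here is a minimal sketch. It is not taken from the library discussed in this report; the class and method names are hypothetical, and it assumes Java 7 or later. It inspects the first two bytes, selects the byte order accordingly, and falls back to big-endian when no BOM is present.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class BomAwareDecoder
{
    // Decodes UTF-16 data, honouring a leading byte-order mark if present.
    static String decodeUtf16(byte[] data)
    {
        Charset charset = StandardCharsets.UTF_16BE; // default when there is no BOM
        int offset = 0;
        if (data.length >= 2) {
            int b0 = data[0] & 0xff;
            int b1 = data[1] & 0xff;
            if (b0 == 0xfe && b1 == 0xff) {
                offset = 2;                          // big-endian BOM, skip it
            } else if (b0 == 0xff && b1 == 0xfe) {
                charset = StandardCharsets.UTF_16LE; // little-endian BOM, skip it
                offset = 2;
            }
        }
        return new String(data, offset, data.length - offset, charset);
    }

    public static void main(String[] args)
    {
        byte[] bigEndian = { (byte) 0xfe, (byte) 0xff, 0, 'H', 0, 'i' };
        byte[] littleEndian = { (byte) 0xff, (byte) 0xfe, 'H', 0, 'i', 0 };
        System.out.println(decodeUtf16(bigEndian));
        System.out.println(decodeUtf16(littleEndian));
    }
}
```

With a check of this kind, both byte sequences shown in the description decode to the same string.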

Comment 4 Marcus Better 2008-02-03 20:43:08 UTC

The bug is still in gcj 4.3. The Sun API docs are quite clear about how this should behave:

"When decoding, the UTF-16 charset interprets a byte-order mark to indicate the byte order of the stream but defaults to big-endian if there is no byte-order mark; when encoding, it uses big-endian byte order and writes a big-endian byte-order mark." [1]

~$ gij-4.3 --version
java version "1.5.0"
gij (GNU libgcj) version 4.3.0 20080116 (experimental) [trunk revision 131577]

[1] http://java.sun.com/javase/6/docs/api/java/nio/charset/Charset.html
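
For code that needs the byte layout quoted above regardless of the runtime's "UTF-16" default, one possible workaround is to write the byte-order mark by hand and then encode with the "UTF-16BE" charset, which does not emit a BOM of its own. This is a sketch under that assumption, not something proposed in this report; the class and method names are hypothetical, and it has not been tested against libgcj.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Writer;

class ExplicitUtf16Writer
{
    // Writes a big-endian BOM followed by the text in big-endian UTF-16.
    static void writeBigEndian(OutputStream out, String text) throws IOException
    {
        out.write(0xfe);
        out.write(0xff);
        Writer writer = new OutputStreamWriter(out, "UTF-16BE");
        writer.write(text);
        writer.flush();
    }

    // Convenience wrapper that returns the encoded bytes.
    static byte[] encode(String text)
    {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try {
            writeBigEndian(buffer, text);
        } catch (IOException e) {
            // Not expected for an in-memory stream and a standard charset.
            throw new IllegalStateException(e);
        }
        return buffer.toByteArray();
    }

    public static void main(String[] args) throws IOException
    {
        writeBigEndian(System.out, "Hello!");
    }
}
```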

Comment 5 Andrew Pinski 2016-09-30 22:50:25 UTC

Closing as won't fix, as libgcj (and the Java front end) have been removed from the trunk.

Comment 6 Andrew John Hughes 2016-10-03 17:12:32 UTC

Seems to have been specific to libgcj:

out.cacao:    Big-endian UTF-16 Unicode text, with no line terminators
out.gij:      Little-endian UTF-16 Unicode text, with no line terminators
out.icedtea6: Big-endian UTF-16 Unicode text, with no line terminators