This is the mail archive of the java-prs@gcc.gnu.org mailing list for the Java project.
libgcj/9802: Bug in surrogate handling in Unicode to UTF-8 conversion
- From: jjc at jclark dot com
- To: gcc-gnats at gcc dot gnu dot org
- Date: 22 Feb 2003 09:51:10 -0000
- Subject: libgcj/9802: Bug in surrogate handling in Unicode to UTF-8 conversion
- Reply-to: jjc at jclark dot com
>Number: 9802
>Category: libgcj
>Synopsis: Bug in surrogate handling in Unicode to UTF-8 conversion
>Confidential: no
>Severity: serious
>Priority: medium
>Responsible: unassigned
>State: open
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Sat Feb 22 09:56:01 UTC 2003
>Closed-Date:
>Last-Modified:
>Originator: jjc at jclark dot com
>Release: gcc version 3.3 20030217 (prerelease)
>Organization:
>Environment:
Red Hat Linux 8.0
>Description:
The following program:

class Bug {
  // High (leading) surrogate for a supplementary code point c.
  static public char surrogate1(int c) {
    return (char)(((c - 0x10000) >> 10) | 0xD800);
  }
  // Low (trailing) surrogate for a supplementary code point c.
  static public char surrogate2(int c) {
    return (char)(((c - 0x10000) & 0x3FF) | 0xDC00);
  }
  static public void main(String[] args) throws java.io.UnsupportedEncodingException {
    int ch = 0x10300;
    char[] v = new char[2];
    v[0] = surrogate1(ch);
    v[1] = surrogate2(ch);
    String str = new String(v);
    // Converting the surrogate pair to UTF-8 triggers the exception.
    str.getBytes("UTF-8");
  }
}

throws an exception when compiled and executed:

Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 2
   at gnu.gcj.convert.Output_UTF8.write(char[], int, int) (/home/jjc/gcc/lib/libgcj.so.4.0.0)
   at gnu.gcj.convert.UnicodeToBytes.write(java.lang.String, int, int, char[]) (/home/jjc/gcc/lib/libgcj.so.4.0.0)
   at java.lang.String.getBytes(java.lang.String) (/home/jjc/gcc/lib/libgcj.so.4.0.0)
   at Bug.main(java.lang.String[]) (Unknown Source)
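
For reference, the bytes a correct conversion should produce for this input can be worked out by hand. The sketch below is only an illustration: Utf8Expected, toCodePoint and encode4 are hypothetical names and are not part of libgcj. It recombines the surrogate pair D800 DF00 into U+10300 and applies the standard four-byte UTF-8 layout.

```java
// Illustration only: Utf8Expected is a hypothetical helper, not libgcj code.
class Utf8Expected {
    // Recombine a high/low surrogate pair into a supplementary code point.
    static int toCodePoint(char hi, char lo) {
        return (hi - 0xD800) * 0x400 + (lo - 0xDC00) + 0x10000;
    }

    // Four-byte UTF-8 form (valid for code points 0x10000..0x10FFFF).
    static byte[] encode4(int cp) {
        return new byte[] {
            (byte) (0xF0 | (cp >> 18)),
            (byte) (0x80 | ((cp >> 12) & 0x3F)),
            (byte) (0x80 | ((cp >> 6) & 0x3F)),
            (byte) (0x80 | (cp & 0x3F))
        };
    }

    public static void main(String[] args) {
        int cp = toCodePoint((char) 0xD800, (char) 0xDF00);
        StringBuilder sb = new StringBuilder();
        for (byte b : encode4(cp)) {
            sb.append(String.format("%02X ", b & 0xFF));
        }
        // Prints: 10300 -> F0 90 8C 80
        System.out.println(Integer.toHexString(cp) + " -> " + sb.toString().trim());
    }
}
```

So str.getBytes("UTF-8") in the test program should return the four bytes F0 90 8C 80 rather than throw.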
>How-To-Repeat:
>Fix:
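I haven't tested this, but I suspect the following should fix it. The surrogate branch writes the first UTF-8 byte into buf without decrementing avail, so the count of remaining space ends up one too high: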
*** gcc/libjava/gnu/gcj/convert/Output_UTF8.java~ 2000-08-09 00:35:32.000000000 +0700
--- gcc/libjava/gnu/gcj/convert/Output_UTF8.java 2003-02-22 16:38:52.000000000 +0700
***************
*** 104,109 ****
--- 104,110 ----
        {
          value = (hi_part - 0xD800) * 0x400 + (ch - 0xDC00) + 0x10000;
          buf[count++] = (byte) (0xF0 | (value >> 18));
+         avail--;
          bytes_todo = 3;
          hi_part = 0;
        }
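
To show why the missing decrement matters, here is a simplified model of the bookkeeping. It is not the actual Output_UTF8 code: AvailSketch, writeBounded and the decrementAfterLead flag are hypothetical, and the loop only assumes that continuation bytes are written while avail is positive. When the lead byte is written without avail--, the loop believes one more byte of space exists than really does.

```java
// Simplified model, not the real libgcj converter; all names are hypothetical.
class AvailSketch {
    // Writes as much of the 4-byte UTF-8 form of cp as fits into buf,
    // starting at index count. Returns the number of bytes written.
    static int writeBounded(int cp, byte[] buf, int count, boolean decrementAfterLead) {
        int start = count;
        int avail = buf.length - count;   // free space the writer thinks it has
        if (avail == 0) {
            return 0;
        }
        buf[count++] = (byte) (0xF0 | (cp >> 18));
        if (decrementAfterLead) {
            avail--;                      // the step the proposed patch adds
        }
        int bytesTodo = 3;
        while (bytesTodo > 0 && avail > 0) {
            bytesTodo--;
            buf[count++] = (byte) (0x80 | ((cp >> (bytesTodo * 6)) & 0x3F));
            avail--;
        }
        return count - start;
    }

    public static void main(String[] args) {
        int n = writeBounded(0x10300, new byte[2], 0, true);
        System.out.println("with avail--: wrote " + n + " bytes, no overrun");
        try {
            writeBounded(0x10300, new byte[2], 0, false);
        } catch (ArrayIndexOutOfBoundsException e) {
            System.out.println("without avail--: wrote past the end of the buffer");
        }
    }
}
```

In this model a two-byte buffer is filled exactly when the decrement is present, and overrun at index 2 when it is absent. Whether the real converter fails along exactly this path is an assumption.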
>Release-Note:
>Audit-Trail:
>Unformatted: