This is the mail archive of the gcc-prs@gcc.gnu.org mailing list for the GCC project.



Re: libgcj/9802: Bug in surrogate handling in Unicode to UTF-8 conversion


The following reply was made to PR libgcj/9802; it has been noted by GNATS.

From: Mark Wielaard <mark at klomp dot org>
To: gcc-gnats at gcc dot gnu dot org, jjc at jclark dot com, java-prs at gcc dot gnu dot org,  gcc-bugs at gcc dot gnu dot org, nobody at gcc dot gnu dot org, gcc-prs at gcc dot gnu dot org
Cc:  
Subject: Re: libgcj/9802: Bug in surrogate handling in Unicode to UTF-8
	conversion
Date: 22 Feb 2003 14:38:56 +0100

 Thanks for the bug report.
 Your suggested fix seems obviously correct, and I verified that making
 sure that avail is always decremented makes String.getBytes("UTF-8")
 work (read: not throw an ArrayIndexOutOfBoundsException).
 
 But while creating a test case, I noticed that for your example we return
 two bytes, {0xf0, 0x90}, but other implementations return four bytes,
 {0xf0, 0x90, 0x8c, 0x80}. I don't know enough about Unicode and UTF-8
 encoding to know which is correct or why.
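
For reference, the UTF-8 arithmetic for a surrogate pair can be sketched as below. This is only an illustration, not the libgcj converter code. It assumes the example input is the pair U+D800 U+DF00 (code point U+10300), which is consistent with the four bytes quoted above but is not stated in this message. The class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical sketch: encode one UTF-16 surrogate pair as UTF-8 by hand.
public class Utf8SurrogateSketch {
    // Combine the pair into a single code point, then emit the 4-byte UTF-8 form.
    static byte[] encodePair(char high, char low) {
        if (!Character.isHighSurrogate(high) || !Character.isLowSurrogate(low)) {
            throw new IllegalArgumentException("not a valid surrogate pair");
        }
        // Code points above U+FFFF: 0x10000 + 10 bits from each surrogate.
        int cp = 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00);
        return new byte[] {
            (byte) (0xF0 | (cp >> 18)),          // 11110xxx: top 3 bits
            (byte) (0x80 | ((cp >> 12) & 0x3F)), // 10xxxxxx: next 6 bits
            (byte) (0x80 | ((cp >> 6) & 0x3F)),  // 10xxxxxx: next 6 bits
            (byte) (0x80 | (cp & 0x3F))          // 10xxxxxx: low 6 bits
        };
    }

    public static void main(String[] args) {
        // Assumed example input: U+D800 U+DF00, i.e. code point U+10300.
        byte[] manual = encodePair('\uD800', '\uDF00');
        byte[] library = "\uD800\uDF00".getBytes(StandardCharsets.UTF_8);
        StringBuilder hex = new StringBuilder();
        for (byte b : manual) {
            hex.append(String.format("%02x ", b & 0xFF));
        }
        System.out.println(hex.toString().trim());
        System.out.println(Arrays.equals(manual, library));
    }
}
```

Under that assumption the pair encodes to the four bytes f0 90 8c 80, so a two-byte result {0xf0, 0x90} would be a truncated encoding of the same code point.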
 
 If someone has a quick reference to the relevant definitions and/or a
 test suite for this kind of thing, that would be highly appreciated.
 
 http://gcc.gnu.org/cgi-bin/gnatsweb.pl?cmd=view%20audit-trail&database=gcc&pr=9802
 

