GCJ manual changed

Per Bothner per@bothner.com
Wed Jan 30 09:46:00 GMT 2002


Joseph S. Myers wrote:
> Is there a proper name for this, e.g. UTF-JAVA?  (DUTR#26 defines CESU-8
> which encodes surrogate pairs like you describe but doesn't seem to have a
> special encoding of '\u0'.)
> 
> Where these docs refer to UTF-8, do they mean UTF-8, or this variant?

The VM specification does call them "UTF-8 strings", but it also says:

   There are two differences between this format and the "standard" UTF-8
   format. First, the null byte (byte)0  is encoded using the 2-byte
   format rather than the 1-byte format, so that Java virtual machine
   UTF-8 strings never have embedded nulls. Second, only the 1-byte,
   2-byte, and 3-byte formats are used. The Java virtual machine does not
   recognize the longer UTF-8 formats.
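
As a rough illustration of the two differences quoted above, here is a minimal Java sketch of an encoder for this variant. The class and method names (ModifiedUtf8, encode, hex) are hypothetical and come from neither the VM specification nor GCJ. The sketch assumes the input is a Java String and encodes each UTF-16 code unit on its own, so a character outside the BMP comes out as two 3-byte sequences, one per surrogate, instead of one 4-byte sequence.

```java
import java.io.ByteArrayOutputStream;

public class ModifiedUtf8 {
    // Encode a String one UTF-16 code unit at a time, using only the
    // 1-, 2- and 3-byte forms. Hypothetical helper for illustration.
    static byte[] encode(String s) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int i = 0; i < s.length(); i++) {
            int c = s.charAt(i);
            if (c >= 0x0001 && c <= 0x007F) {
                // 1-byte form; note that 0 is excluded on purpose
                out.write(c);
            } else if (c <= 0x07FF) {
                // 2-byte form; also used for '\u0000', giving C0 80
                out.write(0xC0 | (c >> 6));
                out.write(0x80 | (c & 0x3F));
            } else {
                // 3-byte form; surrogates land here, one at a time
                out.write(0xE0 | (c >> 12));
                out.write(0x80 | ((c >> 6) & 0x3F));
                out.write(0x80 | (c & 0x3F));
            }
        }
        return out.toByteArray();
    }

    // Format bytes as lowercase hex separated by spaces.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02x", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(hex(encode("\0")));            // c0 80
        System.out.println(hex(encode("A")));             // 41
        System.out.println(hex(encode("\uD83D\uDE00")));  // ed a0 bd ed b8 80
    }
}
```

Standard UTF-8 would encode the first string as the single byte 00 and the last one (U+1F600) as the four bytes f0 9f 98 80. java.io.DataOutputStream.writeUTF emits the same byte sequences as this sketch, preceded by a 2-byte length.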
-- 
	--Per Bothner
per@bothner.com   http://www.bothner.com/per/


