GCJ manual changed
Per Bothner
per@bothner.com
Wed Jan 30 09:46:00 GMT 2002
Joseph S. Myers wrote:
> Is there a proper name for this, e.g. UTF-JAVA? (DUTR#26 defines CESU-8
> which encodes surrogate pairs like you describe but doesn't seem to have a
> special encoding of '\u0'.)
>
> Where these docs refer to UTF-8, do they mean UTF-8, or this variant?
The VM specification does talk about "UTF-8 strings", but it also says:
  There are two differences between this format and the "standard" UTF-8
  format. First, the null byte (byte)0 is encoded using the 2-byte
  format rather than the 1-byte format, so that Java virtual machine
  UTF-8 strings never have embedded nulls. Second, only the 1-byte,
  2-byte, and 3-byte formats are used. The Java virtual machine does not
  recognize the longer UTF-8 formats.
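To make the two differences concrete, here is a minimal Java sketch of an encoder for this format. The class name `ModifiedUtf8` and its methods are hypothetical, not part of GCJ or libgcj. It assumes the input is a Java `String` processed one UTF-16 `char` at a time, so a character outside the BMP comes out as two 3-byte sequences, one per surrogate. That is the CESU-8-like behavior asked about above. As a cross-check it compares its output with `java.io.DataOutputStream.writeUTF`, whose documentation describes the same "modified UTF-8" format, after stripping the 2-byte length prefix that `writeUTF` writes first.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.Arrays;

/** Hypothetical sketch of the JVM's "modified UTF-8" encoding. */
public class ModifiedUtf8 {

    /** Encodes s one UTF-16 char at a time, using only the 1-, 2- and 3-byte forms. */
    public static byte[] encode(String s) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c >= 0x0001 && c <= 0x007F) {
                // 1-byte form: plain ASCII, but NOT U+0000.
                out.write(c);
            } else if (c <= 0x07FF) {
                // 2-byte form. U+0000 also lands here and becomes 0xC0 0x80,
                // so the encoded bytes never contain an embedded 0x00.
                out.write(0xC0 | (c >> 6));
                out.write(0x80 | (c & 0x3F));
            } else {
                // 3-byte form. Surrogate halves are encoded individually, so a
                // supplementary character takes 6 bytes instead of the 4-byte
                // form that standard UTF-8 would use.
                out.write(0xE0 | (c >> 12));
                out.write(0x80 | ((c >> 6) & 0x3F));
                out.write(0x80 | (c & 0x3F));
            }
        }
        return out.toByteArray();
    }

    /** Cross-check: writeUTF emits a 2-byte length, then modified UTF-8. */
    static byte[] viaWriteUtf(String s) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        new DataOutputStream(buf).writeUTF(s);
        byte[] all = buf.toByteArray();
        return Arrays.copyOfRange(all, 2, all.length); // drop the length prefix
    }

    public static void main(String[] args) throws IOException {
        // 'A', NUL, e-acute, euro sign, and one supplementary char (a surrogate pair).
        String sample = "A\u0000\u00e9\u20ac\uD83D\uDE00";
        byte[] ours = encode(sample);
        System.out.println(Arrays.equals(ours, viaWriteUtf(sample)));
        System.out.println(ours.length);
    }
}
```

Running `main` prints `true` (the two encodings agree) and then `14`: 1 byte for `A`, 2 for U+0000, 2 for U+00E9, 3 for U+20AC, and 6 for the surrogate pair.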
--
--Per Bothner
per@bothner.com http://www.bothner.com/per/