This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
Re: GCJ manual changed
- From: "Joseph S. Myers" <jsm28 at cam dot ac dot uk>
- To: Per Bothner <per at bothner dot com>
- Cc: Nic Ferrier <nferrier at tapsellferrier dot co dot uk>, <java-patches at gcc dot gnu dot org>, <gcc-patches at gcc dot gnu dot org>
- Date: Wed, 30 Jan 2002 17:54:13 +0000 (GMT)
- Subject: Re: GCJ manual changed
On Wed, 30 Jan 2002, Per Bothner wrote:
> The VM specification does talk about "UTF-8 strings". Also:
>
> There are two differences between this format and the "standard" UTF-8
> format. First, the null byte (byte)0 is encoded using the 2-byte
> format rather than the 1-byte format, so that Java virtual machine
> UTF-8 strings never have embedded nulls. Second, only the 1-byte,
> 2-byte, and 3-byte formats are used. The Java virtual machine does not
> recognize the longer UTF-8 formats.
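
The two differences quoted above can be observed from Java itself. The following is a minimal sketch, assuming the JDK's DataOutputStream.writeUTF, which is documented to write this modified UTF-8 after a 2-byte length prefix. The class and helper names (ModifiedUtf8Demo, modifiedUtf8, hex) are made up for illustration and come from neither the manual nor the VM specification.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class ModifiedUtf8Demo {
    // Encode a string with DataOutputStream.writeUTF and drop the
    // 2-byte length prefix, leaving only the modified UTF-8 bytes.
    public static byte[] modifiedUtf8(String s) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buf)) {
            out.writeUTF(s);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        byte[] all = buf.toByteArray();
        return Arrays.copyOfRange(all, 2, all.length);
    }

    // Render bytes as space-separated upper-case hex.
    public static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String nul = "\0";
        // U+10000 is the first code point that standard UTF-8 encodes in 4 bytes.
        String supplementary = new String(Character.toChars(0x10000));

        // First difference: the null character uses the 2-byte format.
        System.out.println(hex(modifiedUtf8(nul)));                                // C0 80
        System.out.println(hex(nul.getBytes(StandardCharsets.UTF_8)));             // 00

        // Second difference: no 4-byte format; each surrogate gets 3 bytes.
        System.out.println(hex(modifiedUtf8(supplementary)));                      // ED A0 80 ED B0 80
        System.out.println(hex(supplementary.getBytes(StandardCharsets.UTF_8)));   // F0 90 80 80
    }
}
```

Since the 4-byte format is never produced, a character outside the Basic Multilingual Plane comes out as two 3-byte sequences, one per UTF-16 surrogate.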
In that case, the manual should state that references to UTF-8 are to the
Java dialect meaning rather than the standard Unicode meaning. And there
still shouldn't be references to "UTF", unqualified, as here - if it means
some form of UTF-8, it should say so.
Does Java define that, except for the special encoding of the null byte,
over-long sequences must be treated as invalid, to avoid the usual
security holes associated with them?
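
To make the question concrete, here is a rough sketch of what such a rule would look like if it were applied: every sequence must be in its shortest form, with the 2-byte encoding of null (C0 80) as the only exception. This illustrates the check being asked about and is not a statement of what Java or the VM specification actually requires. The class and method names (StrictModifiedUtf8, isStrict) are hypothetical.

```java
public class StrictModifiedUtf8 {
    // Returns true if the bytes use only the 1-, 2- and 3-byte formats,
    // contain no raw 0x00 byte, and contain no over-long sequence other
    // than C0 80 (the 2-byte encoding of the null character).
    public static boolean isStrict(byte[] b) {
        int i = 0;
        while (i < b.length) {
            int b0 = b[i] & 0xFF;
            if (b0 == 0x00) {
                return false;                      // embedded null byte
            }
            if (b0 < 0x80) {                       // 1-byte format
                i += 1;
            } else if ((b0 & 0xE0) == 0xC0) {      // 2-byte format
                if (i + 1 >= b.length) return false;
                int b1 = b[i + 1] & 0xFF;
                if ((b1 & 0xC0) != 0x80) return false;
                int value = ((b0 & 0x1F) << 6) | (b1 & 0x3F);
                // Over-long unless it is the special null encoding.
                if (value < 0x80 && value != 0) return false;
                i += 2;
            } else if ((b0 & 0xF0) == 0xE0) {      // 3-byte format
                if (i + 2 >= b.length) return false;
                int b1 = b[i + 1] & 0xFF;
                int b2 = b[i + 2] & 0xFF;
                if ((b1 & 0xC0) != 0x80 || (b2 & 0xC0) != 0x80) return false;
                int value = ((b0 & 0x0F) << 12) | ((b1 & 0x3F) << 6) | (b2 & 0x3F);
                if (value < 0x800) return false;   // over-long 3-byte form
                i += 3;
            } else {
                return false;                      // stray continuation byte or longer format
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // C0 AF is the classic over-long encoding of '/'.
        System.out.println(isStrict(new byte[] {(byte) 0xC0, (byte) 0xAF})); // false
        // C0 80 is the permitted 2-byte encoding of null.
        System.out.println(isStrict(new byte[] {(byte) 0xC0, (byte) 0x80})); // true
    }
}
```

A decoder without a check of this kind would accept C0 AF as '/', which is how over-long forms have been used to get past filters that inspect only the raw bytes.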
--
Joseph S. Myers
jsm28@cam.ac.uk