This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.



Re: GCJ manual changed


On Wed, 30 Jan 2002, Per Bothner wrote:

> The VM specification does talk about "UTF-8 strings". Also:
> 
>    There are two differences between this format and the "standard" UTF-8
>    format. First, the null byte (byte)0  is encoded using the 2-byte
>    format rather than the 1-byte format, so that Java virtual machine
>    UTF-8 strings never have embedded nulls. Second, only the 1-byte,
>    2-byte, and 3-byte formats are used. The Java virtual machine does not
>    recognize the longer UTF-8 formats.
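
The two differences quoted above can be sketched in a few lines of Python. This is only an illustration, not code from the VM specification or from GCJ: the function name encode_modified_utf8 is hypothetical, and the sketch assumes the input is handled as 16-bit Java chars, so a character outside the 16-bit range appears as two surrogate code units of three bytes each.

```python
def encode_modified_utf8(text):
    """Encode text the way the quoted passage describes (illustrative sketch)."""
    out = bytearray()
    # Assumption: treat the string as 16-bit code units, as Java chars are.
    units = text.encode("utf-16-be", "surrogatepass")
    for i in range(0, len(units), 2):
        cu = (units[i] << 8) | units[i + 1]
        if 0x0001 <= cu <= 0x007F:
            # 1-byte format, used for everything in ASCII except the null.
            out.append(cu)
        elif cu <= 0x07FF:
            # 2-byte format; cu == 0 lands here and becomes C0 80.
            out.append(0xC0 | (cu >> 6))
            out.append(0x80 | (cu & 0x3F))
        else:
            # 3-byte format; nothing longer is ever produced.
            out.append(0xE0 | (cu >> 12))
            out.append(0x80 | ((cu >> 6) & 0x3F))
            out.append(0x80 | (cu & 0x3F))
    return bytes(out)
```

For example, encode_modified_utf8("A\x00B") yields the four bytes 41 C0 80 42, where standard UTF-8 would yield 41 00 42.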

In that case, the manual should state that its references to UTF-8 mean the 
Java dialect rather than standard Unicode UTF-8.  And there still shouldn't 
be unqualified references to "UTF", as there are here - if it means some 
form of UTF-8, it should say so.

Does Java define that, except for the special encoding of the null byte,
over-long sequences must be treated as invalid, to avoid the usual
security holes associated with them?
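
To make the question concrete, here is a sketch in Python of what such strict checking could look like. It does not claim to show what Java or GCJ actually does: decode_strict and _continuation are hypothetical names, and rejecting a raw 0x00 byte is an assumption based on the statement that these strings never have embedded nulls.

```python
def _continuation(data, i):
    """Return the 6 payload bits of the continuation byte at index i."""
    if i >= len(data) or (data[i] & 0xC0) != 0x80:
        raise ValueError("missing or bad continuation byte at offset %d" % i)
    return data[i] & 0x3F


def decode_strict(data):
    """Decode to a list of 16-bit code units, rejecting over-long forms."""
    units = []
    i = 0
    while i < len(data):
        b0 = data[i]
        if b0 < 0x80:
            if b0 == 0:
                # Assumption: a raw null byte is treated as invalid here.
                raise ValueError("embedded null byte at offset %d" % i)
            units.append(b0)
            i += 1
        elif 0xC0 <= b0 <= 0xDF:
            cu = ((b0 & 0x1F) << 6) | _continuation(data, i + 1)
            # C0 80 (the null) is the only permitted over-long 2-byte form.
            if cu < 0x80 and cu != 0:
                raise ValueError("over-long 2-byte sequence at offset %d" % i)
            units.append(cu)
            i += 2
        elif 0xE0 <= b0 <= 0xEF:
            cu = ((b0 & 0x0F) << 12) | (_continuation(data, i + 1) << 6)
            cu |= _continuation(data, i + 2)
            if cu < 0x800:
                raise ValueError("over-long 3-byte sequence at offset %d" % i)
            units.append(cu)
            i += 3
        else:
            # Stray continuation byte, or a lead byte of a 4-byte-or-longer form.
            raise ValueError("invalid lead byte at offset %d" % i)
    return units
```

With this sketch, C0 80 decodes to a null, while C0 AF (an over-long encoding of "/") raises ValueError instead of being accepted as a slash.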

-- 
Joseph S. Myers
jsm28@cam.ac.uk

