This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.
Re: GCJ manual changed
- From: Bryce McKinlay <bryce at waitaki dot otago dot ac dot nz>
- To: "Joseph S. Myers" <jsm28 at cam dot ac dot uk>
- Cc: Tom Tromey <tromey at redhat dot com>, Per Bothner <per at bothner dot com>, Nic Ferrier <nferrier at tapsellferrier dot co dot uk>, java-patches at gcc dot gnu dot org, gcc-patches at gcc dot gnu dot org
- Date: Thu, 31 Jan 2002 15:11:13 +1300
- Subject: Re: GCJ manual changed
- References: <Pine.LNX.4.33.0201310102420.22659-100000@kern.srcf.societies.cam.ac.uk>
Joseph S. Myers wrote:
>On 30 Jan 2002, Tom Tromey wrote:
>
>>I don't recall seeing text to that effect in anything I've read. And
>>I'd be willing to bet that at least some versions of the JDK from Sun
>>don't reject such sequences. For that matter, we don't reject such
>>sequences. It's unclear whether we should change our implementation
>>here; this is yet another under-specified aspect of Java.
>>
>
>We ought to reject them (unless it is specifically specified otherwise).
>Both the Unicode and ISO 10646 standards were changed to disallow
>interpretation (not just generation) of such sequences as representing the
>characters they would appear to represent when a naive UTF-8 decoder is
>used, because of the security issues associated with multiple
>representations.
>
>If there is some way of influencing Java standards, it would be worthwhile
>to represent that the standards should be changed to make it clear such
>over-long sequences must be rejected.
>
The online docs say:

    *Unicode 3.0 Support*
    Character handling in J2SE 1.4 is based on version 3.0 of the
    Unicode standard. This affects the Character and String classes in
    the java.lang package as well as the collation and bidirectional
    text analysis functionality in the java.text package.

So, if the Unicode standard has something to say on the matter, then
that is what we should implement.
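
For illustration only, here is a minimal sketch of what "rejecting
over-long sequences" could look like in a decoder. It is not the libgcj
code; the class name StrictUtf8 and the method decodeCodePoint are
hypothetical names chosen for this example. The idea is to remember the
smallest code point that legitimately needs each sequence length and to
refuse any sequence that decodes to something smaller.

```java
// Hypothetical helper, not part of libgcj or the JDK.
final class StrictUtf8 {
    private StrictUtf8() {}

    /**
     * Decodes the single UTF-8 sequence starting at buf[pos].
     * Throws IllegalArgumentException for malformed input, including
     * over-long (non-shortest-form) sequences.
     */
    static int decodeCodePoint(byte[] buf, int pos) {
        int b0 = buf[pos] & 0xFF;
        int needed; // number of continuation bytes expected
        int min;    // smallest code point that requires this length
        int cp;     // accumulated code point bits

        if (b0 < 0x80) {
            return b0;                       // one-byte (ASCII) form
        } else if ((b0 & 0xE0) == 0xC0) {
            needed = 1; min = 0x80;    cp = b0 & 0x1F;
        } else if ((b0 & 0xF0) == 0xE0) {
            needed = 2; min = 0x800;   cp = b0 & 0x0F;
        } else if ((b0 & 0xF8) == 0xF0) {
            needed = 3; min = 0x10000; cp = b0 & 0x07;
        } else {
            throw new IllegalArgumentException("invalid lead byte");
        }

        if (pos + needed >= buf.length) {
            throw new IllegalArgumentException("truncated sequence");
        }
        for (int i = 1; i <= needed; i++) {
            int b = buf[pos + i] & 0xFF;
            if ((b & 0xC0) != 0x80) {
                throw new IllegalArgumentException("invalid continuation byte");
            }
            cp = (cp << 6) | (b & 0x3F);
        }

        // The over-long check: the value would have fit in a shorter form.
        if (cp < min) {
            throw new IllegalArgumentException("over-long sequence");
        }
        // Also refuse surrogates and values beyond the Unicode range.
        if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) {
            throw new IllegalArgumentException("code point out of range");
        }
        return cp;
    }
}
```

With this sketch, the two-byte sequence C0 AF, which a naive decoder
would turn into '/', is rejected, while the ordinary one-byte 2F is
accepted.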
regards
Bryce.