This is the mail archive of the java-patches@gcc.gnu.org mailing list for the Java project.
Patch: Fix for PR 2319
- To: Gcc Patch List <gcc-patches at gcc dot gnu dot org>
- Subject: Patch: Fix for PR 2319
- From: Tom Tromey <tromey at redhat dot com>
- Date: 19 Jun 2001 14:12:29 -0600
- Cc: Java Patch List <java-patches at gcc dot gnu dot org>
- Reply-To: tromey at redhat dot com
This patch fixes PR 2319. With it, the built-in UTF-8 decoder now
reports an error when it sees an invalid or overlong sequence.
Note that on systems with a working iconv() this decoder isn't used,
even if the "UTF-8" encoding is requested, so we're still at the
mercy of the system in some ways. The UTF-8 decoder in the glibc I'm
using does not flag invalid or overlong sequences as errors :-(
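
One way to see how a given system's iconv() treats such input is a small
probe like the one below. It is only a sketch: the helper name
iconv_rejects_utf8 is hypothetical, the "UCS-4LE" target encoding is an
assumption (any Unicode target the local iconv supports should do), and
the result depends entirely on the installed iconv implementation.

```c
#include <assert.h>
#include <errno.h>
#include <iconv.h>
#include <string.h>

/* Hypothetical probe, not part of the patch.  Returns 1 if the system
   iconv() rejects BYTES as an illegal UTF-8 sequence (EILSEQ), 0 if it
   converts them without complaint, and -1 if the converter could not
   be opened or failed for some other reason.  */
static int
iconv_rejects_utf8 (const char *bytes, size_t len)
{
  char in[16], out[64];
  char *inp = in, *outp = out;
  size_t inleft = len, outleft = sizeof out;
  iconv_t cd;
  size_t r;
  int saved_errno;

  if (len > sizeof in)
    return -1;
  memcpy (in, bytes, len);

  /* Assumption: the local iconv knows a "UCS-4LE" target.  */
  cd = iconv_open ("UCS-4LE", "UTF-8");
  if (cd == (iconv_t) -1)
    return -1;

  r = iconv (cd, &inp, &inleft, &outp, &outleft);
  saved_errno = errno;
  iconv_close (cd);

  if (r != (size_t) -1)
    return 0;
  return saved_errno == EILSEQ ? 1 : -1;
}
```

Calling iconv_rejects_utf8 ("\xC0\xAF", 2) feeds it an overlong encoding
of '/'. A result of 0 would mean the system decoder accepts the
sequence, which is the behaviour described above.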
Ok to commit?
2001-06-19 Tom Tromey <tromey@redhat.com>
* lex.c (java_read_char): Disallow invalid and overlong
sequences. Fixes PR java/2319.
Tom
Index: lex.c
===================================================================
RCS file: /cvs/gcc/gcc/gcc/java/lex.c,v
retrieving revision 1.65
diff -u -r1.65 lex.c
--- lex.c 2001/05/04 00:34:48 1.65
+++ lex.c 2001/06/19 19:53:15
@@ -454,15 +454,21 @@
if (c == EOF)
return UEOF;
if (c < 128)
- return (unicode_t)c;
+ return (unicode_t) c;
else
{
if ((c & 0xe0) == 0xc0)
{
c1 = getc (lex->finput);
if ((c1 & 0xc0) == 0x80)
- return (unicode_t)(((c &0x1f) << 6) + (c1 & 0x3f));
- c = c1;
+ {
+ unicode_t r = (unicode_t)(((c & 0x1f) << 6) + (c1 & 0x3f));
+ /* Check for valid 2-byte characters. We explicitly
+ allow \0 because this encoding is common in the
+ Java world. */
+ if (r == 0 || (r >= 0x80 && r <= 0x7ff))
+ return r;
+ }
}
else if ((c & 0xf0) == 0xe0)
{
@@ -471,16 +477,23 @@
{
c2 = getc (lex->finput);
if ((c2 & 0xc0) == 0x80)
- return (unicode_t)(((c & 0xf) << 12) +
- (( c1 & 0x3f) << 6) + (c2 & 0x3f));
- else
- c = c2;
+ {
+ unicode_t r = (unicode_t)(((c & 0xf) << 12) +
+ (( c1 & 0x3f) << 6)
+ + (c2 & 0x3f));
+ /* Check for valid 3-byte characters.
+ Don't allow surrogate, \ufffe or \uffff. */
+ if (r >= 0x800 && r <= 0xffff
+ && ! (r >= 0xd800 && r <= 0xdfff)
+ && r != 0xfffe && r != 0xffff)
+ return r;
+ }
}
- else
- c = c1;
}
- /* We simply don't support invalid characters. */
+ /* We simply don't support invalid characters. We also
+ don't support 4-, 5-, or 6-byte UTF-8 sequences, as these
+ cannot be valid Java characters. */
java_lex_error ("malformed UTF-8 character", 0);
}
}
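
For readers who want to try the acceptance rules outside the lexer, here
is a minimal sketch of the same checks applied to an in-memory buffer.
The function decode_utf8_strict and its -1 error return are hypothetical
and not part of lex.c; the real code reads from lex->finput with getc
and reports errors through java_lex_error.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper mirroring the checks in the patch.  Decodes the
   single character at the start of BUF (LEN bytes available).  Returns
   the code point, or -1 if the sequence is truncated, malformed,
   overlong, or longer than three bytes.  */
static long
decode_utf8_strict (const unsigned char *buf, size_t len)
{
  unsigned int c, c1, c2;
  long r;

  if (len == 0)
    return -1;
  c = buf[0];
  if (c < 128)
    return (long) c;

  if ((c & 0xe0) == 0xc0 && len >= 2)
    {
      c1 = buf[1];
      if ((c1 & 0xc0) == 0x80)
        {
          r = ((c & 0x1f) << 6) + (c1 & 0x3f);
          /* Two bytes must encode 0x80..0x7ff; the two-byte form
             of \0 is allowed as a special case.  */
          if (r == 0 || (r >= 0x80 && r <= 0x7ff))
            return r;
        }
    }
  else if ((c & 0xf0) == 0xe0 && len >= 3)
    {
      c1 = buf[1];
      c2 = buf[2];
      if ((c1 & 0xc0) == 0x80 && (c2 & 0xc0) == 0x80)
        {
          r = ((c & 0xf) << 12) + ((c1 & 0x3f) << 6) + (c2 & 0x3f);
          /* Three bytes must encode 0x800..0xffff, excluding
             surrogates, 0xfffe and 0xffff.  */
          if (r >= 0x800 && r <= 0xffff
              && ! (r >= 0xd800 && r <= 0xdfff)
              && r != 0xfffe && r != 0xffff)
            return r;
        }
    }

  /* Everything else, including 4-, 5- and 6-byte forms.  */
  return -1;
}
```

With this sketch, the bytes C3 A9 decode to 0xe9 and C0 80 decode to 0,
while the overlong forms C0 AF and E0 80 AF, the surrogate ED A0 80, and
any 4-byte sequence all yield -1.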