Partial fix for libgcj/9802

Mark Wielaard mark@klomp.org
Sat Feb 22 18:03:00 GMT 2003


Hi,

The following is a partial fix for libgcj/9802 (Bug in surrogate
handling in Unicode to UTF-8 conversion). It only fixes surrogate
handling in the UTF-8 encoder, but as James Clark explains, the same
problem can also occur in other multibyte encodings.

In principle the other encoders can also be rewritten to use the new
bytes_todo field to indicate that more output is available. However, I
am hoping that converting the encoders to the new java.nio.charset
framework will eliminate this problem, since it has explicit support
for this (see CoderResult; Jesse Rosenstock will certainly correct me
if I am wrong). I do not expect that we can finish that work for 3.3,
so just fixing it now for UTF-8 seems worthwhile.
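
To illustrate what "explicit support" in java.nio.charset looks like,
here is a minimal sketch using the standard CharsetEncoder and
CoderResult classes. It is not part of the patch. The class name
OverflowDemo and the method encodeInSmallChunks are made up for this
example, and the deliberately small output buffer is only there to
force the overflow case.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CoderResult;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not libgcj code.
public class OverflowDemo {
    // Encodes s to UTF-8 through an output buffer of only chunkSize bytes.
    static byte[] encodeInSmallChunks(String s, int chunkSize) {
        if (chunkSize < 4) {
            // A surrogate pair needs 4 output bytes in UTF-8.
            throw new IllegalArgumentException("chunkSize must be >= 4");
        }
        CharsetEncoder enc = StandardCharsets.UTF_8.newEncoder();
        CharBuffer in = CharBuffer.wrap(s);
        ByteBuffer out = ByteBuffer.allocate(chunkSize);
        ByteArrayOutputStream all = new ByteArrayOutputStream();

        boolean flushing = false;
        while (true) {
            // OVERFLOW means "drain the output and call me again";
            // UNDERFLOW means this phase has nothing more to produce.
            CoderResult r = flushing ? enc.flush(out) : enc.encode(in, out, true);
            if (r.isError()) {
                throw new IllegalArgumentException("cannot encode: " + r);
            }
            out.flip();
            all.write(out.array(), 0, out.limit());
            out.clear();
            if (r.isUnderflow()) {
                if (flushing) {
                    break;
                }
                flushing = true;
            }
        }
        return all.toByteArray();
    }

    public static void main(String[] args) {
        // "ab" followed by U+10000, written as a surrogate pair.
        byte[] bytes = encodeInSmallChunks("ab\uD800\uDC00", 5);
        for (byte b : bytes) {
            System.out.printf("%02x ", b & 0xff);
        }
        System.out.println();
    }
}
```

The point is that the caller never has to guess whether output is
still pending: the result of each call says so.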

I also added the testcase that James Clark made to Mauve, and it passes
with this patch. Since none of the other encoders use the bytes_todo
field, this does not impact any other encoders, just UTF-8.
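
For readers without the attachment, the idea can be sketched roughly as
follows. This is not the code from the patch: the class
PendingBytesSketch, its nested Encoder, and the names bytesTodo and
pending are hypothetical stand-ins, and the sketch does no validation
of unpaired surrogates. It only shows why the caller has to look at a
pending-bytes counter in addition to the number of characters consumed.

```java
import java.io.ByteArrayOutputStream;

// Hypothetical sketch, not the libgcj converter classes.
public class PendingBytesSketch {
    static class Encoder {
        final byte[] buf;   // fixed-size output buffer
        int count;          // bytes currently stored in buf
        int bytesTodo;      // continuation bytes still owed for 'pending'
        int pending;        // code point whose bytes are being emitted

        Encoder(int size) {
            if (size < 1) {
                throw new IllegalArgumentException("size must be >= 1");
            }
            buf = new byte[size];
        }

        // Encodes from s starting at pos until buf is full or the input
        // is exhausted; returns the number of chars consumed.
        int write(String s, int pos) {
            int start = pos;
            while (count < buf.length) {
                if (bytesTodo > 0) {
                    // Finish the character that did not fit earlier.
                    bytesTodo--;
                    buf[count++] = (byte) (0x80 | ((pending >> (6 * bytesTodo)) & 0x3F));
                    continue;
                }
                if (pos >= s.length()) {
                    break;
                }
                int cp = s.codePointAt(pos);      // joins a surrogate pair
                pos += Character.charCount(cp);
                if (cp < 0x80) {
                    buf[count++] = (byte) cp;
                } else {
                    int extra = cp < 0x800 ? 1 : cp < 0x10000 ? 2 : 3;
                    int lead = extra == 1 ? 0xC0 : extra == 2 ? 0xE0 : 0xF0;
                    buf[count++] = (byte) (lead | (cp >> (6 * extra)));
                    pending = cp;
                    bytesTodo = extra;
                }
            }
            return pos - start;
        }
    }

    static byte[] encode(String s, int bufSize) {
        Encoder enc = new Encoder(bufSize);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int pos = 0;
        // Looping only while pos < s.length() would lose the trailing
        // bytes of a character that straddles the end of the buffer.
        while (pos < s.length() || enc.bytesTodo > 0) {
            pos += enc.write(s, pos);
            out.write(enc.buf, 0, enc.count);
            enc.count = 0;
        }
        return out.toByteArray();
    }
}
```

With a 2-byte buffer and a single surrogate pair as input, the first
call consumes both chars but can only store two of the four output
bytes, so the second loop condition is what keeps the caller going.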

2003-02-22  Mark Wielaard  <mark@klomp.org>

        Partial fix for PR libgcj/8738:
        * gnu/gcj/convert/UnicodeToBytes.java (bytes_todo): New field.
        (done): Reset bytes_todo field.
        * gnu/gcj/convert/Output_UTF8.java (bytes_todo): Removed field.
        (write): Always decrease avail when count is increased.
        * java/lang/natString.cc (getBytes): Check converter->bytes_todo.

OK for branch and mainline?

Cheers,

Mark
-------------- next part --------------
A non-text attachment was scrubbed...
Name: convert.patch
Type: text/x-patch
Size: 2852 bytes
Desc: not available
URL: <http://gcc.gnu.org/pipermail/java-patches/attachments/20030222/b0a6bddd/attachment.bin>