UTF-16 not supported?
Tom Tromey
tromey@redhat.com
Fri Aug 16 14:40:00 GMT 2002
>>>>> "Suresh" == Suresh Raman <sugansha@yahoo.com> writes:
Suresh> The output of the program should be "hello world", which it is with
Suresh> UTF-8. But with UTF-16 or UTF-16BE, the output is a truncated string
Suresh> "hell" or "hello".
The appended patch fixes your test case for me. It also doesn't cause
any regressions on our test suite (including Mauve).
Does anybody out there have a box with glibc 2.1.3? I'd like to know
if you could run a test to see how this behaves there.
How common is 2.1.3? Is there a distribution still using it? (Even a
somewhat old distribution, if it is still in common use.) If it is
really obsolete then I can just remove all pretense at a workaround...
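For anyone who does have such a box, a probe along these lines may help. It is a sketch and not part of the patch, and the function name `probe_untranslatable_errno` is made up for illustration. It uses only the POSIX iconv_open()/iconv()/iconv_close() calls to convert U+00E9 (UTF-16BE input) to ASCII, which cannot represent it, and returns the errno that iconv() reports. A conforming iconv() should fail with EILSEQ here. If a 2.1.3 system reports something else, that would match the patch comment saying glibc 2.1.3 doesn't properly set errno, and the "no progress" fallback in the patch is what covers it.

```cpp
#include <iconv.h>
#include <cassert>
#include <cerrno>
#include <cstddef>

// Hypothetical probe: returns the errno iconv() sets when asked to convert
// a character the target encoding cannot represent, or 0 if the call
// unexpectedly succeeded.
int probe_untranslatable_errno()
{
  iconv_t cd = iconv_open("ASCII", "UTF-16BE");
  assert(cd != (iconv_t) -1);

  char in[] = { 0x00, static_cast<char>(0xE9) };  // U+00E9 in UTF-16BE
  char out[16];
  char* inbuf = in;
  char* outbuf = out;
  std::size_t inavail = sizeof in;
  std::size_t outavail = sizeof out;

  errno = 0;
  std::size_t r = iconv(cd, &inbuf, &inavail, &outbuf, &outavail);
  int err = (r == (std::size_t) -1) ? errno : 0;  // save before iconv_close

  iconv_close(cd);
  return err;
}
```

Calling it from a small driver and comparing the result with EILSEQ (or printing strerror() of it) should be enough to tell how a given glibc behaves.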
Tom
Index: ChangeLog
from Tom Tromey <tromey@redhat.com>
* gnu/gcj/convert/natIconv.cc (write): Handle case where
output buffer is too small.
Index: gnu/gcj/convert/natIconv.cc
===================================================================
RCS file: /cvs/gcc/gcc/libjava/gnu/gcj/convert/natIconv.cc,v
retrieving revision 1.13
diff -u -r1.13 natIconv.cc
--- gnu/gcj/convert/natIconv.cc 18 Feb 2002 02:52:44 -0000 1.13
+++ gnu/gcj/convert/natIconv.cc 16 Aug 2002 21:37:39 -0000
@@ -1,6 +1,6 @@
-// Input_iconv.java -- Java side of iconv() reader.
+// natIconv.cc -- Java side of iconv() reader.
-/* Copyright (C) 2000, 2001 Free Software Foundation
+/* Copyright (C) 2000, 2001, 2002 Free Software Foundation
This file is part of libgcj.
@@ -201,25 +201,39 @@
       inbuf = (char *) temp_buffer;
     }
-  // If the conversion fails on the very first character, then we
-  // assume that the character can't be represented in the output
-  // encoding. There's nothing useful we can do here, so we simply
-  // omit that character. Note that we can't check `errno' because
-  // glibc 2.1.3 doesn't set it correctly. We could check it if we
-  // really needed to, but we'd have to disable support for 2.1.3.
   size_t loop_old_in = old_in;
   while (1)
     {
       size_t r = iconv_adapter (iconv, (iconv_t) handle,
                                 &inbuf, &inavail,
                                 &outbuf, &outavail);
-      if (r == (size_t) -1 && inavail == loop_old_in)
+      if (r == (size_t) -1)
         {
-          inavail -= 2;
-          if (inavail == 0)
-            break;
-          loop_old_in -= 2;
-          inbuf += 2;
+          if (errno == EINVAL)
+            {
+              // Incomplete byte sequence at the end of the input
+              // buffer. This shouldn't be able to happen here.
+              break;
+            }
+          else if (errno == E2BIG)
+            {
+              // Output buffer is too small.
+              break;
+            }
+          else if (errno == EILSEQ || inavail == loop_old_in)
+            {
+              // Untranslatable sequence. Since glibc 2.1.3 doesn't
+              // properly set errno, we also assume that this is what
+              // is happening if no conversions took place. (This can
+              // be a bogus assumption if in fact the output buffer is
+              // too small.) We skip the first character and try
+              // again.
+              inavail -= 2;
+              if (inavail == 0)
+                break;
+              loop_old_in -= 2;
+              inbuf += 2;
+            }
         }
       else
         break;
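The control flow the patch introduces can be tried outside libgcj. The sketch below is an illustration under stated assumptions, not libgcj code: `convert_utf16` is a hypothetical helper that calls POSIX iconv() directly instead of libgcj's iconv_adapter, feeds it UTF-16BE input (2 bytes per unit, like jchars), and writes through a deliberately tiny output buffer that it "flushes" into a std::string. On E2BIG it keeps the bytes produced so far and calls iconv() again with an empty buffer instead of discarding input. That appears to be the case the old `inavail == loop_old_in` test mishandled: with the output buffer already full, no conversion takes place, so the old code skipped characters until the input ran out. On EILSEQ it skips one 2-byte unit, as the patch does.

```cpp
#include <iconv.h>
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper (not libgcj code): convert UTF-16 text to `tocode`
// through an output buffer of `bufsize` bytes, appending each chunk to the
// returned string, with the same errno dispatch as the patch.
std::string convert_utf16(const std::u16string& text, const char* tocode,
                          std::size_t bufsize = 4)
{
  assert(bufsize > 0);

  // Serialize as UTF-16BE so each code unit is exactly 2 bytes.
  std::string in;
  for (char16_t c : text)
    {
      in.push_back(static_cast<char>(c >> 8));
      in.push_back(static_cast<char>(c & 0xff));
    }

  iconv_t cd = iconv_open(tocode, "UTF-16BE");
  if (cd == (iconv_t) -1)
    throw std::runtime_error("iconv_open failed");

  std::string result;
  std::string buf(bufsize, '\0');
  char* inbuf = &in[0];
  std::size_t inavail = in.size();

  while (inavail > 0)
    {
      char* outbuf = &buf[0];
      std::size_t outavail = buf.size();
      std::size_t r = iconv(cd, &inbuf, &inavail, &outbuf, &outavail);
      int err = errno;  // save before anything else can change it

      // Keep whatever was converted before iconv() stopped.
      std::size_t produced = buf.size() - outavail;
      result.append(buf.data(), produced);

      if (r != (std::size_t) -1)
        break;                  // all input converted
      if (err == E2BIG && produced > 0)
        continue;               // buffer was full: flushed above, go again
      if (err == EILSEQ && inavail >= 2)
        {
          // Untranslatable character: skip one UTF-16 unit and retry.
          inbuf += 2;
          inavail -= 2;
          continue;
        }
      break;  // EINVAL, buffer too small for even one character, or other
    }

  iconv_close(cd);
  return result;
}
```

On glibc, converting u"hello world" to "UTF-16BE" with the 4-byte buffer should still yield all 22 bytes, and converting u"caf\u00e9!" to "ASCII" should drop only the é. Like the patch, the sketch does not treat surrogate pairs specially.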