UTF-16 not supported?

Tom Tromey tromey@redhat.com
Fri Aug 16 14:40:00 GMT 2002


>>>>> "Suresh" == Suresh Raman <sugansha@yahoo.com> writes:

Suresh> The output of the program should be "hello world", which it is with
Suresh> UTF-8.  But with UTF-16 or UTF-16BE, the output is a truncated string
Suresh> "hell" or "hello".

The appended patch fixes your test case for me.  It also doesn't cause
any regressions on our test suite (including Mauve).

Does anybody out there have a box with glibc 2.1.3?  I'd like to know
if you could run a test to see how this behaves there.

How common is 2.1.3?  Is there a distribution still using it?  (Even a
somewhat old distribution, if it is still in common use.)  If it is
really obsolete then I can just remove all pretense at a workaround...

Tom

Index: ChangeLog
from  Tom Tromey  <tromey@redhat.com>

	* gnu/gcj/convert/natIconv.cc (write): Handle case where
	output buffer is too small.

Index: gnu/gcj/convert/natIconv.cc
===================================================================
RCS file: /cvs/gcc/gcc/libjava/gnu/gcj/convert/natIconv.cc,v
retrieving revision 1.13
diff -u -r1.13 natIconv.cc
--- gnu/gcj/convert/natIconv.cc 18 Feb 2002 02:52:44 -0000 1.13
+++ gnu/gcj/convert/natIconv.cc 16 Aug 2002 21:37:39 -0000
@@ -1,6 +1,6 @@
-// Input_iconv.java -- Java side of iconv() reader.
+// natIconv.cc -- Java side of iconv() reader.
 
-/* Copyright (C) 2000, 2001  Free Software Foundation
+/* Copyright (C) 2000, 2001, 2002  Free Software Foundation
 
    This file is part of libgcj.
 
@@ -201,25 +201,39 @@
       inbuf = (char *) temp_buffer;
     }
 
-  // If the conversion fails on the very first character, then we
-  // assume that the character can't be represented in the output
-  // encoding.  There's nothing useful we can do here, so we simply
-  // omit that character.  Note that we can't check `errno' because
-  // glibc 2.1.3 doesn't set it correctly.  We could check it if we
-  // really needed to, but we'd have to disable support for 2.1.3.
   size_t loop_old_in = old_in;
   while (1)
     {
       size_t r = iconv_adapter (iconv, (iconv_t) handle,
 				&inbuf, &inavail,
 				&outbuf, &outavail);
-      if (r == (size_t) -1 && inavail == loop_old_in)
+      if (r == (size_t) -1)
 	{
-	  inavail -= 2;
-	  if (inavail == 0)
-	    break;
-	  loop_old_in -= 2;
-	  inbuf += 2;
+	  if (errno == EINVAL)
+	    {
+	      // Incomplete byte sequence at the end of the input
+	      // buffer.  This shouldn't be able to happen here.
+	      break;
+	    }
+	  else if (errno == E2BIG)
+	    {
+	      // Output buffer is too small.
+	      break;
+	    }
+	  else if (errno == EILSEQ || inavail == loop_old_in)
+	    {
+	      // Untranslatable sequence.  Since glibc 2.1.3 doesn't
+	      // properly set errno, we also assume that this is what
+	      // is happening if no conversions took place.  (This can
+	      // be a bogus assumption if in fact the output buffer is
+	      // too small.)  We skip the first character and try
+	      // again.
+	      inavail -= 2;
+	      if (inavail == 0)
+		break;
+	      loop_old_in -= 2;
+	      inbuf += 2;
+	    }
 	}
       else
 	break;
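
The E2BIG branch in the patch only stops the conversion loop; the remaining input still has to be converted by a later call once there is room in the output. As a rough illustration of that pattern outside libgcj, here is a minimal sketch that calls iconv() directly with a deliberately tiny output chunk and drains it each time E2BIG is reported. The helper name convert_all and the 8-byte chunk size are made up for this example, and the sketch assumes the glibc prototype of iconv() (a non-const `char **` input pointer). It is not the libgcj implementation.

```cpp
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <iconv.h>
#include <string>

// Hypothetical helper (not part of libgcj): convert `input` from encoding
// `from` to encoding `to`, appending the result to `output`.  Returns false
// on any failure other than a full output chunk.
bool convert_all (const char *to, const char *from,
                  const std::string &input, std::string &output)
{
  assert (to != NULL && from != NULL);

  iconv_t cd = iconv_open (to, from);
  if (cd == (iconv_t) -1)
    return false;

  std::string in_copy = input;   // glibc's iconv() takes a non-const pointer
  char *inbuf = &in_copy[0];
  size_t inavail = in_copy.size ();
  char chunk[8];                 // small on purpose, to force E2BIG
  bool ok = true;

  while (inavail > 0)
    {
      char *outbuf = chunk;
      size_t outavail = sizeof chunk;
      size_t r = iconv (cd, &inbuf, &inavail, &outbuf, &outavail);
      int saved_errno = errno;

      // Keep whatever was converted before iconv() stopped.
      output.append (chunk, sizeof chunk - outavail);

      if (r == (size_t) -1
          && (saved_errno != E2BIG || outavail == sizeof chunk))
        {
          // EILSEQ, EINVAL, or E2BIG with no progress at all: give up
          // instead of looping forever.
          ok = false;
          break;
        }
      // Otherwise either everything was converted (inavail is now 0) or
      // the chunk filled up, in which case we go around again.
    }

  iconv_close (cd);
  return ok;
}
```

Calling convert_all ("UTF-16BE", "UTF-8", "hello world", out) should leave the full 22-byte string in out rather than a truncated prefix, because each E2BIG is treated as "flush and continue" instead of as the end of the conversion.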


