This is the mail archive of the java@gcc.gnu.org mailing list for the Java project.



About the encoding of libgcj?


Hi, all,

As we know, libgcj only works in a UTF-8-compatible locale; that is, it assumes every machine uses one. I am wondering why libgcj doesn't support other locales, such as GB2312, perhaps by using iconv to convert between encodings. For example, if my locale is XXX and the file names on my machine are not valid UTF-8, then listing the files in a directory throws a NullPointerException (of course, I have compiled the program to a native binary with gcj). I looked into the libgcj code and found the following:
==================
libjava/java/io/natFilePosix.cc
--------------------------------
In the method jobjectArray java::io::File::(java::io::FilenameFilter *filter, java::io::FileFilter *fileFilter,
java::lang::Class *result_type):
  while (readdir_r (dir, (struct dirent *) dbuf, &d) == 0 && d != NULL)
#else /* HAVE_READDIR_R */
  while ((d = readdir (dir)) != NULL)
#endif /* HAVE_READDIR_R */
    {
      // Omit "." and "..".
      if (d->d_name[0] == '.'
          && (d->d_name[1] == '\0'
              || (d->d_name[1] == '.' && d->d_name[2] == '\0')))
        continue;
      // d->d_name is encoded in XXX, which is not valid UTF-8, so name is NULL.
      jstring name = JvNewStringUTF (d->d_name);
      if (filter && ! filter->accept(this, name))
        continue;

      if (result_type == &java::io::File::class$)
        {
          // This throws a NullPointerException because name is NULL.
          java::io::File *file = new java::io::File (this, name);
          if (fileFilter && ! fileFilter->accept(file))
            continue;

          list->add(file);
        }
      else
        list->add(name);
    }

  closedir (dir);
=============
As we can see, the cause is the JvNewStringUTF function, which is located in gcj/cni.h. It calls _Jv_NewStringUTF in java/lang/natString.cc.
_Jv_NewStringUTF converts a UTF-8 char[] to a jstring. The problem is that the encoding of the char[] (d->d_name) is determined by the machine's locale, which, as I said above, is not compatible with UTF-8.
So _Jv_strLengthUtf8 ((char *) p, size) always returns -1, which means _Jv_NewStringUTF always returns NULL. Why shouldn't we convert the
d->d_name char[] to a UTF-8 char[] first, perhaps using iconv? I have tried this, and it works fine.
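
To make the iconv suggestion concrete, here is a minimal sketch of such a conversion, assuming a POSIX system that provides iconv. The helper name convert_encoding is hypothetical and is not part of libgcj; it only illustrates the idea of turning locale-encoded bytes into UTF-8 before they reach JvNewStringUTF.

```cpp
#include <iconv.h>
#include <cerrno>
#include <stdexcept>
#include <string>

// Hypothetical helper (not a libgcj function): convert `input` from the
// encoding named `from` to the encoding named `to` using POSIX iconv.
std::string convert_encoding(const std::string &input,
                             const char *from, const char *to)
{
  // Note the argument order: iconv_open takes the target encoding first.
  iconv_t cd = iconv_open(to, from);
  if (cd == (iconv_t) -1)
    throw std::runtime_error("iconv_open failed: unsupported conversion");

  std::string output;
  char *in = const_cast<char *>(input.data());
  size_t in_left = input.size();
  char buf[256];

  while (in_left > 0)
    {
      char *out = buf;
      size_t out_left = sizeof buf;
      size_t rc = iconv(cd, &in, &in_left, &out, &out_left);
      int err = errno;  // save errno before any other call can change it
      output.append(buf, sizeof buf - out_left);
      if (rc == (size_t) -1 && err != E2BIG)
        {
          // EILSEQ or EINVAL: the input is not valid in the `from` encoding.
          iconv_close(cd);
          throw std::runtime_error("iconv failed: invalid input sequence");
        }
      // On E2BIG the output buffer was full; loop again with a fresh buffer.
    }

  // Stateful encodings would also need a final flush call here; the
  // encodings discussed in this message (GB2312, UTF-8) are stateless.
  iconv_close(cd);
  return output;
}
```

In the directory-listing code above, a call such as convert_encoding(d->d_name, "GB2312", "UTF-8") could run before JvNewStringUTF, and swapping the last two arguments would give the reverse conversion for paths passed to the OS. Whether a given encoding name is supported depends on the platform's iconv implementation.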
==============
jstring _Jv_NewStringUTF (const char *bytes)
{
  int size = strlen (bytes);
  unsigned char *p = (unsigned char *) bytes;

  int length = _Jv_strLengthUtf8 ((char *) p, size); // the length is always -1 here
  if (length < 0)
    return NULL;
  jstring jstr = JvAllocString (length);
  jchar *chrs = JvGetStringChars (jstr);

  p = (unsigned char *) bytes;
  unsigned char *limit = p + size;
  while (p < limit)
    *chrs++ = UTF8_GET (p, limit);

  return jstr;
}
==============
prims.cc :
---------
int
_Jv_strLengthUtf8(char* str, int len)
{
  unsigned char* ptr;
  unsigned char* limit;
  int str_length;

  ptr = (unsigned char*) str;
  limit = ptr + len;
  str_length = 0;
  for (; ptr < limit; str_length++)
    {
      if (UTF8_GET (ptr, limit) < 0)
        return (-1);
    }
  return (str_length);
}
=============
include/jvm.h
------------------
/* Extract a character from a Java-style Utf8 string.
 * PTR points to the current character.
 * LIMIT points to the end of the Utf8 string.
 * PTR is incremented to point after the character that gets returned.
 * On an error, -1 is returned. */
#define UTF8_GET(PTR, LIMIT) \
  ((PTR) >= (LIMIT) ? -1 \
   : *(PTR) < 128 ? *(PTR)++ \
   : (*(PTR)&0xE0) == 0xC0 && ((PTR)+=2)<=(LIMIT) && ((PTR)[-1]&0xC0) == 0x80 \
   ? (((PTR)[-2] & 0x1F) << 6) + ((PTR)[-1] & 0x3F) \
   : (*(PTR) & 0xF0) == 0xE0 && ((PTR) += 3) <= (LIMIT) \
   && ((PTR)[-2] & 0xC0) == 0x80 && ((PTR)[-1] & 0xC0) == 0x80 \
   ? (((PTR)[-3]&0x0F) << 12) + (((PTR)[-2]&0x3F) << 6) + ((PTR)[-1]&0x3F) \
   : ((PTR)++, -1))
=================
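
To show why such a check rejects locale-encoded names, the following is a standalone sketch of the same validation idea, written as plain functions rather than a macro. It is not the libgcj code: the names utf8_get and utf8_length are hypothetical, and like the macro it only handles one-, two- and three-byte sequences.

```cpp
#include <cstddef>

// Hypothetical stand-in for UTF8_GET: decode one character starting at p,
// advance p past it, and return the code point, or -1 if the bytes at p
// do not form a valid 1-, 2- or 3-byte UTF-8 sequence.
static int utf8_get(const unsigned char *&p, const unsigned char *limit)
{
  if (p >= limit)
    return -1;
  unsigned char c = *p;
  if (c < 0x80)
    {
      ++p;
      return c;
    }
  if ((c & 0xE0) == 0xC0 && limit - p >= 2 && (p[1] & 0xC0) == 0x80)
    {
      int v = ((c & 0x1F) << 6) | (p[1] & 0x3F);
      p += 2;
      return v;
    }
  if ((c & 0xF0) == 0xE0 && limit - p >= 3
      && (p[1] & 0xC0) == 0x80 && (p[2] & 0xC0) == 0x80)
    {
      int v = ((c & 0x0F) << 12) | ((p[1] & 0x3F) << 6) | (p[2] & 0x3F);
      p += 3;
      return v;
    }
  ++p;  // skip the offending byte
  return -1;
}

// Hypothetical stand-in for _Jv_strLengthUtf8: number of characters in the
// buffer, or -1 as soon as any malformed sequence is found.
int utf8_length(const char *bytes, size_t size)
{
  const unsigned char *p = reinterpret_cast<const unsigned char *>(bytes);
  const unsigned char *limit = p + size;
  int count = 0;
  while (p < limit)
    {
      if (utf8_get(p, limit) < 0)
        return -1;
      ++count;
    }
  return count;
}
```

For example, the character 中 is the bytes E4 B8 AD in UTF-8, which this check accepts as one character, but D6 D0 in GB2312, which it rejects because D0 is not a valid continuation byte. Some GB2312 byte pairs (C4 A3, for instance) do happen to have the shape of a two-byte UTF-8 sequence, so a check like this would accept them and decode the wrong character rather than fail.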
Similar problems exist in many places, for example File.isFile(), which eventually invokes _stat(jint query).
=============
java/io/natFilePosix.cc
----------
java::io::File::_stat (jint query)
{
  if (query == ISHIDDEN)
    return getName()->charAt(0) == '.';

#ifdef HAVE_STAT
  char *buf = (char *) __builtin_alloca (JvGetStringUTFLength (path) + 1);
  jsize total = JvGetStringUTFRegion (path, 0, path->length(), buf);
  buf[total] = '\0';

  struct stat sb;
  // buf is UTF-8 encoded by default, so of course this does not work on my
  // XXX-locale machine. My suggestion is to convert buf to a new char[]
  // whose encoding is compatible with my locale.
  if (::stat (buf, &sb))
    return false;

  JvAssert (query == DIRECTORY || query == ISFILE);
  jboolean r = S_ISDIR (sb.st_mode);
  return query == DIRECTORY ? r : ! r;
#else
  return false;
#endif
}
=========
So what I want to ask is: why doesn't libgcj provide a way to convert between UTF-8 and other encodings in the layer
between libgcj and the OS, probably outside the JVM?
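
One piece such a layer would need is a way to find out which encoding the OS side is using. The sketch below assumes a POSIX system with nl_langinfo; the function names locale_codeset and codeset_is_utf8 are hypothetical and do not exist in libgcj.

```cpp
#include <clocale>
#include <langinfo.h>
#include <strings.h>

// Hypothetical helper: return the character encoding name of the user's
// locale, for example "UTF-8" or "GB2312". Calling setlocale changes
// process-wide state, which a real library would need to consider.
const char *locale_codeset()
{
  setlocale(LC_CTYPE, "");      // adopt the locale from the environment
  return nl_langinfo(CODESET);  // name of that locale's encoding
}

// Hypothetical helper: true when the codeset name denotes UTF-8, in which
// case file names could be passed through without any conversion.
bool codeset_is_utf8(const char *codeset)
{
  return strcasecmp(codeset, "UTF-8") == 0
         || strcasecmp(codeset, "UTF8") == 0;
}
```

A boundary layer could call codeset_is_utf8(locale_codeset()) once at startup and convert file names only when the result is false.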


May I know your thoughts? Thank you.

   Best Regards,
   jimmy

