Mohan Build 22.5.03

fernando@lozano.eti.br
Tue May 27 22:41:00 GMT 2003


Hi there,

Maybe the crew (Ranjit, Mohan, Adam, ...) has not noticed, but non-ASCII file
names under Windows are a messy subject. On Windows 95/98/ME all file names are
encoded using an OEM code page (850 by default for most European languages and
437 for English), while the content of text files is encoded in Latin-1,
a.k.a. ISO-8859-1 (actually a few symbols differ, so it is really Windows code
page 1252).
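
To make the mismatch concrete, here is a minimal sketch (the class and method
names are mine, purely for illustration) that prints the bytes of the same file
name under different encodings. It assumes the JVM ships the IBM850 charset,
which is not guaranteed, so the loop checks for support before using it.

```java
import java.nio.charset.Charset;

public class FileNameBytes {
    // Encode a name in the given charset and return its bytes as spaced hex.
    static String hex(String name, String charsetName) {
        byte[] bytes = name.getBytes(Charset.forName(charsetName));
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02X ", b & 0xFF));
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        // "jo\u00e3o.txt" is joão.txt; the escape keeps this source file ASCII-only.
        String name = "jo\u00e3o.txt";
        for (String cs : new String[] {"IBM850", "ISO-8859-1", "UTF-8"}) {
            // IBM850 is an optional charset, so skip anything the JVM lacks.
            if (Charset.isSupported(cs)) {
                System.out.println(cs + ": " + hex(name, cs));
            }
        }
    }
}
```

The ASCII letters come out the same everywhere; only the byte (or bytes) for
the accented character changes, and that is exactly the part that breaks when
two programs disagree about the code page.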

Windows NT/2000/XP uses Unicode (at least that's what the manual says; I never
confirmed it) but may use code page 850 over the wire when talking to LANMAN
clients (Windows 98, smbfs). In fact, many Win32 API functions come in two
versions, one using Unicode and the other using Latin-1, or some other encoding
on Japanese and Arabic Windows.

By the spec, Java uses Unicode internally all the time, so libgcj should do
different conversions depending on the OS it is running on. I simply never use
accents in file names, because they break many apps "Designed for Windows". You
may not notice anything until you copy the files to another machine, but I know
many C and Java developers who are used to putting accents in file names and
identifiers.


[]s, Fernando Lozano

> The problem has nothing to do with the drive you are using, and it has
> nothing to do with Mohan's build (although I have not tried the latest one).
>
> The only way I can find to reproduce the problem is to put a file in the
> working directory (you are not accessing the root directory -- you would
> have to use c:\\ for that...) with a file name which includes a
> character with a code above 127 (joão.txt will do the trick).
>
> I doubt that anyone using an English keyboard will ever report this
> problem... ;-)
>
> If you take a look at the file natFileWin32.cc (libjava sources)
> you will understand the reason: libjava is using UTF-8 encoding to
> convert chars from 16 bits to 8 bits...
>
> Theoretically I love UTF-8! But it is almost useless with Win32 file
> systems for char codes above 127. :(
>
> I have a patch that fixes this problem. It supports all Win32 code
> pages, with the following exceptions (I don't know how to work with
> these code pages):
> Chinese (traditional and simplified);
> Japanese;
> Korean.
>
> It is not fully tested (I wouldn't be able to test it effectively on all
> the code pages...).
>
> I am willing to post it on the patches mailing list (I was already
> planning to do so). But there is no guarantee that it will be accepted...
> Can you build your own version of the compiler?
>
> Maybe Mohan will be kind enough to include the patch in the next
> "Mohan's build" (tm). ;-)
>
> João
>
>
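
To illustrate the UTF-8 problem described in the quoted message, here is a
rough sketch of what happens when a name is converted to UTF-8 bytes and those
bytes are then read as a single-byte code page. This is not the code from
natFileWin32.cc; the class and method names are made up, and ISO-8859-1 stands
in for the Windows code page (for the two bytes involved here it agrees with
code page 1252).

```java
import java.nio.charset.StandardCharsets;

public class Utf8Mismatch {
    // Encode the name as UTF-8, then decode the bytes as if each byte were
    // one character in a single-byte code page.
    static String misread(String name) {
        byte[] utf8 = name.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // "jo\u00e3o.txt" is joão.txt; the escape keeps this source file ASCII-only.
        String name = "jo\u00e3o.txt";
        String seen = misread(name);
        // The accented character turns into two characters, so the name that
        // reaches the file system is longer than the one the program asked for.
        System.out.println(name.length() + " chars requested, "
                + seen.length() + " chars seen");
    }
}
```

Pure ASCII names survive the round trip unchanged, which fits the remark that
nobody with an English keyboard is likely to report this.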


