java/2313: Java SimpleDateFormat crash with non US locales (french...)

Tom Tromey tromey@redhat.com
Mon Mar 19 08:36:00 GMT 2001


The following reply was made to PR java/2313; it has been noted by GNATS.

From: Tom Tromey <tromey@redhat.com>
To: Bryce McKinlay <bryce@albatross.co.nz>
Cc: diam@ensta.fr, gcc-gnats@gcc.gnu.org, java-patches@gcc.gnu.org
Subject: Re: java/2313: Java SimpleDateFormat crash with non US locales   (french...)
Date: 19 Mar 2001 09:42:35 -0700

 >>>>> "Bryce" == Bryce McKinlay <bryce@albatross.co.nz> writes:
 
 Bryce>     System.out.println ("Liberté, égalité, fraternité !");
 
 Bryce> works fine in the default mode, but with "--encoding=UTF-8" it
 Bryce> produces incorrect output.
 
 That's because the input file isn't actually in UTF-8, but it also
 doesn't contain an incorrect (by our rules -- see below) UTF-8
 sequence that would let us see it as erroneous.
 
 The `é' is 0xe9.  This is a valid start byte for a 3-byte UTF-8
 sequence.  That is why the following two characters are also removed.
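
 To make the lead-byte point concrete: 0xe9 is 11101001 in binary, and
 the leading 1110 bits mark the start of a three-byte sequence.  Here is
 a minimal sketch of that classification.  The class and method names
 are made up for illustration and are not taken from gcj; it only looks
 at the bit pattern and does not reject lead bytes that are otherwise
 disallowed.

```java
// Illustrative sketch only; Utf8Lead and sequenceLength are hypothetical names.
final class Utf8Lead {
    /**
     * Returns the sequence length implied by the bit pattern of a UTF-8
     * lead byte, or -1 if the byte cannot start a sequence.
     */
    static int sequenceLength(int b) {
        b &= 0xFF;
        if ((b & 0x80) == 0x00) return 1; // 0xxxxxxx: plain ASCII
        if ((b & 0xE0) == 0xC0) return 2; // 110xxxxx
        if ((b & 0xF0) == 0xE0) return 3; // 1110xxxx
        if ((b & 0xF8) == 0xF0) return 4; // 11110xxx
        return -1;                        // 10xxxxxx (continuation) or invalid
    }

    public static void main(String[] args) {
        // 0xE9 is the Latin-1 code for e-acute.
        System.out.println(sequenceLength(0xE9));
    }
}
```

 A decoder that trusts this length without looking at the next bytes
 will swallow whatever follows the 0xe9.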
 
 We ought to be noticing that the subsequent bytes in the sequence are
 invalid.  That is what Unicode specifies, and there probably isn't a
 good reason to allow incorrectly encoded characters.  However the code
 wasn't originally written this way and I never updated it to do this.
 I'll submit a PR.
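
 Roughly, the missing check could look like the sketch below.  This is
 not the gcj lexer code, and the names are hypothetical.  It assumes we
 only want to verify that every lead byte is followed by the right
 number of 10xxxxxx continuation bytes; it does not look for overlong
 forms or surrogates.

```java
import java.nio.charset.StandardCharsets;

// Illustrative sketch only; Utf8Check and firstMalformed are hypothetical names.
final class Utf8Check {
    /**
     * Returns the index of the first lead byte whose sequence is
     * malformed, or -1 if every sequence is structurally well formed.
     */
    static int firstMalformed(byte[] in) {
        int i = 0;
        while (i < in.length) {
            int b = in[i] & 0xFF;
            int len;
            if ((b & 0x80) == 0x00) len = 1;
            else if ((b & 0xE0) == 0xC0) len = 2;
            else if ((b & 0xF0) == 0xE0) len = 3;
            else if ((b & 0xF8) == 0xF0) len = 4;
            else return i; // stray continuation byte or invalid lead byte

            // Every trailing byte must exist and look like 10xxxxxx.
            for (int k = 1; k < len; k++) {
                if (i + k >= in.length || (in[i + k] & 0xC0) != 0x80) {
                    return i;
                }
            }
            i += len;
        }
        return -1;
    }

    public static void main(String[] args) {
        String s = "Libert\u00e9, \u00e9galit\u00e9";
        // Latin-1 bytes fed to a UTF-8 reader: the first e-acute is flagged.
        System.out.println(firstMalformed(s.getBytes(StandardCharsets.ISO_8859_1)));
        // Genuine UTF-8 bytes pass the check.
        System.out.println(firstMalformed(s.getBytes(StandardCharsets.UTF_8)));
    }
}
```

 With a check like this the Latin-1 file would be reported as badly
 encoded instead of silently losing characters.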
 
 Bryce> Unfortunately, I know very little about character
 Bryce> encoding. Maybe Tom can suggest a fix or workaround. Perhaps
 Bryce> it's possible to do something to convert the file to a UTF-8
 Bryce> encoding before trying to compile it?
 
 One fix would be to tell gcj the real encoding of the file:
 
     gcj --encoding=8859_1 ...
 
 This works for me.  However, note that the encoding names are
 system-dependent :-(.  Ideally we'd have a table of aliases mapping
 the Java-specified names to the system-dependent ones.
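
 Such an alias table might be as simple as the sketch below.  The class
 name and the entries are assumptions chosen for illustration; the
 right-hand names would have to be whatever the local iconv() really
 accepts, so a real table would need to be built per system.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Illustrative sketch only; EncodingAliases and its entries are hypothetical.
final class EncodingAliases {
    private static final Map<String, String> ALIASES = new HashMap<>();

    static {
        // Java-style name (upper-cased) -> name assumed to be known to the system.
        ALIASES.put("8859_1", "ISO-8859-1");
        ALIASES.put("ISO8859_1", "ISO-8859-1");
        ALIASES.put("UTF8", "UTF-8");
    }

    /** Maps a Java-style encoding name to the system name, or returns it unchanged. */
    static String systemName(String javaName) {
        String mapped = ALIASES.get(javaName.toUpperCase(Locale.ROOT));
        return mapped != null ? mapped : javaName;
    }

    public static void main(String[] args) {
        System.out.println(systemName("8859_1"));
        System.out.println(systemName("EUC-JP")); // no alias: passed through as-is
    }
}
```

 Unknown names are passed through untouched so that a name the system
 already understands keeps working.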
 
 Another fix would be to use the `iconv' or `recode' programs to
 convert the file into UTF-8 before compiling.  This is a pain to do,
 but might be the only recourse on systems with a losing (or no)
 iconv() implementation.
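
 Where iconv is usable, something along the lines of
 `iconv -f ISO-8859-1 -t UTF-8' should do the conversion, with the
 caveat that the encoding names are again system-dependent.  Where it
 is not, a small Java program can do the same job.  The sketch below
 assumes the source really is Latin-1; the class name is made up for
 illustration.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

// Illustrative sketch only; Latin1ToUtf8 is a hypothetical helper.
final class Latin1ToUtf8 {
    /** Decodes the bytes as ISO-8859-1 and re-encodes them as UTF-8. */
    static byte[] transcode(byte[] latin1) {
        String text = new String(latin1, StandardCharsets.ISO_8859_1);
        return text.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: java Latin1ToUtf8 <input> <output>");
            return;
        }
        byte[] in = Files.readAllBytes(Paths.get(args[0]));
        Files.write(Paths.get(args[1]), transcode(in));
    }
}
```

 The converted file can then be compiled with "--encoding=UTF-8".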
 
 Tom


