This is the mail archive of the java-patches@gcc.gnu.org mailing list for the Java project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: Another 646 patch and performance comparison with TOWER


Hi Tom,
thanks for your reply.

On Fri, Mar 29, 2002 at 01:27:03PM -0700, Tom Tromey wrote:
> >>>>> "Martin" == Martin Kahlert <martin.kahlert@infineon.com> writes:
> 
> Martin> I would like to see this patch in 3.1.
> Martin> It replaces the infamous Solaris encoding "646"
> Martin> by "8859_1" in DEFAULT_FILE_ENCODING.
> 
> I'd like to understand the problem further.
Here are some more details:
I read some bytes from a file into a byte array called line and then do 
new String ( line, 0, length );

Without my patch it works along these lines (java/lang/String.java):

public String (byte[] byteArray, int offset, int count)
  {
    try
      {
        init (byteArray, offset, count,
              System.getProperty("file.encoding", "8859_1"));
              ^^^ this call gives "646" -- i already dream about that.
      }
    catch (UnsupportedEncodingException x1)
      {
        // Maybe the default encoding is bad.
        try
          {
            init (byteArray, offset, count, "8859_1");
            ^^^ we will end up here.
          }
        catch (UnsupportedEncodingException x2)
          {
            // We know this can't happen.
          }
      }
  }

Inside init (java/lang/natString.cc) we get into BytesToUnicode::getDefaultDecode
And after the system did not succeed with any useful encoding, an
UnsupportedEncodingException is thrown which is handled by substituting 646
by 8859_1 (see the code above these lines).

> Ok.  We definitely need to do something.  A 40x slowdown is
> unacceptable.
Good, we agree about that :-)

> Martin> The unpatched version tried to open a lot of non-existing libs
> Martin> causing the immense slowdown
> 
> This is interesting.  It suggests that it would make sense for us to
> cache the name->encoding-class lookup.  That ought to reduce the
> number of bad lookups to just one (or one set) per run.  This will
> help all platforms and encodings, not just 646 on Solaris.
Yes, it would -- when 646 was a senseful encoding. Unfortunately,
only an UnsupportedEncodingException is thrown and your nice new code
isn't even executed.

> Could you try this patch?  Be sure, of course, to back out your
> current patch before applying this one.

I tried it but the patch seems to be buggy:

1. in BytesToUnicode.java: 
+    Class decodingClass = (Class) nameMap.get (encoding);
+    if (decodingClass == null)
+      {
+	try
+	  {
+	    decodingClass = Class.forName(className);
+	    return (BytesToUnicode) decodingClass.newInstance();
            ^^^^
            this must be result = (BytesToUnicode) decodingClass.newInstance();
            otherwise you won't get to the nameMap[encoding] = decodingClass
            part of your patch at all.
+	  }
+	catch (Throwable ex)
+	  {
+	    // We didn't find the class, or construction failed.  Try
+	    // iconv.
+	    decodingClass = Input_iconv.class;

2. Your patch would only work, if the bloody 646 encoding would result in a
   reasonable decodingClass. Unfortunately it does not and only throws an
   UnsupportedEncodingException over and over again.

The best idea for me is to treat 646 either as invalid (which it is in fact)
or substitute it by 8859_1 (which is the default for file_encoding in natSystem.cc.

Any new ideas?
 
Kind regards,
Martin.

-- 
The early bird catches the worm. If you want something else for       
breakfast, get up later.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]