This is the mail archive of the
java-patches@gcc.gnu.org
mailing list for the Java project.
Re: RFC: caching for I/O converters
- To: tromey at redhat dot com
- Subject: Re: RFC: caching for I/O converters
- From: Per Bothner <per at bothner dot com>
- Date: 29 May 2001 23:07:17 -0700
- Cc: Java Patch List <java-patches at gcc dot gnu dot org>
- References: <87snhn7lir.fsf@creche.redhat.com>
Tom Tromey <tromey@redhat.com> writes:
> This patch adds caching for I/O (charset) converters. A customer
> reported to us that creating a new converter in some String methods
> causes a lot of garbage to be created if you do a lot of converting
> from bytes to characters.
>
> Any comments on the advisability of this? Or comments on this patch
> in particular? I'm not sure what the performance impact might be. It
> was important to them, but I don't know if their case is very common.
This may be overkill. Does the decodingTable Hashtable really gain you
anything over a single static vector of REUSE_VECTOR_SIZE entries? When
you need a BytesToUnicode instance, just search through that vector for
a match.
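
A minimal sketch of that single-vector idea, not taken from the patch.
The Converter class is a hypothetical stand-in for BytesToUnicode, and
the get/done method names are assumptions chosen to mirror the patch:

```java
final class VectorCacheSketch {
    /** Hypothetical stand-in for BytesToUnicode; only the name matters here. */
    static final class Converter {
        final String name;
        Converter(String name) { this.name = name; }
    }

    static final int REUSE_VECTOR_SIZE = 10;

    // One static vector shared by all encodings, searched linearly.
    private static final Converter[] cache = new Converter[REUSE_VECTOR_SIZE];

    /** Take a cached converter for this encoding, or create a new one. */
    static synchronized Converter get(String name) {
        for (int i = 0; i < cache.length; i++) {
            Converter c = cache[i];
            if (c != null && c.name.equals(name)) {
                cache[i] = null;   // hand it out exclusively
                return c;
            }
        }
        return new Converter(name);
    }

    /** Give a converter back; it is dropped if the vector is full. */
    static synchronized void done(Converter c) {
        for (int i = 0; i < cache.length; i++) {
            if (cache[i] == null) {
                cache[i] = c;
                return;            // store it in exactly one slot
            }
        }
    }
}
```

With only ten slots the linear search is cheap, and nothing beyond
those ten converters can be retained.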

Or go even further: just keep a single cached instance. If the
needed conversion matches the cached instance, use it; otherwise
allocate a new one. This assumes you're unlikely to switch very
frequently between encodings, which I think is a fairly safe assumption.
Using a Hashtable risks keeping extra garbage around.
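
The single-instance variant could look roughly like this. Again,
Converter is a hypothetical stand-in for BytesToUnicode, and get/done
are assumed names, not part of libgcj:

```java
final class SingleSlotCacheSketch {
    /** Hypothetical stand-in for BytesToUnicode; only the name matters here. */
    static final class Converter {
        final String name;
        Converter(String name) { this.name = name; }
    }

    // At most one converter is ever retained.
    private static Converter cached;

    /** Reuse the cached converter if its encoding matches, else allocate. */
    static synchronized Converter get(String name) {
        Converter c = cached;
        if (c != null && c.name.equals(name)) {
            cached = null;         // hand it out exclusively
            return c;
        }
        return new Converter(name);
    }

    /** Remember this converter, replacing whatever was cached before. */
    static synchronized void done(Converter c) {
        cached = c;
    }
}
```

A request for a different encoding misses the cache and allocates, so
this only pays off while callers stay with one encoding most of the
time.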

An even better solution might be to make a BytesToUnicode usable
concurrently, which means making it stateless. For example:

  long convert(byte[] inbuffer, long instate, int inlength,
               char[] outbuffer, int outstart, int outlength)

Convert bytes in INBUFFER, leaving the result in OUTBUFFER starting at
OUTSTART. The INSTATE is normally just the starting offset in
INBUFFER (in which case the high-order bits are 0), but it can
also include the encoding state returned from a previous call.

We can return in one of two (non-error) ways:
(1) We ran out of space in the output buffer, in which case the
result is an INSTATE for the following call to convert.
(2) We used up all the input bytes. The result is
(1L<<63)|(STATE << 31)|WRITTEN, where WRITTEN is the number of chars
written to OUTBUFFER and STATE is the state for the next convert.
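
The packing described above can be sketched with a few helpers. The
class and method names are hypothetical; the bit layout assumed here is
bit 63 for the "input used up" flag, bits 31..62 for the state, and the
low 31 bits for either the next input offset or the count written:

```java
final class ConvertResultSketch {
    static final long INPUT_CONSUMED = 1L << 63;
    static final long LOW_MASK = 0x7fffffffL;

    /** Pack a flag, a 32-bit state and a 31-bit offset or count into a long. */
    static long pack(boolean inputConsumed, int state, int low) {
        // Mask state to 32 bits so a negative int cannot leak into bit 63.
        long r = ((state & 0xffffffffL) << 31) | (low & LOW_MASK);
        return inputConsumed ? (r | INPUT_CONSUMED) : r;
    }

    /** True for return case (2): all input bytes were used up. */
    static boolean isInputConsumed(long r) {
        return r < 0;
    }

    /** The state held in bits 31..62; the flag bit is dropped by the cast. */
    static int state(long r) {
        return (int) (r >>> 31);
    }

    /** The low 31 bits: next input offset in case (1), chars written in case (2). */
    static int low(long r) {
        return (int) (r & LOW_MASK);
    }
}
```

With a state of 0 and the flag clear, the packed value is just the
plain offset, which matches the "normally just the starting offset"
case.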
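
We can implement the read routine using convert:

  int state = 0;

  public int read (char[] outbuffer, int outpos, int count)
  {
    // Widen state to long before shifting; an int shift would drop bits.
    long instate = ((state & 0xffffffffL) << 31) | this.inpos;
    long result = convert(this.inbuffer, instate, this.inlength,
                          outbuffer, outpos, count);
    state = (int) (result >> 31);
    if (result < 0)
      {
        // Input used up: the low 31 bits hold the number of chars written.
        this.inpos = this.inlength;
        return (int) (result & 0x7fffffff);
      }
    else
      {
        // Output full: the low 31 bits hold the next input offset.
        this.inpos = (int) (result & 0x7fffffff);
        return count;
      }
  }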
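
To make the convention concrete, here is a self-contained sketch of a
stateless converter for ISO 8859-1, where each byte maps directly to
one char. The class name and the decode helper are hypothetical and
this is not libgcj code; it only follows the return convention
described above:

```java
final class StatelessLatin1Sketch {
    /** No instance fields: all progress travels in instate and the result. */
    static long convert(byte[] inbuffer, long instate, int inlength,
                        char[] outbuffer, int outstart, int outlength) {
        int inpos = (int) (instate & 0x7fffffffL);
        int state = (int) (instate >>> 31);  // unused by 8859-1, passed through
        int written = 0;
        while (inpos < inlength && written < outlength) {
            outbuffer[outstart + written++] = (char) (inbuffer[inpos++] & 0xff);
        }
        long packedState = (state & 0xffffffffL) << 31;
        if (inpos == inlength) {
            return (1L << 63) | packedState | written;  // case (2): input used up
        }
        return packedState | inpos;                     // case (1): output full
    }

    /** Decode a whole array through a small output buffer of the given size. */
    static String decode(byte[] bytes, int chunk) {
        if (chunk <= 0) {
            throw new IllegalArgumentException("chunk must be positive");
        }
        StringBuilder sb = new StringBuilder();
        char[] out = new char[chunk];
        long instate = 0;
        while (true) {
            long result = convert(bytes, instate, bytes.length, out, 0, out.length);
            if (result < 0) {
                sb.append(out, 0, (int) (result & 0x7fffffffL));
                return sb.toString();
            }
            sb.append(out, 0, out.length);  // output buffer was filled completely
            instate = result;               // resume where the last call stopped
        }
    }
}
```

Because convert touches no shared fields, any number of threads can
call it at once, and no per-call converter object needs to be
allocated or cached.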
>
> 2001-05-29 Tom Tromey <tromey@redhat.com>
>
> * gnu/gcj/convert/UnicodeToBytes.java (defaultEncodingClass):
> Removed.
> (defaultEncoding, encodingTable): New static fields.
> (getDefaultEncodingClass): Removed.
> (getDefaultEncoder): Use getEncoder.
> (done): New method.
> * gnu/gcj/convert/IOConverter.java (REUSE_VECTOR_SIZE): New
> constant.
> * gnu/gcj/convert/BytesToUnicode.java (defaultDecodingClass):
> Removed.
> (defaultEncoding): New static field.
> (decodingTable): Likewise.
> (getDefaultDecodingClass): Removed.
> (getDefaultDecoder): Use getDecoder.
> (getDecoder): Look up decoder in hash table.
> (done): New method.
>
> Tom
>
> Index: gnu/gcj/convert/BytesToUnicode.java
> ===================================================================
> RCS file: /cvs/gcc/gcc/libjava/gnu/gcj/convert/BytesToUnicode.java,v
> retrieving revision 1.8
> diff -u -r1.8 BytesToUnicode.java
> --- BytesToUnicode.java 2000/09/11 00:35:51 1.8
> +++ BytesToUnicode.java 2001/05/29 23:44:35
> @@ -1,4 +1,4 @@
> -/* Copyright (C) 1999, 2000 Free Software Foundation
> +/* Copyright (C) 1999, 2000, 2001 Free Software Foundation
>
> This file is part of libgcj.
>
> @@ -8,6 +8,8 @@
>
> package gnu.gcj.convert;
>
> +import java.util.Hashtable;
> +
> public abstract class BytesToUnicode extends IOConverter
> {
> /** Buffer to read bytes from.
> @@ -18,27 +20,12 @@
> /** End of valid bytes in buffer. */
> public int inlength;
>
> - static Class defaultDecodingClass;
> + // The name of the default encoding.
> + static String defaultEncoding;
>
> - static synchronized void getDefaultDecodingClass()
> - {
> - // Test (defaultDecodingClass == null) again in case of race condition.
> - if (defaultDecodingClass == null)
> - {
> - String encoding = canonicalize (System.getProperty("file.encoding"));
> - String className = "gnu.gcj.convert.Input_"+encoding;
> - try
> - {
> - defaultDecodingClass = Class.forName(className);
> - }
> - catch (ClassNotFoundException ex)
> - {
> - throw new NoClassDefFoundError("missing default encoding "
> - + encoding + " (class "
> - + className + " not found)");
> - }
> - }
> - }
> + // This maps a decoding name to an array of decoders of that type.
> + // This lets us reuse decoders.
> + static Hashtable decodingTable = new Hashtable ();
>
> public abstract String getName();
>
> @@ -46,20 +33,33 @@
> {
> try
> {
> - if (defaultDecodingClass == null)
> - getDefaultDecodingClass();
> - return (BytesToUnicode) defaultDecodingClass.newInstance();
> + synchronized (BytesToUnicode.class)
> + {
> + if (defaultEncoding == null)
> + {
> + String encoding
> + = canonicalize (System.getProperty("file.encoding",
> + "8859_1"));
> + String className = "gnu.gcj.convert.Input_" + encoding;
> + try
> + {
> + Class defaultDecodingClass = Class.forName(className);
> + defaultEncoding = encoding;
> + }
> + catch (ClassNotFoundException ex)
> + {
> + throw new NoClassDefFoundError("missing default encoding "
> + + encoding + " (class "
> + + className
> + + " not found)");
> + }
> + }
> + }
> + return getDecoder (defaultEncoding);
> }
> catch (Throwable ex)
> {
> - try
> - {
> - return new Input_iconv (System.getProperty ("file.encoding"));
> - }
> - catch (Throwable ex2)
> - {
> - return new Input_8859_1();
> - }
> + return new Input_8859_1();
> }
> }
>
> @@ -67,6 +67,23 @@
> public static BytesToUnicode getDecoder (String encoding)
> throws java.io.UnsupportedEncodingException
> {
> + BytesToUnicode[] vec = (BytesToUnicode[]) decodingTable.get (encoding);
> + if (vec != null)
> + {
> + synchronized (BytesToUnicode.class)
> + {
> + for (int i = 0; i < vec.length; ++i)
> + {
> + if (vec[i] != null)
> + {
> + BytesToUnicode r = vec[i];
> + vec[i] = null;
> + return r;
> + }
> + }
> + }
> + }
> +
> String className = "gnu.gcj.convert.Input_" + canonicalize (encoding);
> Class decodingClass;
> try
> @@ -120,4 +137,27 @@
> * of the length parameter for a read request).
> */
> public abstract int read (char[] outbuffer, int outpos, int count);
> +
> + /** Indicate that the converter is reusable.
> + * This class keeps track of converters on a per-encoding basis.
> + * When done with a decoder you may call this method to indicate
> + * that it can be reused later.
> + */
> + public final void done ()
> + {
> + synchronized (BytesToUnicode.class)
> + {
> + String name = getName ();
> + BytesToUnicode[] vec = (BytesToUnicode[]) decodingTable.get (name);
> + if (vec == null)
> + {
> + vec = new BytesToUnicode[REUSE_VECTOR_SIZE];
> + decodingTable.put (name, vec);
> + }
> +
> + for (int i = 0; i < vec.length; ++i)
> + if (vec[i] == null)
> + vec[i] = this;
> + }
> + }
> }
> Index: gnu/gcj/convert/IOConverter.java
> ===================================================================
> RCS file: /cvs/gcc/gcc/libjava/gnu/gcj/convert/IOConverter.java,v
> retrieving revision 1.2
> diff -u -r1.2 IOConverter.java
> --- IOConverter.java 2000/11/01 17:00:01 1.2
> +++ IOConverter.java 2001/05/29 23:44:35
> @@ -1,4 +1,4 @@
> -/* Copyright (C) 2000 Free Software Foundation
> +/* Copyright (C) 2000, 2001 Free Software Foundation
>
> This file is part of libgcj.
>
> @@ -21,6 +21,11 @@
> // True if we have to do byte-order conversions on iconv()
> // arguments.
> static protected boolean iconv_byte_swap;
> +
> + // We keep a hash table for all the re-usable I/O converters that
> + // are created. We keep a vector of such converters for each
> + // encoding. This is the size of that vector.
> + static final int REUSE_VECTOR_SIZE = 10;
>
> static
> {
> Index: gnu/gcj/convert/UnicodeToBytes.java
> ===================================================================
> RCS file: /cvs/gcc/gcc/libjava/gnu/gcj/convert/UnicodeToBytes.java,v
> retrieving revision 1.7
> diff -u -r1.7 UnicodeToBytes.java
> --- UnicodeToBytes.java 2000/09/11 00:35:51 1.7
> +++ UnicodeToBytes.java 2001/05/29 23:44:35
> @@ -1,4 +1,4 @@
> -/* Copyright (C) 1999, 2000 Free Software Foundation
> +/* Copyright (C) 1999, 2000, 2001 Free Software Foundation
>
> This file is part of libgcj.
>
> @@ -7,7 +7,9 @@
> details. */
>
> package gnu.gcj.convert;
> -
> +
> +import java.util.Hashtable;
> +
> public abstract class UnicodeToBytes extends IOConverter
> {
> /** Buffer to emit bytes to.
> @@ -15,28 +17,12 @@
> public byte[] buf;
> public int count;
>
> - static Class defaultEncodingClass;
> + // The name of the default encoding.
> + static String defaultEncoding;
>
> - static synchronized void getDefaultEncodingClass()
> - {
> - // Test (defaultEncodingClass == null) again in case of race condition.
> - if (defaultEncodingClass == null)
> - {
> - String encoding = canonicalize (System.getProperty("file.encoding"));
> - String className = "gnu.gcj.convert.Output_"+encoding;
> - try
> - {
> - defaultEncodingClass = Class.forName(className);
> - }
> - catch (ClassNotFoundException ex)
> - {
> - throw new NoClassDefFoundError("missing default encoding "
> - + encoding + " (class "
> - + className + " not found)");
> -
> - }
> - }
> - }
> + // This maps an encoding name to an array of encoders of that type.
> + // This lets us reuse encoders.
> + static Hashtable encodingTable = new Hashtable ();
>
> public abstract String getName();
>
> @@ -44,20 +30,34 @@
> {
> try
> {
> - if (defaultEncodingClass == null)
> - getDefaultEncodingClass();
> - return (UnicodeToBytes) defaultEncodingClass.newInstance();
> + synchronized (UnicodeToBytes.class)
> + {
> + if (defaultEncoding == null)
> + {
> + String encoding
> + = canonicalize (System.getProperty("file.encoding",
> + "8859_1"));
> + String className = "gnu.gcj.convert.Output_" + encoding;
> + try
> + {
> + Class defaultEncodingClass = Class.forName(className);
> + defaultEncoding = encoding;
> + }
> + catch (ClassNotFoundException ex)
> + {
> + throw new NoClassDefFoundError("missing default encoding "
> + + encoding + " (class "
> + + className
> + + " not found)");
> + }
> + }
> + }
> +
> + return getEncoder (defaultEncoding);
> }
> catch (Throwable ex)
> {
> - try
> - {
> - return new Output_iconv (System.getProperty ("file.encoding"));
> - }
> - catch (Throwable ex2)
> - {
> - return new Output_8859_1();
> - }
> + return new Output_8859_1();
> }
> }
>
> @@ -65,6 +65,23 @@
> public static UnicodeToBytes getEncoder (String encoding)
> throws java.io.UnsupportedEncodingException
> {
> + UnicodeToBytes[] vec = (UnicodeToBytes[]) encodingTable.get (encoding);
> + if (vec != null)
> + {
> + synchronized (UnicodeToBytes.class)
> + {
> + for (int i = 0; i < vec.length; ++i)
> + {
> + if (vec[i] != null)
> + {
> + UnicodeToBytes r = vec[i];
> + vec[i] = null;
> + return r;
> + }
> + }
> + }
> + }
> +
> String className = "gnu.gcj.convert.Output_" + canonicalize (encoding);
> Class encodingClass;
> try
> @@ -121,5 +138,28 @@
> int srcEnd = inpos + (inlength > work.length ? work.length : inlength);
> str.getChars(inpos, srcEnd, work, 0);
> return write(work, inpos, inlength);
> + }
> +
> + /** Indicate that the converter is reusable.
> + * This class keeps track of converters on a per-encoding basis.
> + * When done with an encoder you may call this method to indicate
> + * that it can be reused later.
> + */
> + public final void done ()
> + {
> + synchronized (UnicodeToBytes.class)
> + {
> + String name = getName ();
> + UnicodeToBytes[] vec = (UnicodeToBytes[]) encodingTable.get (name);
> + if (vec == null)
> + {
> + vec = new UnicodeToBytes[REUSE_VECTOR_SIZE];
> + encodingTable.put (name, vec);
> + }
> +
> + for (int i = 0; i < vec.length; ++i)
> + if (vec[i] == null)
> + vec[i] = this;
> + }
> }
> }
> Index: java/lang/natString.cc
> ===================================================================
> RCS file: /cvs/gcc/gcc/libjava/java/lang/natString.cc,v
> retrieving revision 1.23
> diff -u -r1.23 natString.cc
> --- natString.cc 2001/05/24 18:06:03 1.23
> +++ natString.cc 2001/05/29 23:44:36
> @@ -523,6 +523,7 @@
> avail -= done;
> }
> }
> + converter->done ();
> this->data = array;
> this->boffset = (char *) elements (array) - (char *) array;
> this->count = outpos;
> @@ -604,6 +605,7 @@
> todo -= converted;
> }
> }
> + converter->done ();
> if (bufpos == buflen)
> return buffer;
> jbyteArray result = JvNewByteArray(bufpos);
>
--
--Per Bothner
per@bothner.com http://www.bothner.com/per/