This is the mail archive of the java-patches@gcc.gnu.org mailing list for the Java project.


Re: RFC: caching for I/O converters


Per Bothner <per@bothner.com> writes:

I wrote the patch in question.  It was a quick hack just to do a test.
It was quite surprising how much garbage it avoided generating.
Handling around 2MB of strings generated 20MB of garbage in decoding
tables.

> Tom Tromey <tromey@redhat.com> writes:
> 
> > This patch adds caching for I/O (charset) converters.  A customer
> > reported to us that creating a new converter in some String methods
> > causes a lot of garbage to be created if you do a lot of converting
> > from bytes to characters.
> > 
> > Any comments on the advisability of this?  Or comments on this patch
> > in particular?  I'm not sure what the performance impact might be.  It
> > was important to them, but I don't know if their case is very common.
> 
> This may be overkill.  Does the decodingTable Hashtable really gain you
> anything over a single static vector of REUSE_VECTOR_SIZE?  When you
> need a BytesToUnicode instance, just search through that vector for a
> match.

This is probably a better solution than the Hashtable.  I would just
use an array, in fact.  I was thinking originally that there would be
lots of different types (I didn't understand how these were all used
when I started), but there probably won't be.
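
A minimal sketch of the array approach, not the actual patch. `Decoder` is a hypothetical stand-in for BytesToUnicode, `CACHE_SIZE` stands in for REUSE_VECTOR_SIZE with an assumed value, and the round-robin replacement policy is an assumption as well. A decoder is taken out of the array while in use, so two threads never share its state.

```java
/** Hypothetical stand-in for a stateful decoder such as BytesToUnicode. */
class Decoder {
    final String encoding;
    Decoder(String encoding) { this.encoding = encoding; }
}

/** Small fixed-size cache of idle decoders, searched linearly. */
final class DecoderCache {
    private static final int CACHE_SIZE = 4; // assumed size, not from the patch
    private static final Decoder[] cache = new Decoder[CACHE_SIZE];
    private static int nextSlot = 0; // round-robin victim when the array is full

    /** Take an idle decoder for this encoding out of the array, or create one. */
    static synchronized Decoder acquire(String encoding) {
        for (int i = 0; i < CACHE_SIZE; i++) {
            Decoder d = cache[i];
            if (d != null && d.encoding.equals(encoding)) {
                cache[i] = null; // remove it so nobody else uses it concurrently
                return d;
            }
        }
        return new Decoder(encoding); // cache miss: allocate as before
    }

    /** Hand a decoder back once the conversion is finished. */
    static synchronized void release(Decoder d) {
        for (int i = 0; i < CACHE_SIZE; i++) {
            if (cache[i] == null) {
                cache[i] = d;
                return;
            }
        }
        // Array is full: overwrite one slot and let the old decoder be collected.
        cache[nextSlot] = d;
        nextSlot = (nextSlot + 1) % CACHE_SIZE;
    }
}
```

A caller would pair `DecoderCache.acquire("UTF-8")` with `DecoderCache.release(d)` around each conversion. A real decoder would also need its internal state reset before reuse.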

> 
> Or even go further:  Just keep a single cache instance.  If the
> needed conversion matches the cached instance, use it, otherwise
> allocate a new one.  This assumes you're unlikely to switch very
> frequently between encodings, which I think is a fairly safe assumption.

Still, an array would be pretty easy to implement.  And with a single
cached instance, a couple of threads doing conversions could still
generate a lot of garbage, though I guess that would be unlikely.

> 
> An even better solution might be if we can use a BytesToUnicode
> concurrently, which means without state.  For example:
> 
> long convert(byte[] inbuffer, long instate, int length,
>              char[] outbuffer, int outstart, int outlength)
> 
> Convert bytes in INBUFFER, leaving result in outbuffer starting at
> OUTSTART.  The INSTATE is normally just the starting offset in
> inbuffer (in which case the high-order 32 bits is 0), but it can
> also include the encoding state returned from a previous call.
> We can return in one of two (non-error) ways:
> (1) We ran out of space in the output buffer, in which case the
> result is an INSTATE for the following call to convert.
> (2) We used up all the input bytes.  Result is:
> (1L<<63)|(STATE << 31)|WRITTEN where WRITTEN is the number chars
> written to OUTBUFFER and STATE is the state for the next convert.
> 
> We can implement the read routine using convert:
> 
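> int state = 0;
> public int read (char[] outbuffer, int outpos, int count)
> {
>   // Pack the saved state and the input offset into one INSTATE value.
>   long result = convert(this.inbuffer, ((long) state << 31) | this.inpos,
>                         this.inlength, outbuffer, outpos, count);
>   state = (int) (result >> 31);
>   if (result < 0)
>     {
>       // All input consumed; the low 31 bits hold the number of chars written.
>       this.inpos = this.inlength;
>       return (int) (result & 0x7fffffff);
>     }
>   else
>     {
>       // Output buffer full; the low 31 bits hold the next input offset.
>       this.inpos = (int) (result & 0x7fffffff);
>       return count;
>     }
> }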
> 
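
The packed return value described above can be sketched with a few helpers. This is only an illustration: the class and method names are hypothetical, and it assumes STATE and the offset or count are non-negative and each fit in 31 bits. The shift has to be done in long arithmetic, since an int shifted left by 31 loses all but its lowest bit.

```java
/** Hypothetical helpers for the packed long used by convert(). */
final class ConvertResult {
    private static final long DONE = 1L << 63;     // set when all input was consumed
    private static final long LOW31 = 0x7fffffffL; // mask for the low 31 bits

    /** Case (2): all input used; pack STATE and the number of chars written. */
    static long inputExhausted(int state, int written) {
        return DONE | ((long) state << 31) | written;
    }

    /** Case (1): output buffer full; pack STATE and the next input offset. */
    static long outputFull(int state, int nextInputPos) {
        return ((long) state << 31) | nextInputPos;
    }

    /** The sign bit distinguishes the two return cases. */
    static boolean isInputExhausted(long result) {
        return result < 0;
    }

    /** Extract STATE, ignoring the DONE flag in bit 63. */
    static int state(long result) {
        return (int) ((result & ~DONE) >>> 31);
    }

    /** Extract the low 31 bits: chars written in case (2), input offset in case (1). */
    static int low31(long result) {
        return (int) (result & LOW31);
    }
}
```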

I don't understand how you would tell it which decoder to use.  A
static converter wouldn't work for that, but a method on each decoder
that did a one-shot conversion would probably work.  You would still
have to allocate one of each decoder you used to do this, though.  And
you would have to modify all the decoders to add the method.  I still
think the array would be best.
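
For comparison, a one-shot method on each decoder might look roughly like this. The interface and class names are hypothetical and not part of libgcj; ISO-8859-1 is used only because it needs no decoding table or state. One instance per encoding still has to be allocated, but it can be shared because the method keeps nothing between calls.

```java
/** Hypothetical per-decoder one-shot interface; not an existing libgcj API. */
interface OneShotDecoder {
    /** Decode count bytes starting at in[offset]; keeps no state between calls. */
    char[] decode(byte[] in, int offset, int count);
}

/** ISO-8859-1 maps each byte directly to the code point of the same value. */
final class Latin1Decoder implements OneShotDecoder {
    // One shared instance is enough because decode() touches no fields.
    static final Latin1Decoder INSTANCE = new Latin1Decoder();

    public char[] decode(byte[] in, int offset, int count) {
        char[] out = new char[count];
        for (int i = 0; i < count; i++)
            out[i] = (char) (in[offset + i] & 0xff); // mask to treat the byte as unsigned
        return out;
    }
}
```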

I can do a new patch, if you like.

-Corey

