This is the mail archive of the java-discuss@sourceware.cygnus.com mailing list for the Java project.


Re: inlining and string concatenation

To: tromey at cygnus dot com
Subject: Re: inlining and string concatenation
From: Bryce McKinlay <bryce at albatross dot co dot nz>
Date: Sat, 29 Apr 2000 14:44:11 +1200
CC: Java Discuss List <java-discuss at sourceware dot cygnus dot com>
References: <87u2glsmnk.fsf@cygnus.com>

Tom Tromey wrote:

> Today I spent some time looking at how g++ does tree-level inlining.
> It turns out to be pretty simple, I think.  It wouldn't take too much
> to do this for the .java part of gcj (the .class part would be harder
> because it does not represent entire functions as trees yet).
>
> It occurred to me that we could use this tree-based inlining, in
> conjunction with a small change to the front end, to make string
> concatenation much faster without having to introduce a new
> unsynchronized StringBuffer-like class into the runtime.

That does sound cool, but I think using a StringBuffer-like class would
still be desirable, because the implementation can be made more efficient
in other ways, apart from just the synchronization issue.

For example, given:

String s = "foo" + somestring + "bar";

Here the compiler creates a StringBuffer object with "foo" as the
constructor argument, then calls append() on it twice for the two other
strings. Once done, the compiler calls toString() (once) on the
StringBuffer and then lets it become collectable.
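
A minimal sketch of the expansion described above, written out by hand.
The class and method names (ConcatExpansion, concat) are made up for
illustration, and the exact code a given compiler emits may differ:

```java
class ConcatExpansion {
    // Hand-written equivalent of: String s = "foo" + somestring + "bar";
    static String concat(String somestring) {
        StringBuffer sb = new StringBuffer("foo"); // first operand goes to the constructor
        sb.append(somestring);                     // one append() per remaining operand
        sb.append("bar");
        return sb.toString();                      // single toString(); sb is then garbage
    }

    public static void main(String[] args) {
        System.out.println(concat("-x-"));
    }
}
```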

There is significant overhead associated with the StringBuffer having to
check its capacity on each append() and, when it runs out of room,
allocate a new larger char array and copy over its data. It's wasteful to
allocate more memory than needed, and to allocate more often than needed.
Furthermore, when toString() is called, an optimization is made to reuse
the StringBuffer's array as the data pointer in the new String. This
avoids the overhead of another memcpy(), but means that the String can
contain wasted bytes because the data array is likely to be longer than
the String's length.
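
The slack in the backing array can be observed through the public
capacity() and length() methods. This is only a rough illustration: the
names CapacityDemo and slack are hypothetical, the growth policy is
implementation-specific, and whether toString() shares the array or
copies it depends on the runtime in use:

```java
class CapacityDemo {
    // Number of unused char slots in the buffer's backing array.
    static int slack(StringBuffer sb) {
        return sb.capacity() - sb.length();
    }

    public static void main(String[] args) {
        // Documented initial capacity: 16 plus the length of the argument.
        StringBuffer sb = new StringBuffer("foo");
        System.out.println("initial capacity: " + sb.capacity());

        // This append exceeds the initial capacity, so the buffer has to
        // allocate a larger array and copy the existing chars into it.
        sb.append("a string longer than sixteen characters");
        sb.append("bar");

        System.out.println("length " + sb.length()
                + ", capacity " + sb.capacity()
                + ", unused " + slack(sb));
    }
}
```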

I think this could be made a lot more efficient. We always know at
compile time how many strings will need to be concatenated (although we
don't usually know the length of all those strings). The compiler never
calls insert() or reverse() on a stringbuffer. We never need to append()
chars or other stringbuffers, only Strings.

Given these assumptions (which may not all be correct, I'm just
guessing), why not make a native StringBuffer equivalent that the
compiler calls with an argument of the total number of strings being
concatenated? This "FastStringBuffer" would then allocate an array of
pointers of the precise length required. When the compiler calls
toString(), it would add up the length fields of all the strings
appended, allocate a new string of the precise size required, and
memcpy() all the data. It only ever has to do 2 (3?) allocations, only
the exact number of memcpy()'s required, and doesn't waste any memory.
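
A rough sketch of the idea, in plain Java rather than native code. Only
the name FastStringBuffer comes from the proposal above; the field names,
the constructor parameter, and the handling of null operands are
assumptions made for illustration:

```java
final class FastStringBuffer {
    private final String[] parts; // one slot per operand, sized exactly
    private int count;

    // expectedStrings: the operand count, which the compiler knows statically.
    FastStringBuffer(int expectedStrings) {
        parts = new String[expectedStrings];
    }

    // Only records the reference; no chars are copied here.
    FastStringBuffer append(String s) {
        parts[count++] = (s == null) ? "null" : s; // mirror "+" on a null operand
        return this;
    }

    @Override
    public String toString() {
        // First pass: add up the lengths to size the result exactly.
        int total = 0;
        for (int i = 0; i < count; i++) {
            total += parts[i].length();
        }
        // Second pass: one copy per operand into the exact-size array.
        char[] data = new char[total];
        int pos = 0;
        for (int i = 0; i < count; i++) {
            String s = parts[i];
            s.getChars(0, s.length(), data, pos);
            pos += s.length();
        }
        // In pure Java this constructor copies the array again; a native
        // version could hand the array to the new String directly.
        return new String(data);
    }
}
```

Under these assumptions, the earlier example would be compiled to
something like
new FastStringBuffer(3).append("foo").append(somestring).append("bar").toString().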

regards

  [ bryce ]

Follow-Ups:
- Re: inlining and string concatenation
  - From: Per Bothner

References:
- inlining and string concatenation
  - From: Tom Tromey
