This is the mail archive of the java@gcc.gnu.org mailing list for the Java project.



Re: Question on GCJ/Boehm Memory Utilization, Part II


Hans Boehm wrote:

As Martin pointed out, adjusting GC_free_space_divisor should help.

That aside, I still don't understand this 100%. The large block allocator
does keep separate free lists for separate block sizes. (This changed a
while ago, but it has now done so for quite a long time.) Thus it should
not break up the 16K blocks used to hold 12932 byte objects until it has
used up the smaller ones. If this is not happening, there's a bug here.
(GC_dump() prints the large block free lists for each size, so this should
be fairly easy to check.)


If this is working correctly, garbage collections are simply being
triggered too late.  A GC_free_space_divisor increase should compensate.
(I think that 7.0 will behave better in this respect, once it's
integrated into gcj.)

Hans

On Tue, 21 Feb 2006, Craig A. Vanderborgh wrote:



Hello Hans, and thanks for taking the time to think about this. We finally figured out - thanks to the Avtrex tool - what was going on. Once again, it turns out that the Boehm garbage collector was working perfectly all along, and that we had an "application problem".

Only this was no ordinary application problem, but really a GCJ "feature". Please take a moment and check this out because it's completely amazing how misleading and evil it is. Our Java parser class has this kind of thing in it for accumulating XML tokens:

class Parser {
  static StringBuffer buffer = new StringBuffer();

  String method1() {
    buffer.setLength(0);
    // ... accumulate the token's characters in buffer ...
    return buffer.toString();
  }
  // ... other parser methods reuse the same static buffer ...
}

A nice, efficient little approach that even reuses the StringBuffer so that you don't have to allocate a new StringBuffer in each parser method, right?

Actually, it turns out that reusing the StringBuffer is a terrible thing to do, and the cause of the whole problem! It's a classic case of what you see is NOT what you get. First, consider the ensureCapacity_unsynchronized method in StringBuffer:

private void ensureCapacity_unsynchronized (int minimumCapacity)
{
  if (shared || minimumCapacity > value.length)
    {
      // We don't want to make a larger vector when `shared' is
      // set.  If we do, then setLength becomes very inefficient
      // when repeatedly reusing a StringBuffer in a loop.
      int max = (minimumCapacity > value.length
                 ? value.length*2+2
                 : value.length);
      minimumCapacity = (minimumCapacity < max ? max : minimumCapacity);
      char[] nb = new char[minimumCapacity];
      System.arraycopy(value, 0, nb, 0, count);
      value = nb;
      shared = false;
    }
}

This method is used by StringBuffer to create/expand the String chars when more space is needed OR when the chars become "shared". Read on...

And then toString() of StringBuffer:

public String toString ()
{
  // Note: in libgcj this causes the StringBuffer to be shared.  In
  // Classpath it does not.
  return new String (this);
}

Because of the java.lang.String implementation in GCJ, "new String(this)" does not actually copy the StringBuffer chars, but "shares" them with the String until ensureCapacity is called. What goes wrong is this: we have some large CDATA sections in our XML pages that cause the StringBuffer to grow. Once a StringBuffer is shared, ensureCapacity allocates a new character array for itself, but sizes it to the LARGEST SIZE NEEDED THUS FAR every time, and each String keeps the full-size array it shared, however short its text is. This - in effect - is a factory for uselessly large, application-killing String allocations.
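To make the mechanism concrete, here is a minimal sketch that models it in plain Java. SharingBuffer and toSharedString are hypothetical names invented for this illustration; they are not libgcj classes or methods. The class copies only the capacity logic quoted above, and the 16-char starting capacity and the 12000-char token are assumed values. A "string" is represented by the char[] it would hold on to.

```java
// Hypothetical model of the sharing behaviour described above.
// SharingBuffer is NOT a libgcj class; it only mimics the quoted logic.
class SharingDemo {
    static class SharingBuffer {
        char[] value = new char[16];   // assumed initial capacity
        int count = 0;
        boolean shared = false;

        // Same decision logic as the quoted ensureCapacity_unsynchronized.
        void ensureCapacity(int minimumCapacity) {
            if (shared || minimumCapacity > value.length) {
                int max = (minimumCapacity > value.length
                           ? value.length * 2 + 2
                           : value.length);
                minimumCapacity = (minimumCapacity < max ? max : minimumCapacity);
                char[] nb = new char[minimumCapacity];
                System.arraycopy(value, 0, nb, 0, count);
                value = nb;
                shared = false;
            }
        }

        void append(String s) {
            ensureCapacity(count + s.length());
            s.getChars(0, s.length(), value, count);
            count += s.length();
        }

        // Model only: handles truncation (newLength <= count).
        void setLength(int newLength) {
            ensureCapacity(newLength);
            count = newLength;
        }

        // Models "new String(this)": the string keeps the buffer's own array.
        char[] toSharedString() {
            shared = true;
            return value;
        }
    }

    // Reuses the buffer for one token and reports how many chars the
    // resulting "string" keeps alive.
    static int backingSizeOfToken(SharingBuffer b, String token) {
        b.setLength(0);
        b.append(token);
        return b.toSharedString().length;
    }

    public static void main(String[] args) {
        SharingBuffer b = new SharingBuffer();
        String large = new String(new char[12000]);           // stands in for a big CDATA section
        System.out.println(backingSizeOfToken(b, "abc"));     // prints 16
        System.out.println(backingSizeOfToken(b, large));     // prints 12000
        System.out.println(backingSizeOfToken(b, "abc"));     // prints 12000
    }
}
```

In this model, every token parsed after the large one pins a 12000-char array, even when the token itself is three characters long.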

Boehm GC is doing nothing more or less than what libgcj and the application ask. Large blocks are being allocated because StringBuffer.ensureCapacity asks for them. They stay referenced because the application keeps around DOM trees, which should be okay because the String allocations are supposed to be small.

But they're not. A totally evil and subtle problem. And without the Daney/Avtrex tool, we might never have figured this out. This problem has been haunting us for years. And, now that it's fixed, the difference this makes is immense. All you have to do is to NOT REUSE StringBuffers in the parser code (sketched at the top of this email) and the problem entirely vanishes.
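For anyone who wants the shape of the fix, a minimal sketch follows. FreshBufferParser and readToken are hypothetical names, and the char[] input with start/end offsets is an assumed interface, not our actual parser code. The only point is that the StringBuffer is a local, so whatever array the returned String ends up sharing was sized by that one token alone.

```java
// Hypothetical sketch of the fix: allocate a fresh StringBuffer per token
// instead of reusing a static one. Names and signature are illustrative only.
class FreshBufferParser {
    static String readToken(char[] input, int start, int end) {
        // Local buffer: its capacity never reflects an earlier, larger token.
        StringBuffer buffer = new StringBuffer();
        for (int i = start; i < end; i++) {
            buffer.append(input[i]);
        }
        return buffer.toString();
    }

    public static void main(String[] args) {
        char[] page = "hello world".toCharArray();
        System.out.println(readToken(page, 0, 5));   // prints hello
        System.out.println(readToken(page, 6, 11));  // prints world
    }
}
```

This costs one small StringBuffer allocation per token, which is cheap next to keeping a CDATA-sized array alive for every short string in the DOM tree.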

The Boehm heap on our embedded application is now less than half the size it was before this find. And the reason is obvious. Now, the parsed token strings are usually 88 characters, instead of 12932. This, as one would expect, greatly affects the behavior of the collector in the positive direction. The reason I'm reporting this in so much detail is that I'm hoping I can help others who have this problem save a few years of their time.

Finally, people (like Ranjit) have complained about the "signature" line in my postings that contains several sentences of legal mumbo jumbo. My intrepid IT department affixes these after I send email and I have no control over that. So I'm sorry, but there's nothing I can do about it.

Thanks to everybody - ONE MILLION TIMES. It is truly wonderful to have found and fixed this memory management problem.

Regards,
craig vanderborgh
voxware incorporated

Hello Again!

First of all, thanks for the constructive replies to my previous
"installment" from David and Rutger.

I have done a whole lot more testing and "dump diving" since then, and I
have at least determined what is happening with our GCJ/Boehm
"leaking".  There is no leaking - or at least no leaking that I am
concerned about in what I can see.

I believe the correct word for what is happening is "fragmentation".
Hans also calls it "unexpectedly large heap growth".  Maybe this latter
term is the most appropriate.  What is happening is pretty clear:

1. For processing a large-ish DOM element containing text, the
application requests and receives a (say) 12932-byte block from GC.
2. This large-ish block is used to do the job, and then dereferenced.
3. Garbage collection happens, and the 12932-byte block is put back on
the free list.
4. The application requests a small block for a short string, and the
collector decides to allocate the unused 12932 byte block, even though
it's way too big.
5. Goto (1)

I have done enough logging and Daney-dumping to know that this is how the
heap becomes large.  It does asymptotically stabilize at a large value,
but the value is too large for us to live with.

And so - what to do?  Is it possible to instruct Boehm not to allocate
block sizes above a certain threshold for short string char[]'s?  Should
the large-ish blocks be unmapped, so that this can't happen?  Should I
write some special-purpose classes that make use of CNI to use malloc()
allocations so that these large-ish (12932) blocks are not requested
from Boehm GC?

I also want to point out that I do have a "control" test now.  On a test
that does not have JS tags <script> with large texts in them, but is
exactly the same otherwise, I get a steady-state heapsize of 1.5 MB
instead of 13 MB, with all other things being equal.  This shows that if
my application does not ever request the 12932 blocks, things really
work quite well with GCJ/Boehm allocation.

Please, if possible, help me reach a clue about how to proceed in the
most constructive direction.  This would be greatly appreciated.

Thanks in advance,
craig vanderborgh
voxware incorporated





Confidentiality Note: This message may contain information which is privileged or confidential and exempt from disclosure under applicable law. If the reader of this message is not the intended recipient, or the employee responsible for delivering the message to the intended recipient, you are hereby NOTIFIED that any dissemination, distribution, retention, archiving, or copying of this communication is strictly prohibited. If you received this email in error, please notify Voxware immediately by return email to the sender.


