This is the mail archive of the java@gcc.gnu.org mailing list for the Java project.
Debugging "Leaks" With Boehm-GC
- From: "Craig A. Vanderborgh" <craigv at voxware dot com>
- To: java at gcc dot gnu dot org
- Date: Sun, 20 Nov 2005 14:47:24 -0500
- Subject: Debugging "Leaks" With Boehm-GC
Hello Everyone:
We are currently experiencing some dire difficulties with one of GCJ's
most opaque aspects - garbage collection.
The application exhibiting "leaks" is an XML browser of our own
creation. The use case that "leaks" is one where our browser software
works its way through interpreting some page elements, and then throws
an "event", which is implemented by throwing a Java exception that
something further up the stack then catches. The software works
beautifully, except it "leaks".
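For readers unfamiliar with the pattern, here is a minimal sketch of an "event" implemented as a thrown exception. All names (PageEvent, EventDemo, the "throw:" marker) are hypothetical and chosen only for illustration; this is not the actual browser code.

```java
// Hypothetical names throughout; an illustration of the pattern only.
class PageEvent extends RuntimeException {
    final String name;

    PageEvent(String name) {
        super(name);
        this.name = name;
    }
}

class EventDemo {
    // Walks the page elements until one of them raises an event.
    static void interpretElements(String[] elements) {
        for (int i = 0; i < elements.length; i++) {
            if (elements[i].startsWith("throw:")) {
                throw new PageEvent(elements[i].substring("throw:".length()));
            }
        }
    }

    // A frame further up the stack catches the event and handles it.
    static String runPage(String[] elements) {
        try {
            interpretElements(elements);
            return "completed";
        } catch (PageEvent e) {
            return "caught " + e.name;
        }
    }

    public static void main(String[] args) {
        // prints "caught noinput"
        System.out.println(runPage(new String[] {"prompt", "throw:noinput", "field"}));
    }
}
```

Note that an exception object can carry references to whatever state was live when it was created, so anything that holds on to a caught event also keeps that state reachable.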
I am using quotes around "leaks" because I do not think this problem is
a garbage collection fault. I believe it's a problem caused by our
application. The test case that shows the "leaking" is a VXML browser
test which repeatedly preprocesses then interprets the same cached
page. What we can see with GC_STDOUT/GC_PRINT_STATS enabled is that
Boehm GC is unable to garbage-collect classes from our own browser
implementation that it SHOULD BE ABLE to collect. The inescapable
conclusion is that our Java XML browser implementation, although
straightforward, well-written, clean, etc. is able somehow to "fake out"
Boehm GC, making it think that class instances are still in use although
they actually are not. We have inferred through GC statistics that
these browser classes are accumulating and are not deallocated, because
the GC heap grows without bound, as does the time that GC takes to
"mark". We believe these boundless increases of space/time for GC mean
that it's encountering CRAPLOADS of our browser class instances that it
tries to mark for deletion but determines it cannot, traversing these
"baby bird" objects over and over again, as they continue to multiply.
This problem is giving me fits because both sides of the equation are so
opaque. On one side, you have our reasonably complicated VXML browser
implementation, whose allocation patterns are rather hard to
understand. Then, on the other side, you have Boehm GC, which is
truly opaque, a sort of "glorified malloc without a free". I am caught
in the middle, seemingly without anything to look at.
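One way to get something to look at from the Java side is to sample heap usage after each iteration of the test. The sketch below is a rough illustration that uses only the standard Runtime calls; the class name HeapGrowthProbe and the simulated leaky task are assumptions, and System.gc() is merely a request, so the numbers are indicative rather than exact.

```java
// Hypothetical probe; the leaky task in main() only simulates a leak.
class HeapGrowthProbe {
    // Bytes currently in use, as reported by the runtime.
    static long usedBytes() {
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }

    // Runs the task repeatedly and samples heap usage after each run.
    static long[] sample(Runnable task, int iterations) {
        long[] samples = new long[iterations];
        for (int i = 0; i < iterations; i++) {
            task.run();
            System.gc(); // a request only; the collector may ignore it
            samples[i] = usedBytes();
        }
        return samples;
    }

    public static void main(String[] args) {
        final Object[] retained = new Object[20];
        final int[] next = {0};
        Runnable leakyTask = new Runnable() {
            public void run() {
                // Stays reachable through 'retained', so it cannot be collected.
                retained[next[0]++] = new byte[256 * 1024];
            }
        };
        long[] samples = sample(leakyTask, retained.length);
        for (int i = 0; i < samples.length; i++) {
            System.out.println("iteration " + i + ": " + samples[i] + " bytes in use");
        }
    }
}
```

Replacing the simulated task with one preprocess-and-interpret cycle of the cached page would show whether usage climbs steadily per iteration or levels off.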
I am open to any suggestions from the GCJ Elite, but it seems like what
I need to do next is to come up with at least an INKLING of what is
actually being leaked. The easiest way to do that, it would seem, would
be to have GCJ tell me what objects it can't deallocate, after initial
startup stuff is done.
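Independent of any collector-specific tooling, a portable approximation is to track suspect objects through weak references and check, after a requested collection, which ones are still reachable. The sketch below assumes a hypothetical SurvivorProbe helper written in pre-generics Java (a modern compiler will print an unchecked-operations note). With a conservative collector an object may also survive because of a stray pointer-like value, so a survivor is a hint, not proof of an application-level reference.

```java
import java.lang.ref.WeakReference;

// Hypothetical helper; not part of GCJ or Boehm GC.
class SurvivorProbe {
    private final WeakReference[] refs;
    private final String[] labels;
    private int count = 0;

    SurvivorProbe(int capacity) {
        refs = new WeakReference[capacity];
        labels = new String[capacity];
    }

    // Remembers a suspect object without keeping it alive.
    void track(String label, Object suspect) {
        refs[count] = new WeakReference(suspect);
        labels[count] = label;
        count++;
    }

    // Requests a collection, then reports which suspects are still reachable.
    int countSurvivors() {
        System.gc(); // a request only; clearing of weak references is not guaranteed
        int survivors = 0;
        for (int i = 0; i < count; i++) {
            if (refs[i].get() != null) {
                System.out.println("still reachable: " + labels[i]);
                survivors++;
            }
        }
        return survivors;
    }

    public static void main(String[] args) {
        SurvivorProbe probe = new SurvivorProbe(2);
        Object held = new Object();
        probe.track("held", held);                // strongly referenced below
        probe.track("dropped", new Object());     // no strong reference remains
        int survivors = probe.countSurvivors();
        System.out.println(survivors + " suspect(s) survived; held=" + (held != null));
    }
}
```

Tracking one instance of each suspect class per iteration, and calling countSurvivors() after the iteration ends, narrows down which classes are being retained.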
Is there some way to do this? I need to know everything there is to
know about "GCJ leak detection", and I need to know it yesterday.
Someone - anyone - please give me some clues about what to learn and how
to approach this. This problem is truly killing us, and I am going to
have to move aggressively to fix it. Documentation, Boehm GC debugging
tips, slaps in the face, anything at all would be appreciated.
I should mention that we've tried GCJ 3.3, 3.3.2, 3.3.6, and 4.0.2 on
arm-wince-pe, ARM-Linux and X86 Linux platforms, and the GC behavior
running this failing testcase is IDENTICAL across all platforms/versions.
In closing, let me reiterate: I do not think that this is a "GC
problem". I think the only way we'll ever fix this is by making our
application run in such a way that it does not create objects that Boehm
thinks are ineligible for collection. What I need for the moment is
just some ideas on how to go about debugging this kind of problem.
Thanks a million - in advance.
Best Regards,
craig vanderborgh
voxware incorporated