GC leaks debugging

Erik Groeneveld erik@cq2.nl
Tue Apr 12 18:43:00 GMT 2011


Hans, Andrew,

Having concluded for myself that fragmentation causes the heap to grow
indefinitely, I tried to find a workaround.  Because changing various
environment, build, and runtime variables didn't help, I started
looking at the code itself.

I found that all memory allocation calls from GCJ eventually come down
to GC_allochblk(), so I started gathering some statistics about it.
It turned out that it wasn't called that often at all, so I just added
a forced collection to see whether my assumptions were right, risking a
much slower runtime of course.  I tried:

@@ -50,6 +52,13 @@
     /* Do our share of marking work */
         if(GC_incremental && !GC_dont_gc)
            GC_collect_a_little_inner((int)n_blocks);
+
+    if (n_blocks >= 8) { // 32 kB and bigger often occur in fragmented heaps
+           GC_gcollect_inner();
+           printf(">>> forced collect <<<\n");
+    }
+
     h = GC_allochblk(lw, k, flags);
 #   ifdef USE_MUNMAP
        if (0 == h) {
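
For context, the "statistics" I mention above were gathered with
nothing more than a counter and a printf at the same call site,
roughly along these lines (illustrative only; the counter name is
made up):

    static unsigned long allochblk_requests;  /* call counter */

        allochblk_requests++;
        if (n_blocks >= 8)  /* only log the large (>= 32 kB) requests */
            printf("GC_allochblk request #%lu: %lu blocks\n",
                   allochblk_requests, (unsigned long) n_blocks);
        h = GC_allochblk(lw, k, flags);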

I ran my test and ignored its slowness (only noticing that it was not
that much slower).  But it works:

Before: 29,000,000 docs, 820 MB heap, OOM.
After: 67,000,000 docs, 490 MB heap.  Disk full ;-(

So frequent collection can certainly avoid fragmentation in this case.
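
For anyone who wants to experiment without patching the collector:
something similar can be tried from native code through the public
Boehm GC API, along these lines (an untested sketch; the wrapper
function and the threshold are my own, not part of any API):

    #include <stddef.h>
    #include <gc/gc.h>   /* or <gc.h>, depending on the installation */

    /* Untested sketch: force a full collection before any large
     * request, so that freed blocks can be reused instead of growing
     * the heap. */
    static void *alloc_with_collect(size_t size)
    {
        if (size >= 32 * 1024)   /* same 32 kB threshold as in the hack above */
            GC_gcollect();       /* full, stop-the-world collection */
        return GC_MALLOC(size);
    }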

Now, most curious of all: it is even faster than before:

Before: 1306 docs/second
After: 1582 docs/second

Apparently, it is better to collect a small heap more often than a
large heap less often.
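
For completeness: the collector also exports GC_free_space_divisor,
which trades heap size against collection frequency, so raising it
should push things in the same direction.  A sketch (keeping in mind
that, as I said above, tuning variables alone did not help in my
case):

    #include <gc/gc.h>   /* or <gc.h>, depending on the installation */

    int main(void)
    {
        /* A larger free-space divisor makes the collector run more
         * often instead of expanding the heap, i.e. the small-heap,
         * frequent-collections trade-off.  The default is a small
         * value. */
        GC_free_space_divisor = 8;
        GC_INIT();

        /* ... allocate with GC_MALLOC() as usual ... */
        return 0;
    }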

Now this hack helped me to verify my assumptions, but it also works
well enough that I am going to try it to relieve some of the stress
that has been plaguing some production systems for quite some time
now.

Meanwhile, I'd like to pursue a better solution - less of a hack.  Any
interest in helping out?

Erik


