Bug 29420 - OutOfMemory in image handling
Summary: OutOfMemory in image handling
Status: RESOLVED FIXED
Alias: None
Product: classpath
Classification: Unclassified
Component: awt
Version: 0.92
Importance: P3 major
Target Milestone: ---
Assignee: Francis Kung
URL:
Keywords:
Depends on: 28769
Blocks: 28751
Reported: 2006-10-10 19:38 UTC by Roman Kennke
Modified: 2006-11-13 08:19 UTC
CC List: 4 users

See Also:
Host:
Target:
Build:
Known to work:
Known to fail:
Last reconfirmed: 2006-11-02 21:12:40


Attachments
suppress segfaults, throw java exceptions instead (452 bytes, patch)
2006-11-06 21:08 UTC, Francis Kung

Description Roman Kennke 2006-10-10 19:38:42 UTC
+++ This bug was initially created as a clone of Bug #28769 +++

+++ This bug was initially created as a clone of Bug #28719 +++

According to a comment in Bug #28719, loading a couple of 800x600 images can cause an OutOfMemoryError.
Comment 1 Francis Kung 2006-10-12 20:44:53 UTC
How are these images being loaded?  I can load a 1600x1200 image in niffler, or at least three 1600x1200 images in a test program (using Toolkit.getDefaultToolkit().getImage(), then drawing them on screen), without any problems...

A patch went in recently that cleaned up a bit of image stuff; can you retest?
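
Roughly, that test does nothing more than the following (a standalone sketch; the class name and layout are illustrative, not the actual test program):

import java.awt.Graphics;
import java.awt.Image;
import java.awt.Toolkit;
import javax.swing.JFrame;
import javax.swing.JPanel;

public class ToolkitImageTest extends JPanel
{
  private final Image[] images;

  public ToolkitImageTest(String[] files)
  {
    // load each file via the default Toolkit (asynchronous loading)
    images = new Image[files.length];
    for (int i = 0; i < files.length; i++)
      images[i] = Toolkit.getDefaultToolkit().getImage(files[i]);
  }

  protected void paintComponent(Graphics g)
  {
    super.paintComponent(g);
    // draw the images with a small offset so each is at least partly visible
    for (int i = 0; i < images.length; i++)
      g.drawImage(images[i], i * 50, i * 50, this);
  }

  public static void main(String[] args)
  {
    JFrame f = new JFrame("Toolkit image test");
    f.getContentPane().add(new ToolkitImageTest(args));
    f.setSize(800, 600);
    f.setVisible(true);
  }
}
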
Comment 2 Francis Kung 2006-10-19 21:37:35 UTC
Is this related to bug 29510?  That is also a bug with severe image-processing performance problems since the last release.
Comment 3 Roman Kennke 2006-10-23 14:07:45 UTC
Francis checked in some fixes to the BufferedImage handling that are likely related to this bug. Norman, could you retest if the OOM still occurs?
Comment 4 Norman Hendrich 2006-10-23 14:42:59 UTC
(In reply to comment #3)
> Francis checked in some fixes to the BufferedImage handling that are likely
> related to this bug. Norman, could you retest if the OOM still occurs?

I last tested on 2006.10.20, after Francis checked in the *TYPE_RGB fix.
Image-loading speed was back to normal (highly appreciated!), but running 
a slideshow still runs out of memory rather soon.

Admittedly, the OOME was triggered after loading a few hundred images
(about 1024x768) with jamvm and -Xmx300m. Loading just three images,
as reported and tested by Francis, should indeed work. (In niffler,
select a directory, menu > slideshow > linear (or random), and set
the interval to a small value. System load goes to 100 percent, but
it shouldn't take long to trigger an OOME.)

Unfortunately, once the OOME is triggered, stacktraces are not very helpful,
and I don't know how to analyze the (Java/GTK) heaps for garbage.

Did I miss fixes after 2006.10.20, so that I should re-test?


Comment 5 Francis Kung 2006-10-23 15:13:12 UTC
No, nothing since the 20th.  I'm assuming these images should not be in memory concurrently (slideshow loads one, then moves on to the next, and the original image *should* be freed) - so a memory leak somewhere?
Comment 6 Norman Hendrich 2006-10-23 17:24:41 UTC
(In reply to comment #5)
> No, nothing since the 20th.  I'm assuming these images should not be in memory
> concurrently (slideshow loads one, then moves on to the next, and the original
> image *should* be freed) - so a memory leak somewhere?


Yes. Either a leak or memory corruption, probably in the JNI/GTK parts.
(A while ago, after I last reported similar OOME problems, Mark fixed 
several leaks in the GTK code. But I have no idea whether and where 
current cvs might trigger those anew.)

The image viewer really only keeps one reference to the current image,
and a temporary reference to the image it is next loading. I also had
the "thumbnails panel" disabled for these tests; but the slideshow
wait interval was so low that the JVM was continually loading new images.
(When thumbnails generation and display is on, niffler also keeps a cache
of the thumbnail images. But the thumbnails are really small, and the
cache is cleared when memory runs low.)


With cp cvs 2006.10.20, slideshows usually survive at least 200 images 
(jamvm with -Xmx300m), but the VM seems to get very unstable beyond that. 
Most often, the VM just crashes with a segfault, but here is one of the 
OOME stacktraces I got:

java.lang.OutOfMemoryError
   at gnu.java.awt.peer.gtk.GtkImage.getPixels(Native Method)
   at gnu.java.awt.peer.gtk.CairoSurface.<init>(CairoSurface.java:181)
   at gnu.java.awt.peer.gtk.CairoSurface.getBufferedImage(CairoSurface.java:261)
   at gnu.java.awt.peer.gtk.GtkToolkit.createImage(GtkToolkit.java:255)
   at java.awt.Toolkit.createImage(Toolkit.java:661)
   at niffler.ImageUtils.loadImageViaToolkit(ImageUtils.java:253)
   at niffler.ImageUtils.loadImageViaToolkit(ImageUtils.java:211)
   at niffler.Niffler.loadImage(Niffler.java:1622)
   ...
   at niffler.SlideshowManager.run(SlideshowManager.java:82)

Comment 7 Francis Kung 2006-10-24 20:41:51 UTC
I can confirm a leak, though I'm finding it more difficult to re-create.

Using jamvm -Xmx80m, niffler-exif v1.0.1, and ~1 MB JPEGs, average memory was increasing at about 1 MB per 75 images loaded (from watching top), but it still hadn't crashed after a few hundred pictures... maybe the pictures I'm using are smaller (I don't have any larger pictures available to test with right now).

Using -Xmx60m causes a segfault early in the program (at native/jni/gtk-peer/gnu_java_awt_peer_gtk_CairoSurface.c:140).
Comment 8 Francis Kung 2006-11-02 21:12:40 UTC
There are at least two problems here, one native memory leak and one Java leak.

I've found the Java leak (in JTree), and that should be fixed soon...

There is, however, a second memory leak that comes up when you mouse-over the image, which I haven't found yet.  I suspect it's in the native code somewhere.
Comment 9 Norman Hendrich 2006-11-03 15:01:11 UTC
Subject: Re:  OutOfMemory in image handling


> ------- Comment #8 from fkung at redhat dot com  2006-11-02 21:12 -------
> There are at least two problems here, one native memory leak and one  
>  Java leak.
>
> I've found the Java leak (in JTree), and that should be fixed soon...

Good catch!  I was so fixated on image loading that the idea of a leak in
JTree didn't even occur to me...


> There is, however, a second memory leak that comes up when you mouse-over the
> image, which I haven't found yet.  I suspect it's in the native code  
>  somewhere.

Hmm. When testing the slide-show, I didn't move the mouse at all (or very
little). Why do you suspect that moving the mouse leaks memory? How should
I test this?

- Norman

Comment 10 Francis Kung 2006-11-03 20:16:59 UTC
I've been testing with two methods... using Runtime.getRuntime().freeMemory() and .totalMemory() to watch the java heap, and "ps v (pid)" to watch total resident memory.

With 'ps v', resident memory increases with mouse movement (and doesn't go back down), so it looks like a native leak somewhere... 
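
In case it's useful, a minimal sketch of the Java-heap side of that (illustrative only, not Niffler code).  Resident memory still has to be watched externally with "ps v <pid>" or top, since native allocations never show up in these numbers:

// A background thread like this, started from inside the application under
// test, prints the Java heap usage once a second.
public class HeapWatcher implements Runnable
{
  public static void start()
  {
    Thread t = new Thread(new HeapWatcher(), "heap-watcher");
    t.setDaemon(true);
    t.start();
  }

  public void run()
  {
    Runtime rt = Runtime.getRuntime();
    while (true)
      {
        long used = rt.totalMemory() - rt.freeMemory();
        System.err.println("heap used: " + (used >> 20) + "M of "
                           + (rt.totalMemory() >> 20) + "M (max "
                           + (rt.maxMemory() >> 20) + "M)");
        try { Thread.sleep(1000); } catch (InterruptedException e) { return; }
      }
  }
}
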
Comment 11 Francis Kung 2006-11-06 12:51:27 UTC
The mouse-over leak (if it's a leak at all) happens when testing on Sun too, so it doesn't look like a Classpath bug.  Since it deals with native memory, it's not likely a Niffler bug; perhaps it's an issue in GDK.

I'll try to get a fix in for the JTree problem this morning, then can you retest?  The native leak shouldn't cause an OutOfMemoryError, so that wasn't the original problem anyways.
Comment 12 Norman Hendrich 2006-11-06 13:26:41 UTC
(In reply to comment #11)
> The mouse-over leak (if it's a leak at all) happens when testing on Sun too, 
> [...]

Hmmm. So far, I haven't been able to reproduce this on either classpath or Sun... :-)


> I'll try to get a fix in for the JTree problem this morning, then can you
> retest?  

Great.

Comment 13 Francis Kung 2006-11-06 16:49:12 UTC
>> The mouse-over leak (if it's a leak at all) happens when testing on Sun too, 
>> [...]
>
>Hmmm. So far, I haven't been able to reproduce this on either classpath or Sun... :-)

I'll assume it's not an issue then ;-)  And definitely not classpath-related.

>> I'll try to get a fix in for the JTree problem this morning, then can you
>> retest?  
>
>Great.

OK, it's in... not the ideal fix, but a temporary fix at least so that we aren't shipping with a memory leak (in case we can't do a proper fix by the next release).
Comment 14 cvs-commit@developer.classpath.org 2006-11-06 16:51:00 UTC
Subject: Bug 29420

CVSROOT:	/cvsroot/classpath
Module name:	classpath
Changes by:	Francis Kung <fkung>	06/11/06 16:48:08

Modified files:
	javax/swing    : JTree.java 
	.              : ChangeLog 

Log message:
	2006-11-06  Roman Kennke  <kennke@aicas.com>
	
		* javax/swing/JTree.java
		(TreeModelHandler.treeNodesRemoved): Implemented.
		(TreeModelHandler.treeStructureChanged): Implemented.
		(nodeStates): Made package private.
	
	2006-11-06  Francis Kung  <fkung@redhat.com>
	
		PR 29420
		* javax/swing/JTree.java
		(clearSelectionPathStates): New private method to clean up nodeStates.
		(removeSelectionPath): Call clearSelectionPathStates().
		(removeSelectionPaths): Call clearSelectionPathStates().
		(removeSelectionRow): Call clearSelectionPathStates().
		(setSelectionPath): Call clearSelectionPathStates().
		(setSelectionPaths): Call clearSelectionPathStates().
		(setSelectionRow): Call clearSelectionPathStates().

CVSWeb URLs:
http://cvs.savannah.gnu.org/viewcvs/classpath/javax/swing/JTree.java?cvsroot=classpath&r1=1.77&r2=1.78
http://cvs.savannah.gnu.org/viewcvs/classpath/ChangeLog?cvsroot=classpath&r1=1.8760&r2=1.8761
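
The idea behind the new clearSelectionPathStates() method, in rough sketch form (the body below is only a guess at the approach; the actual patch is in the CVSWeb links above and may differ in detail):

  // Inside javax.swing.JTree.  nodeStates caches per-path state; without a
  // cleanup like this, entries for paths that are no longer selected stay in
  // the table forever and keep the corresponding tree nodes reachable.
  private void clearSelectionPathStates()
  {
    TreePath[] oldPaths = getSelectionPaths();
    if (oldPaths == null)
      return;
    for (int i = 0; i < oldPaths.length; i++)
      nodeStates.remove(oldPaths[i]);
  }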



Comment 15 Norman Hendrich 2006-11-06 20:23:42 UTC
(In reply to comment #13)

> OK, it's in... not the ideal fix, but a temporary fix at least so that we
> aren't shipping with a memory leak (in case we can't do a proper fix by the
> next release).

Tested current cvs just now. Running a slide show in the background
(with a delay of 750 msec between images, at a CPU load of <= 50%) on jamvm
-Xmx300m still crashed with SEGV after 397 medium-sized images...
Memory usage as reported by top quickly exceeded 300M RSS, and then
stayed in the 250..350M range for some time. The X server reports
another 150 MB. I only have 512M of real memory, with 1G of swap.

I ran this test with ctwm as my window manager, with Niffler being 
_invisible_ on one of the inactive virtual screens. (My previous testing 
was with actual repaints on the currently visible virtual screen.)

Are you sure that the leak is fixed? If so, it may be yet another 
glitch with ctwm. Will give it another try tomorrow.
Comment 16 Norman Hendrich 2006-11-06 20:55:38 UTC
Here is my attempt at a stacktrace. Running within gdb somehow makes the
application use somewhat less RSS memory, but it still crashed after
475 images:

gdb /opt/jamvm/bin/jamvm
run -Xmx300m niffler.Niffler <dirname>
[starting linear slideshow]
[slideshow interval 750 msec]
[thumbnails generation off, histogram off]
[normal image loading messages]
...
-I- getCairoBackedImage: took 7 msecs. for 488x749

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 442396 (LWP 18483)]
0x536d3173 in Java_gnu_java_awt_peer_gtk_CairoSurface_nativeGetPixels (
    env=0x806201c, obj=0x50620da8, bufferPointer=The value of variable 'bufferPointer' is distributed across several locations, and GDB cannot access its value.)
    at gnu_java_awt_peer_gtk_CairoSurface.c:138
138       memcpy (jpixdata, pixeldata, size * sizeof( jint ));

[ (gdb) thread 29 ]
(gdb) where
#0  0x536d3173 in Java_gnu_java_awt_peer_gtk_CairoSurface_nativeGetPixels (
    env=0x806201c, obj=0x50620da8, bufferPointer=The value of variable 'bufferPointer' is distributed across several locations, and GDB cannot access its value.)
    at gnu_java_awt_peer_gtk_CairoSurface.c:138
#1  0x0805eca2 in callJNIMethod (env=0x806201c, class=0x0, 
    sig=0x853d788 "(JI)[I", ret_type=4, ostack=0x8b34574, 
    f=0x536d30f0 "U\211å\203ì8\213E\020\213U\024\211]ôè\001þÿÿ\201ÃôB\002", 
    args=4) at dll_md.c:85
#2  0x0804f7a1 in callJNIWrapper (class=0x0, mb=0x8541780, ostack=0xfffefcff)
    at dll.c:322
#3  0x08052847 in executeJava () at interp.c:2229
#4  0x08050447 in executeMethodVaList (ob=Variable "ob" is not available.) at execute.c:91
#5  0x080504e9 in executeMethodArgs (ob=0xfffefcff, class=0xfffefcff, 
    mb=0xfffefcff) at execute.c:63
#6  0x0805c441 in threadStart (arg=0x8b17d48) at thread.c:263
#7  0x40067c60 in pthread_start_thread () from /lib/libpthread.so.0
#8  0x40067cdf in pthread_start_thread_event () from /lib/libpthread.so.0
#9  0x40193b77 in clone () from /lib/libc.so.6




(gdb) thread 6
[Switching to thread 6 (Thread 65541 (LWP 18446))]#0  0x4006e576 in nanosleep
    () from /lib/libpthread.so.0
(gdb) where
#0  0x4006e576 in nanosleep () from /lib/libpthread.so.0
#1  0x00000001 in ?? ()
#2  0x4006a2b1 in __pthread_timedsuspend_new () from /lib/libpthread.so.0
#3  0x40066f33 in pthread_cond_timedwait_relative () from /lib/libpthread.so.0
#4  0x0805716b in monitorWait (mon=0x8063440, self=0x8075a20, ms=The value of variable 'ms' is distributed across several locations, and GDB cannot access its value.)
    at lock.c:180
#5  0x0805b965 in threadSleep (thread=0x8075a20, ms=The value of variable 'ms' is distributed across several locations, and GDB cannot access its value.) at thread.c:143
#6  0x0804b97d in asyncGCThreadLoop (self=0x8075a20) at alloc.c:1130
#7  0x0805bd20 in shell (args=0x8075768) at thread.c:470
#8  0x40067c60 in pthread_start_thread () from /lib/libpthread.so.0
#9  0x40067cdf in pthread_start_thread_event () from /lib/libpthread.so.0
#10 0x40193b77 in clone () from /lib/libc.so.6





(gdb) thread 10
[Switching to thread 10 (Thread 131081 (LWP 18451))]#0  0x4018b5e7 in poll ()
   from /lib/libc.so.6
(gdb) where
#0  0x4018b5e7 in poll () from /lib/libc.so.6
#1  0x53c309ac in g_main_context_iterate (context=0x8234678, block=1, 
    dispatch=1, self=0x851b7b0) at gmain.c:2867
#2  0x53c30e87 in IA__g_main_loop_run (loop=0x850a2c8) at gmain.c:2769
#3  0x53830411 in IA__gtk_main () at gtkmain.c:1003
#4  0x536e77bd in Java_gnu_java_awt_peer_gtk_GtkToolkit_gtkMain (
    env=0x806201c, obj=0x4050dc40) at gnu_java_awt_peer_gtk_GtkToolkit.c:338
#5  0x0805ecd5 in callJNIMethod (env=0x806201c, class=0x4050dc40, 
    sig=0x8089ce0 "()V", ret_type=0, ostack=0x850b72c, 
    f=0x536e7770 "U\211åSè\214·þÿ\201Ã\177ü", args=1) at dll_md.c:66
#6  0x0804f7a1 in callJNIWrapper (class=0x4050dc40, mb=0x81f0684, 
    ostack=0xfffffffc) at dll.c:322
#7  0x08052847 in executeJava () at interp.c:2229
#8  0x08050447 in executeMethodVaList (ob=Variable "ob" is not available.) at execute.c:91
#9  0x080504e9 in executeMethodArgs (ob=0xfffffffc, class=0xfffffffc, 
    mb=0xfffffffc) at execute.c:63
#10 0x0805c441 in threadStart (arg=0x850b680) at thread.c:263
#11 0x40067c60 in pthread_start_thread () from /lib/libpthread.so.0
#12 0x40067cdf in pthread_start_thread_event () from /lib/libpthread.so.0
#13 0x40193b77 in clone () from /lib/libc.so.6

Hope that helps...
Comment 17 Francis Kung 2006-11-06 21:08:43 UTC
Created attachment 12554
suppress segfaults, throw java exceptions instead

Yes, the segfaults occur when we try to allocate a Java array from native code... we don't always check to make sure the array was allocated successfully, so it dies somewhat ungracefully.  If you apply the attached patch, it should throw an OutOfMemoryError instead of segfaulting.

Which all tells me that it's a Java leak, not a native leak, since we're running out of heap space.  Memory usage seems pretty steady right now from my debug logs (I can't trigger the OOME any more), but I'll run some more extensive tests on it again.
Comment 18 Norman Hendrich 2006-11-07 13:35:02 UTC
(In reply to comment #17)
> Created an attachment (id=12554)
> suppress segfaults, throw java exceptions instead

OK. I applied your patch, and indeed I now get the following:

java.lang.OutOfMemoryError
   at gnu.java.awt.peer.gtk.GtkImage.getPixels(Native Method)
   at gnu.java.awt.peer.gtk.CairoSurface.<init>(CairoSurface.java:181)
   at gnu.java.awt.peer.gtk.CairoSurface.getBufferedImage(CairoSurface.java:261)
   at gnu.java.awt.peer.gtk.GtkToolkit.createImage(GtkToolkit.java:255)
   at java.awt.Toolkit.createImage(Toolkit.java:661)
   at niffler.ImageUtils.loadImageViaToolkit(ImageUtils.java:253)
   at niffler.ImageUtils.loadImageViaToolkit(ImageUtils.java:211)
   at niffler.Niffler.loadImage(Niffler.java:1623)
   at niffler.Niffler.skipTinyImages(Niffler.java:2106)
   at niffler.Niffler.doShowNextImageButAvoidThumbnails(Niffler.java:2042)
   at niffler.SlideshowManager.run(SlideshowManager.java:82)

or:

java.lang.OutOfMemoryError
   at gnu.java.awt.peer.gtk.CairoSurface.nativeGetPixels(Native Method)
   at gnu.java.awt.peer.gtk.CairoSurface.getPixels(CairoSurface.java:134)
   at gnu.java.awt.peer.gtk.BufferedImageGraphics.updateBufferedImage(BufferedImageGraphics.java:191)
   at gnu.java.awt.peer.gtk.BufferedImageGraphics.drawImage(BufferedImageGraphics.java:317)
   at gnu.java.awt.peer.gtk.CairoGraphics2D.drawImage(CairoGraphics2D.java:1445)
   at niffler.ImageUtils.getBufferedImageRGB(ImageUtils.java:397)
   at niffler.ImageUtils.getBufferedImage(ImageUtils.java:371)
   at niffler.Niffler.loadImage(Niffler.java:1630)
   at niffler.Niffler.skipTinyImages(Niffler.java:2108)
   at niffler.Niffler.doShowNextImageButAvoidThumbnails(Niffler.java:2044)
   at niffler.SlideshowManager.run(SlideshowManager.java:82)

To track this down, I tried torturing the tree (selecting nodes and
repainting) without loading images. This test didn't seem to leak
any memory (top showing jamvm going up to about 60M, and then staying
there).
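
(Roughly, the torture test was a loop along these lines; the standalone version below uses the default JTree sample model and is not the actual Niffler code:)

import java.util.Random;
import javax.swing.JFrame;
import javax.swing.JScrollPane;
import javax.swing.JTree;

public class TreeTorture
{
  public static void main(String[] args) throws Exception
  {
    JTree tree = new JTree();                    // default sample model
    JFrame frame = new JFrame("tree torture");
    frame.getContentPane().add(new JScrollPane(tree));
    frame.setSize(300, 400);
    frame.setVisible(true);

    Random rnd = new Random();
    Runtime rt = Runtime.getRuntime();
    for (int i = 0;; i++)
      {
        // select a random row and repaint; no images are loaded at all
        tree.setSelectionRow(rnd.nextInt(tree.getRowCount()));
        tree.repaint();
        if (i % 1000 == 0)
          System.err.println("used heap: "
                             + ((rt.totalMemory() - rt.freeMemory()) >> 20) + "M");
        Thread.sleep(1);
      }
  }
}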


I then changed Niffler so that it no longer updates the tree during a slideshow:
just comment out updateSelectedTreePath() in updateGUI(). As far as I can see,
this means that the JTree is neither modified nor repainted during a slideshow,
but I still get the above OOME after ~400 medium-sized images (~1024x768)
at -Xmx300m and after ~170 images at -Xmx150m. In other words, about 230
images per 150M of memory, or almost 1M per image. The images themselves
are about 100..200 KB each.

Running the slideshow on 3000x2000 JPGs with -Xmx300m dies with OOME
after ~220 images. 

It would be interesting to simply select "Edit -> Load via ImageIO" in
Niffler, to test for leaks in Toolkit.createImage( byte[] ) vs ImageIO.
Unfortunately this is so incredibly slow as to be impractical
(at least on my system).


Comment 19 Norman Hendrich 2006-11-07 18:23:01 UTC
I just built cacao-0.97 (because 0.96 doesn't work with current cvs somehow),
and ran some new tests. Summary:

1. cacao deadlocks instantly when specifying a directory name on the command
   line, probably a cacao bug: cacao -Xmx200m niffler.Niffler <directoryname>
   hangs. 

   When no directory argument is given, the GUI starts up fine. Afterwards,
   a directory can be selected (minus the crazy scrolling issues in 
   JFileChooser). The first images are loaded fine, but then:

2. cacao seems not to garbage-collect image memory; leaks about 20M+ per 
   image and seems to ignore the -Xmx parameter completely. Running a 
   slideshow crashes when system memory is exhausted.

3. My previous tests were done on my "main" image collection, residing on
   a Windows FAT32 partition (vfat). I copied the images to a native ext3
   partition. Image loading seems slightly faster under jamvm, but a 
   slideshow still crashes (about the same number of images as FAT32).

4. To test for image-format dependencies, I created two new subdirectories
   with 1000 identical copies of a reference image (800x533), once in JPG
   and once in GIF format.

   Running a slideshow in jamvm in either of these directories results in
   a flat memory usage of ~67M RES (as reported by top). No crash. (!)

5. Same with 1000 GIF images of 1200x800: jamvm allocates ~148M RES, but
   seems to stay flat from there.

6. Running a slideshow in a directory with 3000 images of just four 
   different sizes (1200x800, 1300x700, 900x1100, 500x900) but in random
   order allocates up to 300M (at -Xmx200m) and then crashes.

So: is there any reason that the Heap might get fragmented, and the
garbage-collector cannot "repair" this?

Is there any reason that both jamvm and cacao ignore the -Xmx parameter?
Comment 20 Francis Kung 2006-11-07 18:52:24 UTC
Hi Norman,

Interesting... I'm at 600 pictures, 1600x1200, and jamvm isn't even sweating.  Memory use stays within a 5 MB range, and the total heap hasn't gone over 50 MB (for a 500 MB maximum).

On cacao, however, with the same pictures and settings, the heap quickly grows to the maximum and it throws an OutOfMemoryError after 10-15 pictures.

---
And I've just read your last comment, so I'll reply to that as well.

(In reply to comment #19)
> I just built cacao-0.97 (because 0.96 doesn't work with current cvs somehow),
> and ran some new tests. Summary:
> 
> 2. cacao seems not to garbage-collect image memory; leaks about 20M+ per 
>    image and seems to ignore the -Xmx parameter completely. Running a 
>    slideshow crashes when system memory is exhausted.

cacao doesn't seem to garbage-collect as frequently; it could well be that cacao's garbage collection can't keep up or is buggy (or that we rely on non-standard garbage collection behaviour somewhere in classpath...).  I can reproduce the OutOfMemory very quickly with cacao.

> 4. To test for image-format dependencies, I created two new subdirectories
>    with 1000 identical copies of a reference image (800x533), once in JPG
>    and once in GIF format.
> 
>    Running a slideshow in jamvm in either of these directories results in
>    a flat memory usage of ~67M RES (as reported by top). No crash. (!)

I was starting to suspect something like this; that's probably why I can't reproduce the crash with jamvm: my test images have all been the same size.

> 5. Same with 1000 GIF images of 1200x800: jamvm allocates ~148M RES, but
>    seems to stay flat from there.
> 
> 6. Running a slideshow in a directory with 3000 images of just four 
>    different sizes (1200x800, 1300x700, 900x1100, 500x900) but in random
>    order allocates up to 300M (at -Xmx200m) and then crashes.

So it's got something to do with changing image sizes... I'll change my test data to match.

> So: is there any reason that the Heap might get fragmented, and the
> garbage-collector cannot "repair" this?

I have no idea... I really don't know much about garbage collectors or JVM internals =(

> Is there any reason that both jamvm and cacao ignore the -Xmx parameter?

The -Xmx setting only applies to the Java heap.  Any memory that we allocate in native JNI methods (and there are quite a few of them, especially in image work) won't count towards the -Xmx setting...  and thus, a memory leak in native code wouldn't cause an OutOfMemoryError (it would go through all system and swap memory first, before crashing in some horrible way).

Thanks for being so persistent with this!
Francis
Comment 21 Norman Hendrich 2006-11-09 21:19:43 UTC
Hello Francis,

> http://b2.complang.tuwien.ac.at/cgi-bin/bugzilla/show_bug.cgi?id=32
> This does not happen with jamvm (using the most recent jamvm, 
> I can't reproduce any leak at all in Niffler), 

aaahhh!

Ok, I upgraded from jamvm 1.4.2 to 1.4.4. It is a little embarrasing, but
the leak seems indeed fixed with current jamvm+cvs. 

My test slideshow got somewhat slow at image 1320, but it was just loading
an image (JPG) of size 6660x3770 that I had downloaded from the web without
noticing the size. Memory usage went up to ~600M VSS and 500M RES
on that image, but came back down again after that. Performance seems OK, as well.

Thanks for your patience! I would have closed this, but bugzilla somehow
only allows "leave as assigned" at the moment...




Comment 22 Norman Hendrich 2006-11-09 21:36:02 UTC
it just got worse: s/embarrasing/embarrassing/ :-)
Comment 23 Francis Kung 2006-11-09 21:43:25 UTC
heh heh... I'm glad it's working! =)

I'm still a bit curious to know what changed in jamvm, to make sure this is a jvm bug and not a classpath bug, but I'll close it for now... we can re-open if the cacao people toss it back to us.
Comment 24 Roman Kennke 2006-11-10 09:03:30 UTC
I think JamVM 1.4.3 had a bug in its new heap compaction. This also caused a memory leak with Eclipse, IIRC.
Comment 25 Norman Hendrich 2006-11-10 09:07:10 UTC
> heh heh... I'm glad it's working! =)

Me too...

> I'm still a bit curious to know what changed in jamvm, to make sure this is a
> jvm bug and not a classpath bug, but I'll close it for now... we can re-open 
> if the cacao people toss it back to us.

Well, the release notes of jamvm 1.4.3 (which I never tested) mention a new
GC algorithm with optional heap compaction. That might explain it, except that
in the case of Niffler the "small holes" in the heap would be more like ~100M
from time to time :-)



Comment 26 Robert Lougher 2006-11-13 04:07:51 UTC
Hi,

I've been keeping track of this bug off and on for the last few weeks.  Unfortunately, I assumed Norman was using JamVM 1.4.3, as I didn't believe JamVM 1.4.2 would have worked as well as it did.  I therefore assumed it was a leak in the native code...

The output from -verbose:gc gives an idea what is happening.  This is running Niffler on JamVM 1.4.4 with 1000+ images (2592x1944 and 1600x1200) and a heap size of 200MB.

As can be seen, a GC is occurring every couple of images.  There are a lot of finalizers being run (on average 100 per GC) and also several references being cleared.

It also shows that a compaction run is being done approximately every 26 garbage collections.  This explains the difference in behaviour between JamVM 1.4.2 and JamVM 1.4.3+.  On JamVM 1.4.2, after several hundred images the heap has fragmented so that a large allocation cannot be satisfied, and the alloc returns NULL (leading to the SEGV in native code).
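
(As a cross-check, the failed allocation sizes in the log below are exactly the int[] pixel buffers for single images: 2592 x 1944 x 4 bytes = 20,155,392, plus what is presumably a 16-byte array header, gives the 20155408-byte requests; likewise 2560 x 1920 x 4 + 16 = 19,660,816 and 1600 x 1200 x 4 + 16 = 7,680,016.  So each image needs one contiguous block of roughly 8-20 MB.)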

On JamVM 1.4.3+ when this situation is reached the compactor is ran which removes the fragmentation and allows subsequent image allocations to succeed.   

On JamVM 1.4.2 increasing the heap size allows the program to run for longer, but the heap will always end up fragmented (the allocation pattern must leave long-lived objects at intervals throughout the heap, making the large image allocations harder to satisfy).  On JamVM 1.4.3+ the compactor ensures that sufficent contiguous space is always available.

The story for CACAO is more difficult to explain.  As it fails so quickly, it sounds unlikely to be a fragmentation issue.  As there are such a large number of finalizable objects being created, I would suspect that CACAO's finalizer thread is being starved, and the rate of finalization is too slow to stop the heap from filling up completely.

Hope this helps,

Rob.

<GC: Allocated objects: 114881>

<GC: Freed 14 object(s) using 576 bytes>

<GC: Moved 5953 objects, largest block is 60364648 total free is 163884784 out of 209715192 (78%)>

<GC: Mark took 0.015948 seconds, compact took 0.064145 seconds>

-I- getCairoBackedImage: took 85 msecs. for 2592x1944

-I- getCairoBackedImage: took 444 msecs. for 2592x1944

<GC: Alloc attempt for 19660816 bytes failed.>

<GC: Allocated objects: 117142>

<GC: Freed 30916 object(s) using 89090624 bytes, cleared 3 reference(s)>

<GC: Largest block is 20182312 total free is 154226632 out of 209715192 (73%)>

<GC: Mark took 1.435102 seconds, scan took 0.003097 seconds>

<GC: enqueuing 3 references>

<GC: running 168 finalisers>

-I- getCairoBackedImage: took 87 msecs. for 2560x1920

-I- getCairoBackedImage: took 30 msecs. for 1600x1200

<GC: Alloc attempt for 20155408 bytes failed.>

<GC: Allocated objects: 116188>

<GC: Freed 72642 object(s) using 111679184 bytes, cleared 3 reference(s)>

<GC: Largest block is 20155744 total free is 173463864 out of 209715192 (82%)>

<GC: Mark took 0.024944 seconds, scan took 0.005724 seconds>

<GC: enqueuing 3 references>

<GC: running 112 finalisers>

<GC: Alloc attempt for 20155408 bytes failed.>

<GC: Allocated objects: 114956>

<GC: Freed 1375 object(s) using 42726784 bytes>

<GC: Largest block is 24431968 total free is 175874136 out of 209715192 (83%)>

<GC: Mark took 0.017537 seconds, scan took 0.002309 seconds>

<GC: running 7 finalisers>

-I- getCairoBackedImage: took 89 msecs. for 2592x1944

-I- getCairoBackedImage: took 95 msecs. for 2560x1920

<GC: Alloc attempt for 20155408 bytes failed.>

<GC: Allocated objects: 116007>

<GC: Freed 70634 object(s) using 107036712 bytes, cleared 2 reference(s)>

<GC: Largest block is 20156392 total free is 163845152 out of 209715192 (78%)>

<GC: Mark took 0.019563 seconds, scan took 0.008767 seconds>

<GC: enqueuing 2 references>

<GC: running 99 finalisers>

-I- getCairoBackedImage: took 93 msecs. for 2592x1944

<GC: Alloc attempt for 20155408 bytes failed.>

<GC: Allocated objects: 115762>

<GC: Freed 60424 object(s) using 66097072 bytes, cleared 1 reference(s)>

<GC: Largest block is 27835568 total free is 178419000 out of 209715192 (85%)>

<GC: Mark took 0.019869 seconds, scan took 0.004583 seconds>

<GC: enqueuing 1 references>

<GC: running 52 finalisers>

-I- getCairoBackedImage: took 96 msecs. for 2592x1944

<GC: Alloc attempt for 2097168 bytes failed.>

<GC: Allocated objects: 116077>

<GC: Freed 15688 object(s) using 67153312 bytes, cleared 1 reference(s)>

<GC: Largest block is 20176264 total free is 182457432 out of 209715192 (87%)>

<GC: Mark took 0.029573 seconds, scan took 0.003399 seconds>

<GC: enqueuing 1 references>

<GC: running 79 finalisers>

-I- getCairoBackedImage: took 85 msecs. for 2592x1944

<GC: Alloc attempt for 20155408 bytes failed.>

<GC: Allocated objects: 115848>

<GC: Freed 14242 object(s) using 73778744 bytes, cleared 1 reference(s)>

<GC: Largest block is 20157152 total free is 181386400 out of 209715192 (86%)>

<GC: Mark took 0.023036 seconds, scan took 0.002834 seconds>

<GC: enqueuing 1 references>

<GC: running 61 finalisers>

-I- getCairoBackedImage: took 84 msecs. for 2592x1944

<GC: Alloc attempt for 1610104 bytes failed.>

<GC: Allocated objects: 115957>

<GC: Freed 15224 object(s) using 65225696 bytes, cleared 1 reference(s)>

<GC: Largest block is 20901368 total free is 179807672 out of 209715192 (85%)>

<GC: Mark took 0.019542 seconds, scan took 0.003459 seconds>

<GC: enqueuing 1 references>

<GC: running 70 finalisers>

-I- getCairoBackedImage: took 89 msecs. for 2592x1944

<GC: Alloc attempt for 7680016 bytes failed.>

<GC: Allocated objects: 116050>

<GC: Freed 62476 object(s) using 70916896 bytes, cleared 1 reference(s)>

<GC: Largest block is 20157152 total free is 181196040 out of 209715192 (86%)>

<GC: Mark took 0.019086 seconds, scan took 0.005795 seconds>

<GC: enqueuing 1 references>

<GC: running 76 finalisers>

-I- getCairoBackedImage: took 32 msecs. for 1600x1200

-I- getCairoBackedImage: took 96 msecs. for 2592x1944

<GC: Alloc attempt for 1464680 bytes failed.>

<GC: Allocated objects: 116604>

<GC: Freed 121323 object(s) using 95469520 bytes, cleared 2 reference(s)>

<GC: Largest block is 20157336 total free is 173712176 out of 209715192 (82%)>

<GC: Mark took 0.022549 seconds, scan took 0.006228 seconds>

<GC: enqueuing 2 references>

<GC: running 124 finalisers>

-I- getCairoBackedImage: took 93 msecs. for 2592x1944

<GC: Alloc attempt for 2097168 bytes failed.>

<GC: Allocated objects: 115985>

<GC: Freed 15519 object(s) using 74755824 bytes, cleared 1 reference(s)>

<GC: Largest block is 20157152 total free is 182461608 out of 209715192 (87%)>

<GC: Mark took 0.016847 seconds, scan took 0.005171 seconds>

<GC: enqueuing 1 references>

<GC: running 71 finalisers>

-I- getCairoBackedImage: took 88 msecs. for 2560x1920

<GC: Alloc attempt for 20155408 bytes failed.>

<GC: Allocated objects: 115965>

<GC: Freed 17794 object(s) using 100662776 bytes, cleared 1 reference(s)>

<GC: Largest block is 20155552 total free is 163844064 out of 209715192 (78%)>

<GC: Mark took 0.034024 seconds, scan took 0.003198 seconds>

<GC: enqueuing 1 references>

<GC: running 89 finalisers>

-I- getCairoBackedImage: took 90 msecs. for 2592x1944

-I- getCairoBackedImage: took 35 msecs. for 1600x1200

<GC: Alloc attempt for 24 bytes failed.>

<GC: Allocated objects: 115973>

<GC: Freed 67937 object(s) using 61280568 bytes, cleared 2 reference(s)>

<GC: Largest block is 20901368 total free is 175835480 out of 209715192 (83%)>

<GC: Mark took 0.128026 seconds, scan took 0.004682 seconds>

<GC: enqueuing 2 references>

<GC: running 88 finalisers>

<GC: Alloc attempt for 20155408 bytes failed.>

<GC: Allocated objects: 115145>

<GC: Freed 4004 object(s) using 54313848 bytes, cleared 2 reference(s)>

<GC: Largest block is 20176264 total free is 196021216 out of 209715192 (93%)>

<GC: Mark took 0.019317 seconds, scan took 0.003205 seconds>

<GC: enqueuing 2 references>

<GC: running 28 finalisers>

-I- getCairoBackedImage: took 85 msecs. for 2592x1944

<GC: Alloc attempt for 20155408 bytes failed.>

<GC: Allocated objects: 115742>

<GC: Freed 11691 object(s) using 55446928 bytes, cleared 2 reference(s)>

<GC: Largest block is 20156552 total free is 183524800 out of 209715192 (87%)>

<GC: Mark took 0.014645 seconds, scan took 0.005240 seconds>

<GC: enqueuing 2 references>

<GC: running 51 finalisers>

-I- getCairoBackedImage: took 86 msecs. for 2592x1944

<GC: Alloc attempt for 20155408 bytes failed.>

<GC: Allocated objects: 116080>

<GC: Freed 62232 object(s) using 45432960 bytes, cleared 1 reference(s)>

<GC: Largest block is 20901368 total free is 181008976 out of 209715192 (86%)>

<GC: Mark took 0.013183 seconds, scan took 0.005511 seconds>

<GC: enqueuing 1 references>

<GC: running 73 finalisers>

-I- getCairoBackedImage: took 97 msecs. for 2592x1944

<GC: Alloc attempt for 7680016 bytes failed.>

<GC: Allocated objects: 115768>

<GC: Freed 59612 object(s) using 67479616 bytes, cleared 1 reference(s)>

<GC: Largest block is 20934104 total free is 182228456 out of 209715192 (86%)>

<GC: Mark took 0.018875 seconds, scan took 0.004109 seconds>

<GC: enqueuing 1 references>

<GC: running 52 finalisers>

-I- getCairoBackedImage: took 35 msecs. for 1600x1200

-I- getCairoBackedImage: took 101 msecs. for 2592x1944

<GC: Alloc attempt for 7680016 bytes failed.>

<GC: Allocated objects: 116709>

<GC: Freed 73384 object(s) using 89270504 bytes, cleared 3 reference(s)>

<GC: Largest block is 20156392 total free is 174458376 out of 209715192 (83%)>

<GC: Mark took 0.018745 seconds, scan took 0.005469 seconds>

<GC: enqueuing 3 references>

<GC: running 134 finalisers>

-I- getCairoBackedImage: took 35 msecs. for 1600x1200

-I- getCairoBackedImage: took 96 msecs. for 2592x1944

<GC: Alloc attempt for 20155408 bytes failed.>

<GC: Allocated objects: 116442>

<GC: Freed 72229 object(s) using 97614280 bytes, cleared 3 reference(s)>

<GC: Largest block is 20156552 total free is 172747128 out of 209715192 (82%)>

<GC: Mark took 0.041852 seconds, scan took 0.005110 seconds>

<GC: enqueuing 3 references>

<GC: running 109 finalisers>

-I- getCairoBackedImage: took 91 msecs. for 2592x1944

<GC: Alloc attempt for 7680016 bytes failed.>

<GC: Allocated objects: 115925>

<GC: Freed 61229 object(s) using 76310448 bytes, cleared 1 reference(s)>

<GC: Largest block is 20901368 total free is 181303016 out of 209715192 (86%)>

<GC: Mark took 0.020063 seconds, scan took 0.004563 seconds>

<GC: enqueuing 1 references>

<GC: running 63 finalisers>

-I- getCairoBackedImage: took 34 msecs. for 1600x1200

-I- getCairoBackedImage: took 33 msecs. for 1600x1200

-I- getCairoBackedImage: took 108 msecs. for 2592x1944

<GC: Alloc attempt for 1273952 bytes failed.>

<GC: Allocated objects: 117222>

<GC: Freed 131663 object(s) using 113385376 bytes, cleared 3 reference(s)>

<GC: Largest block is 20157152 total free is 166011016 out of 209715192 (79%)>

<GC: Mark took 0.021839 seconds, scan took 0.006358 seconds>

<GC: enqueuing 3 references>

<GC: running 172 finalisers>

-I- getCairoBackedImage: took 97 msecs. for 2592x1944

<GC: Alloc attempt for 7680016 bytes failed.>

<GC: Allocated objects: 116064>

<GC: Freed 15921 object(s) using 81547992 bytes, cleared 1 reference(s)>

<GC: Largest block is 20157360 total free is 182035808 out of 209715192 (86%)>

<GC: Mark took 0.018777 seconds, scan took 0.003303 seconds>

<GC: enqueuing 1 references>

<GC: running 72 finalisers>

-I- getCairoBackedImage: took 33 msecs. for 1600x1200

-I- getCairoBackedImage: took 111 msecs. for 2592x1944

<GC: Alloc attempt for 4194320 bytes failed.>

<GC: Allocated objects: 116680>

<GC: Freed 121664 object(s) using 95069096 bytes, cleared 2 reference(s)>

<GC: Largest block is 20157152 total free is 173705104 out of 209715192 (82%)>

<GC: Mark took 0.035105 seconds, scan took 0.007327 seconds>

<GC: running 124 finalisers>

<GC: enqueuing 2 references>

-I- getCairoBackedImage: took 81 msecs. for 2560x1920

<GC: Alloc attempt for 1795224 bytes failed.>

<GC: Allocated objects: 116001>

<GC: Freed 17746 object(s) using 81112504 bytes, cleared 1 reference(s)>

<GC: Largest block is 20155776 total free is 181898928 out of 209715192 (86%)>

<GC: Mark took 0.018072 seconds, scan took 0.003519 seconds>

<GC: enqueuing 1 references>

<GC: running 86 finalisers>

<GC: Alloc attempt for 20155408 bytes failed.>

<GC: Allocated objects: 114997>

<GC: Freed 1069 object(s) using 25887072 bytes>

<GC: Largest block is 20155408 total free is 184036992 out of 209715192 (87%)>

<GC: Mark took 0.019320 seconds, scan took 0.003003 seconds>

<GC: running 4 finalisers>

<GC: Alloc attempt for 20155408 bytes failed.>

<GC: Allocated objects: 115010>

<GC: Freed 86 object(s) using 2760 bytes>

<GC: Largest block is 19662560 total free is 163880960 out of 209715192 (78%)>

<GC: Mark took 0.010356 seconds, scan took 0.005524 seconds>

<GC: running 1 finalisers>

<GC: Waiting for finalizers to be ran.>

<GC: Allocated objects: 115003>

<GC: Freed 7 object(s) using 288 bytes>

<GC: Moved 5929 objects, largest block is 80332688 total free is 163881232 out of 209715192 (78%)>

<GC: Mark took 0.018155 seconds, compact took 0.126048 seconds>

-I- getCairoBackedImage: took 87 msecs. for 2592x1944

-I- getCairoBackedImage: took 92 msecs. for 2560x1920

Comment 27 Norman Hendrich 2006-11-13 08:19:08 UTC
(In reply to comment #26)
> I've been keeping track of this bug off and on for the last few weeks. 
> Unfortunately, I assumed Norman was using JamVM 1.4.3, as I didn't believe
> JamVM 1.4.2 would have worked as well as it did.  I therefore assumed it was a
> leak in the native code...

I guess my only excuse is that 1.4.2 was/is working so well for all my 
other tests that I didn't really suspect it :-)