Initial Heap Size and other tweaks (gcj 3.2.3 on debian)

Boehm, Hans
Mon Apr 28 22:20:00 GMT 2003

I think Andrew already pointed out the root of the problem here.  The IBM JVM is probably avoiding a lot of the allocation altogether.  Based on some very old comparisons, my recollection is that the IBM collector is in the same ballpark as the gcj one.

The fact that the IBM JVM is successful at this suggests that a lot of the allocation is done to hold complex numbers that are effectively expression temporaries, and are never being stored in a matrix with substantial lifetime.  Does that sound right?  (If it was really the IBM collector that was a lot faster, that would suggest to me that (a) it's generational, which I didn't think it was, and (b) generational collection is a big win here.  Observation (b) would also lead me to the same conclusion, i.e. much of the allocation is for temporaries.)

If this is correct, you can probably get a bit of performance by defining additional operations on complex numbers, e.g. 3 and 4 operand addition in addition to the binary case.  I admit that's a bit ugly, depending on how esoteric you have to make those operations.
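
As a rough sketch of what such multi-operand operations might look like: the class below is hypothetical (the names Complex, add3 and add4 are illustrative and not taken from the original program). The point is only that a fused operation allocates one result object where chained binary additions would allocate one per intermediate sum.

```java
// Hypothetical immutable complex number class; all names here are
// illustrative assumptions, not the original program's API.
final class Complex {
    final double re;
    final double im;

    Complex(double re, double im) {
        this.re = re;
        this.im = im;
    }

    // Binary addition: allocates one new object per call.
    Complex add(Complex b) {
        return new Complex(re + b.re, im + b.im);
    }

    // Three-operand addition: one allocation, where a.add(b).add(c)
    // would allocate two (one of them a short-lived temporary).
    static Complex add3(Complex a, Complex b, Complex c) {
        return new Complex(a.re + b.re + c.re, a.im + b.im + c.im);
    }

    // Four-operand addition: one allocation instead of three.
    static Complex add4(Complex a, Complex b, Complex c, Complex d) {
        return new Complex(a.re + b.re + c.re + d.re,
                           a.im + b.im + c.im + d.im);
    }
}
```

An inner-loop expression such as x.add(y).add(z) would then become Complex.add3(x, y, z). Whether this helps noticeably depends on how many of the temporaries come from such chains, which a profile should show.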

Defining the GC_PRINT_STATS environment variable should give you some idea what the collector is doing.  Building the collector without -DSILENT will give you more information.  AFAIK GC_INITIAL_HEAP_SIZE works fine.  It wouldn't surprise me if uncommitted memory simply isn't reported by whatever you used to check the heap size.  (That's arguably the right behavior.  E.g. on some platforms linuxthreads maps 2MB or so per stack immediately when a thread is started, realizing that most of that will never be committed.)  Looking at /proc/<nnn>/maps may be more informative.
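
For what it's worth, here is a minimal sketch of totalling the address ranges listed in a maps file. It assumes the usual Linux format, where each line begins with a hexadecimal "start-end" range. The class and method names are made up for illustration, and it uses current Java library calls, so it is meant to run on a recent JDK rather than be compiled with gcj 3.2.3. Note that it reports mapped address space, which includes memory that has been reserved but never touched.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;

// Hypothetical helper; assumes each maps line starts with "start-end" in hex.
final class MapsSummary {
    // Sums the sizes of the address ranges on the given maps lines.
    static long totalMappedBytes(List<String> lines) {
        long total = 0;
        for (String line : lines) {
            // First whitespace-separated field, e.g. "08048000-0804c000"
            String range = line.trim().split("\\s+")[0];
            int dash = range.indexOf('-');
            if (dash < 0) {
                continue; // blank or unexpected line: skip it
            }
            long start = Long.parseUnsignedLong(range.substring(0, dash), 16);
            long end = Long.parseUnsignedLong(range.substring(dash + 1), 16);
            total += end - start;
        }
        return total;
    }

    // Reads /proc/<pid>/maps for the given process id and totals it.
    static long totalMappedBytes(String pid) throws IOException {
        return totalMappedBytes(Files.readAllLines(Paths.get("/proc", pid, "maps")));
    }
}
```

Comparing the total with and without GC_INITIAL_HEAP_SIZE set should show whether the larger heap was actually mapped, even if tools that report only resident memory show no difference.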

A profile of the gcj process would help.  (If you don't have any way to do that, there's a very preliminary version of the tool I've been using at .  Feedback is appreciated.)


> -----Original Message-----
> From: karlm@MIT.EDU [mailto:karlm@MIT.EDU]
> Sent: Sunday, April 27, 2003 5:06 PM
> To:
> Subject: Initial Heap Size and other tweaks (gcj 3.2.3 on debian)
> I inherited an admittedly poor Java port of a Fortran simulation.
> It allocates and de-allocates immutable complex number objects
> with each operation performed on the values of a matrix of complex
> numbers.
> This is known to thrash the libgcj garbage collector.  I tried the
> recommended method of reducing the frequency of garbage collections
> by setting the environment variable GC_INITIAL_HEAP_SIZE to
> 256000000.  This appears to not affect the initial heap size, unless
> Linux only counts memory as used once it gets initially paged in
> following a page fault.  In any case, even with "-O3 -fno-bounds-check
> -fno-store-check -fomit-frame-pointer -static" the gcj-generated
> native binary takes over 380 seconds to perform the main loop while
> the IBM 1.4.0 JVM takes 25 seconds.  Both times are self-reported
> by calling System.currentTimeMillis() directly before and after the
> main loop and printing the difference.  My gcj version is gcj (GCC)
> 3.2.3 20030415 (Debian prerelease).
> Does anyone have any further suggestions for getting the GCJ
> performance on par with the JVM?  From program load to program
> exit, the Fortran77 version of the code does the same calculation
> in 0.6 seconds, according to the "time" command-line utility.
> The long-term solution is of course to use a mutable complex number
> class (cleaner) or split the matrix into a matrix for the real part
> and a matrix for the imaginary part (probably faster, but less clean).
> Thanks,
> -Karl
> --------
> Karl Alexander Magdsick
> "For indoor and outdoor use only."
> -- Japanese lights --
