[Fwd: Success with Nutch & GCJ]

Andrzej Bialecki ab@getopt.org
Thu Feb 9 23:16:00 GMT 2006

(Please don't cc: me - I'm subscribed. Thank you.)

Tom Tromey wrote:
> Andrzej> GC Warning: Repeated allocation of very large block (appr. size 6578176):
> Andrzej>       May lead to memory leak and poor performance
> Yeah, we know about this.  I don't know of a general solution though.
> What version of gcj are you using?  If you are using 4.0.x, and if
> this message is coming via the http protocol handler, then the problem
> is fixed in 4.1.

I mentioned this in the original report: "gcc version 4.1.0 20060106 
(Red Hat 4.1.0-0.14)"

I'm not sure where the message is coming from. How can I check this?

> Andrzej> Nonetheless, I must say I'm impressed - even if there were some memory
> Andrzej> mgmt problems, at the end of the day the whole process was stable, and
> Andrzej> the overall fetching speed in each case was very similar (63 kb/s with
> Andrzej> gij, 75 kb/s with Sun; I used the default settings with 10 threads).
> If you're just using gij and not precompiling, then this is amazing
> indeed... gij is a reasonably ordinary interpreter.

Well, Nutch does a lot of waiting, especially if you run it in 
"polite" mode and you don't have too many hosts (threads then block, 
because they are prevented from making multiple simultaneous requests 
to the same host). But content parsing is sometimes demanding, e.g. for 
PDF, Word or PPT files.
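
To illustrate what I mean by threads being blocked, here is a minimal 
sketch of the idea. It is NOT Nutch's actual fetcher code: the class 
name HostGate and its methods are hypothetical, and I'm assuming the 
simplest possible rule, at most one request in flight per host.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

/** Hypothetical illustration of per-host politeness; not Nutch code. */
public class HostGate {
    // Hosts that currently have a request in flight.
    private final Set<String> busy =
        Collections.synchronizedSet(new HashSet<String>());

    /** Returns true if the caller may contact the host right now. */
    public boolean tryAcquire(String host) {
        // Set.add returns false when the host is already marked busy.
        return busy.add(host);
    }

    /** Marks the host as free again once the request has finished. */
    public void release(String host) {
        busy.remove(host);
    }

    public static void main(String[] args) {
        HostGate gate = new HostGate();
        System.out.println(gate.tryAcquire("a.example")); // first thread
        System.out.println(gate.tryAcquire("a.example")); // second thread, same host
        System.out.println(gate.tryAcquire("b.example")); // different host
    }
}
```

With 10 threads and only a handful of hosts, most threads would find 
their host busy most of the time, so the CPU cost of interpretation 
matters less than it otherwise would.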

> One thing worth trying is BC compiling your app.  If you then
> register the results with the class cache database, you can still run
> with gij but it will pick up your compiled code instead.
> Instructions here:
> http://gcc.gnu.org/wiki/How%20to%20BC%20compile%20with%20GCJ
> This is how we compile all the stuff we put into Fedora Core...

I'm doing it now, and I'll report how it works when it's done (I have a 
lot of JARs, so it's taking some time...)

Do you think that the curious RAM consumption behaviour I observed is 
related to running in interpreted mode? It seemed to me that gcj very 
aggressively allocated all heap space up to the amount specified with 
-Xmx. That would be ok, but top(1) showed the RES size taking nearly 
all my physical RAM, and the app was swapping madly. The same 
application, using the same -Xmx on the same machine but under the Sun 
JVM, reported the same virtual size, but its RES size stayed at ~30MB 
throughout the whole test (using exactly the same data).
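
One way to narrow this down might be to log what the VM itself thinks 
its heap looks like and compare that with the RES column in top(1). 
Below is a minimal sketch using only the standard java.lang.Runtime 
methods. The class name HeapReport is made up for illustration, and I'm 
assuming these methods report meaningful values under gij, which I have 
not verified.

```java
/** Hypothetical helper; prints the Java heap as seen by the VM. */
public class HeapReport {
    /** Bytes of Java heap currently in use, according to the VM. */
    public static long usedBytes() {
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }

    /** One-line summary: used heap, committed heap, and the heap limit. */
    public static String report() {
        Runtime rt = Runtime.getRuntime();
        long mb = 1024L * 1024L;
        return "used=" + (usedBytes() / mb) + "MB"
             + " committed=" + (rt.totalMemory() / mb) + "MB"
             + " max=" + (rt.maxMemory() / mb) + "MB";
    }

    public static void main(String[] args) {
        // Run under both VMs with the same -Xmx and compare with top(1).
        System.out.println(report());
    }
}
```

If "committed" stays small while RES grows, the memory is going 
somewhere other than the Java heap proper; if "committed" jumps 
straight to the limit, the heap is being grown eagerly.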

> We're also quite interested in profiling information, to find out if
> there are bottlenecks in our class libraries.  Oprofile has been
> pretty useful here.

That's for later - performance seems decent enough, and for now I'm too 
surprised that it works at all. :-)

> Andrzej> if only the GUI and JNI apps were similarly advanced ;-)
> I think JNI should work fine -- if you've got problems, file them; for
> some reason we tend to fix JNI bugs pretty quickly :-)

It's a longer story, related to static compilation. It's in the 
archives from last week.

Best regards,
Andrzej Bialecki     <><
 ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com
