[Fwd: Success with Nutch & GCJ]

Tom Tromey tromey@redhat.com
Thu Feb 9 19:01:00 GMT 2006

>>>>> "Andrzej" == Andrzej Bialecki <ab@getopt.org> writes:

Andrzej> You may be interested in this report, especially the memory related
Andrzej> stuff. Nutch is an Open Source search engine project
Andrzej> (http://lucene.apache.org/nutch).


Andrzej> * JAVA_HOME is not set by default. I set it to /usr, where bin/java ->
Andrzej> bin/gcj resides, and it worked. You should set it to wherever you have
Andrzej> the bin/java binary.

FWIW we use the java-gcj-compat package to make gcj look more like an
ordinary VM in situations like this.

Andrzej> * Hadoop Configuration.java:428 makes an explicit cast to
Andrzej>   org.apache.xerces.dom.DocumentImpl, but gcj uses by default its own
Andrzej>   implementation, so it would throw a ClassCastException. This I fixed
Andrzej>   by adding two JARs from the Xalan distribution (xalan.jar and
Andrzej>   serializer.jar), which apparently take precedence over the built-in
Andrzej>   XSL processor (theoretically, you should then specify
Andrzej>   -Djavax.xml.transform.TransformerFactory=org.apache.xalan.processor.TransformerFactoryImpl
Andrzej>   but I didn't need this, not sure why).

If you dropped these into the appropriate directory (endorsed I
think), then the .jars probably have manifest entries causing them to
be picked up by default.  We do this same thing for jonas.
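
For anyone hitting the same ClassCastException, here is a minimal
sketch of how to see which XML implementation the runtime actually
hands back.  It uses only the standard JAXP interfaces.  The class name
JaxpProbe and its helper methods are made up for illustration; this is
not the Hadoop code.  JAXP factories are chosen through system
properties and service-provider entries in jars, so code that sticks to
the org.w3c.dom interfaces, rather than casting to a concrete class
like org.apache.xerces.dom.DocumentImpl, should not care which one
wins.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.TransformerFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical helper, not part of Nutch, Hadoop, or gcj.
final class JaxpProbe {

    // Name of the concrete class behind an interface-typed reference.
    static String implementationOf(Object o) {
        return o.getClass().getName();
    }

    // Build an empty DOM document through the JAXP factory lookup,
    // without naming any vendor-specific class.
    static Document newDocument() {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException("no usable JAXP parser", e);
        }
    }

    public static void main(String[] args) {
        Document doc = newDocument();

        // Work through the interface only; no cast to DocumentImpl.
        Element root = doc.createElement("configuration");
        doc.appendChild(root);

        // These names differ between runtimes and classpaths.
        System.out.println("DOM Document:       " + implementationOf(doc));
        System.out.println("TransformerFactory: "
                + implementationOf(TransformerFactory.newInstance()));
    }
}
```

Running it once with and once without the extra jars on the classpath
(or with -Djavax.xml.transform.TransformerFactory=... set) should show
whether they really are taking precedence.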

Andrzej> GC Warning: Repeated allocation of very large block (appr. size 6578176):
Andrzej>       May lead to memory leak and poor performance

Yeah, we know about this.  I don't know of a general solution though.

What version of gcj are you using?  If you are using 4.0.x, and if
this message is coming via the http protocol handler, then the problem
is fixed in 4.1.

Andrzej> Nonetheless, I must say I'm impressed - even if there were some memory
Andrzej> mgmt problems, at the end of the day the whole process was stable, and
Andrzej> the overall fetching speed in each case was very similar (63 kb/s with
Andrzej> gij, 75 kb/s with Sun; I used the default settings with 10 threads).

If you're just using gij and not precompiling, then this is amazing
indeed... gij is a reasonably ordinary interpreter.

One thing worth trying is BC compiling your app, i.e. compiling it
with gcj's binary compatibility ABI (-findirect-dispatch).  If you then
register the results with the class cache database, you can still run
with gij but it will pick up your compiled code instead.
Instructions here:


This is how we compile all the stuff we put into Fedora Core...

We're also quite interested in profiling information, to find out if
there are bottlenecks in our class libraries.  Oprofile has been
pretty useful here.

Andrzej> My hat's off to GCJ folks - it's amazing how far it's progressed ...


Andrzej> if only the GUI and JNI apps were similarly advanced ;-)

I think JNI should work fine -- if you've got problems, file them; for
some reason we tend to fix JNI bugs pretty quickly :-)

As for AWT and Swing, we're making huge progress in every release.
It's fair to say that this is the most heavily developed part of gcj
and classpath at the moment.  4.1 will include a huge number of Swing

