This is the mail archive of the java@gcc.gnu.org mailing list for the Java project.



Re: Thoughts about static linking and reducing size of binaries


Jeff Sturm wrote:

> It's funny Bryce, I was just thinking about libgcj bloat myself, but
> hadn't even considered static linking.  I think even libgcj.so is
> becoming excessively large.

Personally I'm not particularly worried about the size of the shared library.
On x86 it is under 5MB stripped, only about 2.5x the size of the uncompressed
libgcj.jar. That doesn't seem excessive when you consider that it's actually a
complete self-contained Java runtime! (Incidentally, stripping does not seem to
remove the symbols that allow stack traces - you just lose the line numbers,
local variable debugging, etc.) But I do think 1.5MB stripped is too much
for a static hello world app that doesn't need to do any dynamic loading.

I'm sure there are ways the compiler can be generally improved to emit more
compact object files - my guess is the bloat mostly does not come from the code
itself but rather from the large number of symbols for strings, the EH tables,
etc.

> Normally that'd just be a waste of disk and virtual memory (no big deal on
> modern systems), but remember each class registers itself on startup, and
> the way the global constructors are dispersed many pages throughout the
> lib are touched.  The result is an inordinate number of page faults and
> real memory allocated.

I hadn't really thought about this. Are the global constructors really
dispersed throughout libgcj.so, or does the linker do the right thing and group
them together? I guess this is what causes a higher first-time startup cost:
sometimes I notice a 1 - 2 s delay the first time I run a Java binary after
installing a new libgcj, but once those pages are loaded the kernel will keep
them cached for a while.  Anyway, isn't this easily fixed with
whole-package/whole-program compilation? We'd just have a single (or just a
few) class registration functions that register lots of classes in one shot. The
RSS of a GCJ app is already far smaller than that of a Hotspot app; it's always
nice to know that there is plenty of room for additional improvements!
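
To make the "one registration function for many classes" idea concrete, here is
a rough Java model of the two startup styles. It is only a sketch: the class
RegistrationSketch and its methods are hypothetical names invented for this
illustration, not libgcj's actual registration interface, and a counter stands
in for the number of separate startup entry points (and hence scattered pages)
that would be touched.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of class registration at startup (not libgcj's real API).
class RegistrationSketch {
    // Names of the classes registered so far, in registration order.
    final List<String> registered = new ArrayList<>();
    // How many separate startup entry points ran to perform the registration.
    int entryPointsRun = 0;

    // Per-class style: each class has its own constructor-like entry point.
    void registerOne(String className) {
        entryPointsRun++;
        registered.add(className);
    }

    // Batched style: one entry point walks a table covering a whole package.
    void registerAll(String[] classNames) {
        entryPointsRun++;
        for (String name : classNames) {
            registered.add(name);
        }
    }

    // Simulates startup with one entry point per class.
    static RegistrationSketch perClassStartup(String[] classNames) {
        RegistrationSketch r = new RegistrationSketch();
        for (String name : classNames) {
            r.registerOne(name);
        }
        return r;
    }

    // Simulates startup with a single entry point for all classes.
    static RegistrationSketch batchedStartup(String[] classNames) {
        RegistrationSketch r = new RegistrationSketch();
        r.registerAll(classNames);
        return r;
    }

    public static void main(String[] args) {
        String[] names = { "pkg.A", "pkg.B", "pkg.C" };
        System.out.println(perClassStartup(names).entryPointsRun);
        System.out.println(batchedStartup(names).entryPointsRun);
    }
}
```

Both styles end up registering the same classes; the only difference in the
model is how many entry points have to run to get there.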

> I was considering something more like a hybrid of the current bytecode
> interpreter and native compiler: compile all the methods to object code,
> but defer linkage until runtime.  The interpreter already knows how to
> construct the vtables, metadata, etc. and can do so reasonably fast.  This
> approach has the drawback of needing libgcj.jar at runtime however.

> A less radical suggestion is to defer class linkage, i.e. given
> classes A,B:
>
> class A {
>   A () {
>     new B ();
>   }
> }
>
> class B {
>  ...
> }
>
> compile A to the equivalent of:
>
> Class bClass;
>
> <clinit> () {
>   bClass = Class.forName ("B");
> }
>
> <init> () {
>   bClass.newInstance ();
> }
>
> instead of binding directly to the B.class$ symbol.  The effect would be
> to prevent linking B into the static binary, and avoid loading it
> altogether if A is never initialized.

Cool idea!  I think to get the right semantics for Java it would really have to
only try to load B when the B constructor is actually run, not when A itself is
initialized. I suppose it could emit something like:

Class bClass;

<init> () {
  if (__builtin_expect(bClass == null, false))
    {
      bClass = Class.forName("B");  // throws ClassNotFoundException
    }
  bClass.newInstance ();  // as in your version, once bClass is resolved
}

This adds some slight code bloat perhaps, but I don't think there'd be much of
a performance hit given that we already do this everywhere for class
initialization checks, which we can then get rid of because we know forName()
will always return an initialized class! All references to non-virtual method
symbols, static field symbols, etc. that are loaded in this way would have to be
weak, but that's okay because any path that accesses them would have to first
go through the initialization check above. I do *not* think that this technique
could be used to load classes via the interpreter (there's no
practical/portable way the interpreter can insert symbols into the symbol
table, right? or is there??), but it would neatly defer linkage to runtime and
result in correct Java language semantics. Wow, this just might work!
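
For what it's worth, the ordering I have in mind can be tried out in plain Java
with reflection. This is only a sketch of the semantics, not of what the
compiler would emit: the names LazyLinkSketch, resolve() and demo() are made up
for the illustration, the nested classes A and B stand in for the two classes
above, and an ordinary null check replaces __builtin_expect. An event log shows
that B is initialized when A's constructor first runs, not when A itself is
initialized.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo of deferring the lookup of B until A's constructor runs.
class LazyLinkSketch {
    // Records the order in which initialization and construction happen.
    static final List<String> events = new ArrayList<>();

    // Stand-in for class B; its static initializer logs when B is initialized.
    static class B {
        static { events.add("B initialized"); }
        B() { events.add("B constructed"); }
    }

    // Stand-in for class A, which refers to B only by name.
    static class A {
        static { events.add("A initialized"); }

        // Cached class reference, filled in on first use instead of in <clinit>.
        private static Class<?> bClass;

        Object b;

        A() {
            if (bClass == null) {
                bClass = resolve("B");   // loads and initializes B
            }
            try {
                b = bClass.getDeclaredConstructor().newInstance();
            } catch (ReflectiveOperationException e) {
                throw new IllegalStateException(e);
            }
        }
    }

    // Looks up a nested class of this sketch by name; forName() initializes it.
    static Class<?> resolve(String simpleName) {
        try {
            return Class.forName(LazyLinkSketch.class.getName() + "$" + simpleName);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    // Initializes A first, then constructs two instances, and returns the log.
    static List<String> demo() {
        resolve("A");   // A is initialized here; B is still untouched
        new A();        // first construction triggers the lookup of B
        new A();        // second construction reuses the cached class
        return events;
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

The unsynchronized check on bClass is deliberate in the sketch: two threads
racing through it would both get the same Class object back from forName(), so
the worst case is a redundant lookup.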

regards

  [ bryce ]


