This is the mail archive of the java@gcc.gnu.org mailing list for the Java project.



Re: Thoughts about static linking and reducing size of binaries



It's funny Bryce, I was just thinking about libgcj bloat myself, but
hadn't even considered static linking.  I think even libgcj.so is
becoming excessively large.

On Thu, 15 Mar 2001, Bryce McKinlay wrote:
> The linker of course brings
> in _everything_ that is ever referenced from any class used by the
> application, so if something is imported by java.lang.Class then
> everything used by that class must be brought in as well.

That alone is not very Java-like, and creates other problems as well.  For
instance, I have some .jar archives that expect to work even if
certain classes (e.g. javax.transaction.*) are missing, by catching
ClassNotFoundException, etc.  Many JDBC drivers (e.g. pgsql, oracle) do
just that to detect JDBC2 availability at runtime.  These .jars cannot
easily be compiled with gcj.
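
The runtime-detection idiom these drivers rely on looks roughly like the
sketch below.  The names OptionalClassProbe and isAvailable are made up for
illustration; only Class.forName and the exception types are standard API.

```java
// Hypothetical helper illustrating the "probe for an optional class" idiom.
final class OptionalClassProbe {

    // Returns true if the named class can be located at runtime.
    static boolean isAvailable(String className) {
        try {
            // initialize=false: only locate the class, do not run its
            // static initializers.
            Class.forName(className, false,
                          OptionalClassProbe.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            // The class (or something it needs) is missing: fall back.
            return false;
        }
    }

    public static void main(String[] args) {
        // Whether this prints true depends on what is on the class path.
        System.out.println("javax.transaction.xa present: "
            + isAvailable("javax.transaction.xa.XAResource"));
    }
}
```

Code written this way only works if a missing class is reported at runtime
rather than as an unresolved symbol at link time.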

Also, the current shared library assumes ELF semantics, which aren't
widely portable.  It'd be nice if class objects (or whole packages, maybe)
could be loaded on demand.

> It is not an issue in shared-library land (or even for static binaries
> on PCs, really),

You think?  I have moderately sized programs that, when compiled, have over
10MB of text alone.  And libgcj is going to get much larger before we are
through.

Normally that'd just be a waste of disk and virtual memory (no big deal on
modern systems), but remember that each class registers itself on startup,
and because the global constructors are scattered throughout the lib, many
pages are touched.  The result is an inordinate number of page faults and
a lot of real memory allocated.

I tested this with a very trivial program.  First the shared version
(tested on alphapca56-unknown-linux-gnu):

[jsturm@mars jsturm]$ size Sleep
   text    data     bss     dec     hex filename
  19576    2632     408   22616    5858 Sleep
[jsturm@mars jsturm]$ /usr/bin/time ./Sleep
0.15user 0.02system 0:01.17elapsed 14%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (743major+213minor)pagefaults 0swaps
[jsturm@mars jsturm]$ ./Sleep & ps l
[1] 21129
  F   UID   PID  PPID PRI  NI   VSZ  RSS WCHAN  STAT TTY     TIME COMMAND
000   537 21129 21036  17   0 13664 6768 proces S    pts/1   0:00 ./Sleep

Then a static compile:

[jsturm@mars jsturm]$ size Sleep.static
   text    data     bss     dec     hex filename
1789097  361800  113224 2264121  228c39 Sleep.static
[jsturm@mars jsturm]$ /usr/bin/time ./Sleep.static
0.06user 0.00system 0:01.06elapsed 6%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (315major+159minor)pagefaults 0swaps                       
[jsturm@mars jsturm]$ ./Sleep.static & ps l
[2] 21133
  F   UID   PID  PPID PRI  NI   VSZ  RSS WCHAN  STAT TTY     TIME COMMAND
000   537 21133 21036   9   0  8824 3120 proces S    pts/1   0:00 ./Sleep.st

In summary:

          major faults     RSS   startup
shared             743   6768K     170ms
static             315   3120K      60ms

The static build clearly gains a lot simply by not registering unneeded
classes.  The executable is quite large, however.

Maybe a partial solution to the shared build could be to eliminate
_Jv_RegisterClass and rely on dlsym() to resolve dynamic class references?
Hmm... perhaps I'll give it a go.  (I realize importing the mangler could
be an issue, but it shouldn't be difficult to try it anyway.)

> A much better and far more general solution would be to have the
> ability to link in only the classes which are actually used
> (loaded/initialized) during execution.

Yes!  That ought to solve several important issues:

- smaller static builds
- smaller (more granular) shared libs
- deferring unresolved class references until runtime

> It would be simple to give
> libgcj the ability to track this and dump out a list of used classes
> during execution that could then be used to make an "size optimal"
> build by feeding that list back to the linker. Is it possible to have
> the linker bring in only a given set of classes at link time, and
> treat other references as weak symbols?

That's a clever idea.  Tools like Sun's javac will already show you
all the static dependencies; it shouldn't be hard to do the same with
gcj.  With a modified compiler you could probably even avoid weak symbols
(which aren't portable anyway).
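
A rough sketch of the "record what was actually loaded" half of this idea,
written in plain Java rather than inside libgcj.  RecordingLoader and its
methods are hypothetical; a real implementation would presumably hook
libgcj's own class registration, and this version only sees classes
requested through this particular loader.

```java
import java.util.Set;
import java.util.TreeSet;

// Hypothetical class loader that remembers which classes were requested
// through it and loaded successfully.
final class RecordingLoader extends ClassLoader {
    private final Set<String> loaded = new TreeSet<>();

    RecordingLoader(ClassLoader parent) {
        super(parent);
    }

    @Override
    protected Class<?> loadClass(String name, boolean resolve)
            throws ClassNotFoundException {
        // Delegate to the normal parent-first lookup; record only on success.
        Class<?> c = super.loadClass(name, resolve);
        synchronized (loaded) {
            loaded.add(name);
        }
        return c;
    }

    // Convenience wrapper: true if the class was found, false otherwise.
    boolean tryLoad(String name) {
        try {
            Class.forName(name, true, this);
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    // Sorted snapshot of the recorded names, e.g. to write to a file that
    // a later link step could consume.
    Set<String> loadedNames() {
        synchronized (loaded) {
            return new TreeSet<>(loaded);
        }
    }

    public static void main(String[] args) {
        RecordingLoader l =
            new RecordingLoader(RecordingLoader.class.getClassLoader());
        l.tryLoad("java.util.ArrayList");
        for (String n : l.loadedNames()) {
            System.out.println(n);
        }
    }
}
```

How such a list would be fed back to the linker is the open question; the
sketch covers only the bookkeeping.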

I was considering something more like a hybrid of the current bytecode
interpreter and native compiler: compile all the methods to object code,
but defer linkage until runtime.  The interpreter already knows how to
construct the vtables, metadata, etc. and can do so reasonably fast.  This
approach has the drawback of needing libgcj.jar at runtime, however.

A less radical suggestion is to defer class linkage, i.e. given
classes A,B:

class A {
  A () {
    new B ();
  }
}

class B {
 ...
}

compile A to the equivalent of:

Class bClass;

<clinit> () {
  bClass = Class.forName ("B");
}

<init> () {
  bClass.newInstance ();
}

instead of binding directly to the B.class$ symbol.  The effect would be
to prevent linking B into the static binary, and avoid loading it
altogether if A is never initialized.
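
Written out by hand in ordinary Java, the transformation might look like the
sketch below.  DeferredLinkDemo, the instance counter and the field b are
invented for the example; the compiler would of course generate the
equivalent internally instead of going through reflection source code, and
Constructor.newInstance is used here in place of Class.newInstance.

```java
// Hand-written illustration of binding to B by name instead of by symbol.
final class DeferredLinkDemo {

    // Stand-in for class B from the example above.
    static final class B {
        static int instances = 0;

        B() {
            instances++;
        }
    }

    static final class A {
        // Looked up by name when A is initialized, so A carries no direct
        // reference to B.
        private static final Class<?> bClass;

        static {
            try {
                bClass = Class.forName(DeferredLinkDemo.class.getName() + "$B");
            } catch (ClassNotFoundException e) {
                // Mirror what a failed direct reference would raise.
                throw new NoClassDefFoundError(e.getMessage());
            }
        }

        final Object b;

        A() {
            try {
                // Equivalent of "new B ()" without naming B's constructor.
                b = bClass.getDeclaredConstructor().newInstance();
            } catch (ReflectiveOperationException e) {
                throw new IllegalStateException(e);
            }
        }
    }

    public static void main(String[] args) {
        A a = new A();
        System.out.println("created " + a.b.getClass().getName());
    }
}
```

The cost is a by-name lookup the first time A is initialized and a reflective
call per allocation, which a compiler-generated version could presumably
reduce by caching more than just the Class object.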

Instead of the current monolithic libgcj.{a,so} a compromise might be
package-at-a-time linkage:

java-lang.so
java-lang-ref.so
java-lang-reflect.so
...
java-lang.o
java-lang-ref.o
java-lang-reflect.o
...

I believe the classloader already knows how to use the former, though
certain classes would have to be present at the start, obviously.  (I've
witnessed infinite recursion when the classloader cannot initialize...
ugly.)
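
For illustration, the mapping from a class name to the per-package library
names listed above could be as simple as the following.  PackageLibName and
libraryFor are hypothetical, and the naming scheme is just the one used in
the list above; the lookup rules the real classloader uses may differ.

```java
// Hypothetical mapping from a fully qualified class name to the name of
// the per-package shared library that would contain it.
final class PackageLibName {

    static String libraryFor(String className) {
        int dot = className.lastIndexOf('.');
        if (dot < 0) {
            throw new IllegalArgumentException(
                "class is in the default package: " + className);
        }
        // "java.lang.ref.WeakReference" -> package "java.lang.ref"
        //                               -> "java-lang-ref.so"
        return className.substring(0, dot).replace('.', '-') + ".so";
    }

    public static void main(String[] args) {
        System.out.println(libraryFor("java.lang.reflect.Method"));
    }
}
```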

Jeff

