This is the mail archive of the java@gcc.gnu.org mailing list for the Java project.
Re: Thoughts about static linking and reducing size of binaries
- To: Bryce McKinlay <bryce at albatross dot co dot nz>
- Subject: Re: Thoughts about static linking and reducing size of binaries
- From: Jeff Sturm <jsturm at one-point dot com>
- Date: Thu, 15 Mar 2001 01:07:00 -0500 (EST)
- cc: java at gcc dot gnu dot org
It's funny Bryce, I was just thinking about libgcj bloat myself, but
hadn't even considered static linking. I think even libgcj.so is
becoming excessively large.
On Thu, 15 Mar 2001, Bryce McKinlay wrote:
> The linker of course brings
> in _everything_ that is ever referenced from any class used by the
> application, so if something is imported by java.lang.Class then
> everything used by that class must be brought in as well.
That alone is not very Java-like, and it creates other problems as well. For
instance, I have some .jar archives that expect to work even if
certain classes (e.g. javax.transaction.*) are missing, by catching
ClassNotFoundException, etc. Many JDBC drivers (e.g. pgsql, oracle) do
just that to detect JDBC2 availability at runtime. Such .jar files cannot
easily be compiled with gcj.
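For anyone unfamiliar with the idiom, here is a minimal sketch of the kind of runtime probe such a driver might perform. The class name OptionalFeature and the method isAvailable are hypothetical, made up for illustration; only Class.forName and the exception types are standard Java.

```java
// Hypothetical helper, not taken from any real JDBC driver.
final class OptionalFeature {
    // Returns true if the named class can be found at runtime.
    static boolean isAvailable(String className) {
        try {
            // initialize=false: only probe for presence, do not run static initializers.
            Class.forName(className, false, OptionalFeature.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            // The optional class (or something it needs) is absent: degrade gracefully.
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("java.lang.String present: "
                + isAvailable("java.lang.String"));
        System.out.println("javax.transaction.xa.XAResource present: "
                + isAvailable("javax.transaction.xa.XAResource"));
    }
}
```

Code written this way only works if a missing class is reported at runtime rather than rejected as an unresolved symbol at link time, which is the problem described above.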
Also, the current shared library assumes ELF semantics, which aren't
widely portable. It'd be nice if class objects (or whole packages, maybe)
could be loaded on demand.
> It is not an issue in shared-library land (or even for static binaries
> on PCs, really),
You think? I have moderately sized programs that, when compiled, have over
10MB of text alone. And libgcj is going to get much larger before we are
through.
Normally that'd just be a waste of disk and virtual memory (no big deal on
modern systems), but remember that each class registers itself on startup,
and because of the way the global constructors are dispersed, many pages
throughout the lib are touched. The result is an inordinate number of page
faults and a lot of real memory allocated.
I tested this with a very trivial program. First the shared version
(tested on alphapca56-unknown-linux-gnu):
[jsturm@mars jsturm]$ size Sleep
   text    data     bss     dec     hex filename
  19576    2632     408   22616    5858 Sleep
[jsturm@mars jsturm]$ /usr/bin/time ./Sleep
0.15user 0.02system 0:01.17elapsed 14%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (743major+213minor)pagefaults 0swaps
[jsturm@mars jsturm]$ ./Sleep & ps l
[1] 21129
  F   UID   PID  PPID PRI  NI   VSZ  RSS WCHAN  STAT TTY        TIME COMMAND
000   537 21129 21036  17   0 13664 6768 proces S    pts/1      0:00 ./Sleep
Then a static compile:
[jsturm@mars jsturm]$ size Sleep.static
   text    data     bss     dec     hex filename
1789097  361800  113224 2264121  228c39 Sleep.static
[jsturm@mars jsturm]$ /usr/bin/time ./Sleep.static
0.06user 0.00system 0:01.06elapsed 6%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (315major+159minor)pagefaults 0swaps
[jsturm@mars jsturm]$ ./Sleep.static & ps l
[2] 21133
  F   UID   PID  PPID PRI  NI   VSZ  RSS WCHAN  STAT TTY        TIME COMMAND
000   537 21133 21036   9   0  8824 3120 proces S    pts/1      0:00 ./Sleep.st
In summary:

          major
          faults   RSS     startup
shared    743      6768K   170ms
static    315      3120K    60ms
The static build clearly gains a lot simply by not registering unneeded
classes. The executable is quite large, however.
Maybe a partial solution to the shared build could be to eliminate
_Jv_RegisterClass and rely on dlsym() to resolve dynamic class references?
Hmm... perhaps I'll give it a go. (I realize importing the mangler could
be an issue, but it shouldn't be difficult to try it anyway.)
> A much better and far more general solution would be to have the
> ability to link in only the classes which are actually used
> (loaded/initialized) during execution.
Yes! That ought to solve several important issues:
- smaller static builds
- smaller (more granular) shared libs
- deferring unresolved class references until runtime
> It would be simple to give
> libgcj the ability to track this and dump out a list of used classes
> during execution that could then be used to make an "size optimal"
> build by feeding that list back to the linker. Is it possible to have
> the linker bring in only a given set of classes at link time, and
> treat other references as weak symbols?
That's a clever idea. Tools like Sun's javac will already show you
all the static dependencies; it shouldn't be hard to do the same with
gcj. With a modified compiler you could probably even avoid weak symbols
(which aren't portable anyway).
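To make the "track and dump a list of used classes" idea concrete, here is a rough sketch in plain Java. It is not how libgcj does or would do it: a real hook would live inside the runtime, whereas this hypothetical RecordingLoader only sees classes requested through it explicitly. The names RecordingLoader, touch and usedClasses are invented for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical class loader that remembers every class name it resolves.
final class RecordingLoader extends ClassLoader {
    // Insertion-ordered, so the dump lists classes in first-use order.
    private final Set<String> used = new LinkedHashSet<>();

    RecordingLoader(ClassLoader parent) {
        super(parent);
    }

    @Override
    protected Class<?> loadClass(String name, boolean resolve)
            throws ClassNotFoundException {
        Class<?> c = super.loadClass(name, resolve); // throws if the class is missing
        synchronized (used) {
            used.add(name); // record only classes that were actually found
        }
        return c;
    }

    // Convenience wrapper: request a class and report whether it was found.
    boolean touch(String name) {
        try {
            loadClass(name);
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    // Snapshot of the recorded names, e.g. to write out as a link list.
    List<String> usedClasses() {
        synchronized (used) {
            return new ArrayList<>(used);
        }
    }
}
```

The list returned by usedClasses() is the sort of thing that could be fed back to a later link step. How the linker would consume it is left open here, as it is in the discussion above.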
I was considering something more like a hybrid of the current bytecode
interpreter and native compiler: compile all the methods to object code,
but defer linkage until runtime. The interpreter already knows how to
construct the vtables, metadata, etc. and can do so reasonably fast. This
approach has the drawback of needing libgcj.jar at runtime however.
A less radical suggestion is to defer class linkage, i.e. given
classes A,B:
  class A {
    A () {
      new B ();
    }
  }

  class B {
    ...
  }
compile A to the equivalent of:
  Class bClass;

  <clinit> () {
    bClass = Class.forName ("B");
  }

  <init> () {
    bClass.newInstance ();
  }
instead of binding directly to the B.class$ symbol. The effect would be
to prevent linking B into the static binary, and avoid loading it
altogether if A is never initialized.
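As a source-level illustration of the same transformation, here is a small sketch that runs on an ordinary JVM. It is only an analogy for what the compiler would emit, not gcj output. LazyA is a hypothetical name, and java.util.ArrayList stands in for class B so that the example is self-contained. It uses getDeclaredConstructor().newInstance() because Class.newInstance() is deprecated in current Java.

```java
// Hypothetical stand-in for class A with its reference to "B" deferred.
final class LazyA {
    // Plays the role of bClass: looked up by name, not bound to a symbol.
    private static final Class<?> bClass;

    static {
        try {
            // Assumption: java.util.ArrayList stands in for "B".
            bClass = Class.forName("java.util.ArrayList");
        } catch (ClassNotFoundException e) {
            // Surfaces only when LazyA is initialized, not at link time.
            throw new NoClassDefFoundError(e.getMessage());
        }
    }

    // The instance created in place of "new B ()".
    final Object b;

    LazyA() {
        try {
            b = bClass.getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Nothing in LazyA names the target class except a string, so a static linker would have no reason to pull it in. The cost is that a missing class turns up only when LazyA is first initialized.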
Instead of the current monolithic libgcj.{a,so} a compromise might be
package-at-a-time linkage:
  java-lang.so
  java-lang-ref.so
  java-lang-reflect.so
  ...

  java-lang.o
  java-lang-ref.o
  java-lang-reflect.o
  ...
I believe the classloader already knows how to use the former, though
certain classes would have to be present at the start, obviously. (I've
witnessed infinite recursion when the classloader cannot initialize...
ugly.)
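The naming scheme in the list above amounts to replacing the dots in the package name with dashes. A minimal sketch of that mapping follows; PackageLibs and libraryFor are hypothetical names, and the sketch mirrors the file names as written in this message rather than whatever the classloader actually looks up.

```java
// Hypothetical helper mapping a class name to a per-package library name.
final class PackageLibs {
    // className: fully qualified name, e.g. "java.lang.ref.WeakReference".
    // suffix: ".so" for shared libraries, ".o" for objects linked statically.
    static String libraryFor(String className, String suffix) {
        int lastDot = className.lastIndexOf('.');
        if (lastDot < 0) {
            throw new IllegalArgumentException(
                    "class is in the default package: " + className);
        }
        // Drop the simple class name, then turn package dots into dashes.
        return className.substring(0, lastDot).replace('.', '-') + suffix;
    }

    public static void main(String[] args) {
        System.out.println(libraryFor("java.lang.Object", ".so"));
        System.out.println(libraryFor("java.lang.reflect.Method", ".o"));
    }
}
```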
Jeff