progress on method-gc / mangling questions
Mon Jan 21 14:38:00 GMT 2002
Okay, questions first, then an explanation of why I'm asking them...
- Is there any documentation on the Java symbol mangling procedure?
Are there any guarantees about its stability in future releases?
(I would guess that there are no guarantees, but I could be wrong).
- If there is no documentation, can anybody tell me the meaning of
these symbol prefixes? I was able to guess the others.
- My binaries have a lot of sections named __Utf<n> where <n> is
some number. What are these sections? String constants?
- How about __methods<n>? My guess would be that this is the data
used for Method.invoke() and Class.getMethods()...
. . . . . . . . . . .
Why am I asking about this? I've written a new tool called "method-gc"
to create small statically linked gcj binaries.
Here's how it works: you compile your code twice -- once to bytecodes,
and once to a relocatable binary. My program then uses the Apache BCEL
to do reachability analysis on the bytecodes. Any unreachable methods
are stripped from a duplicate copy of libgcj.a (compiled with
-ffunction-sections), which is then linked against your code to
produce a small, statically linked binary. I had to special-case a few
internal libgcj methods which are only reachable from libgcj nat***.cc
CNI calls; I guess I'll have to keep this list up to date with each
I've got "hello world" fully statically linked into 330 KB (gzipped)
using this technique. Previously it was around 1.5MB (gzipped).
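
The reachability step described above can be sketched roughly as follows.
This is a minimal sketch, not method-gc's actual code: it assumes the call
graph has already been extracted from the bytecodes (the real tool uses BCEL
for that; here it is just a hand-written map), and the MethodReachability
class and every method name in it are hypothetical. Methods that are only
reachable from native code are modelled as extra roots.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch; none of these names come from method-gc itself.
class MethodReachability {

    // Worklist traversal: collect every method reachable from the roots.
    static Set<String> reachable(Map<String, List<String>> callGraph,
                                 Collection<String> roots) {
        Set<String> seen = new TreeSet<>();
        Deque<String> work = new ArrayDeque<>();
        for (String root : roots) {
            if (seen.add(root)) {
                work.push(root);
            }
        }
        while (!work.isEmpty()) {
            String method = work.pop();
            List<String> callees =
                callGraph.getOrDefault(method, Collections.<String>emptyList());
            for (String callee : callees) {
                if (seen.add(callee)) {   // add() is false if already visited
                    work.push(callee);
                }
            }
        }
        return seen;
    }

    // Methods defined in the graph but never reached: candidates for stripping.
    static Set<String> unreachable(Map<String, List<String>> callGraph,
                                   Set<String> live) {
        Set<String> dead = new TreeSet<>(callGraph.keySet());
        dead.removeAll(live);
        return dead;
    }

    // Made-up call graph standing in for what a bytecode scan would produce.
    static Map<String, List<String>> demoGraph() {
        Map<String, List<String>> g = new LinkedHashMap<>();
        g.put("Hello.main", Arrays.asList("PrintStream.println"));
        g.put("PrintStream.println", Arrays.asList("PrintStream.write"));
        g.put("PrintStream.write", Collections.<String>emptyList());
        // Assumed to be called only from native code, so the scan cannot see it.
        g.put("Native.helper", Arrays.asList("Native.detail"));
        g.put("Native.detail", Collections.<String>emptyList());
        g.put("Unused.method", Arrays.asList("PrintStream.write"));
        return g;
    }

    public static void main(String[] args) {
        Map<String, List<String>> graph = demoGraph();
        // The special-cased native-only method is passed as an extra root.
        Set<String> live = reachable(graph,
            Arrays.asList("Hello.main", "Native.helper"));
        System.out.println("keep:  " + live);
        System.out.println("strip: " + unreachable(graph, live));
    }
}
```

Treating the special-cased methods as additional roots keeps the traversal
itself unchanged; only the root list has to be maintained by hand.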
Obviously the ideal long-term solution is to get libjava.so /
libjava.dll to be a standard part of OS distros (just like libc), and
to use Bryce's indirect dispatch work to allow binary compatibility
across library versions. Method-gc is just a short term hack until
that happens (which sadly may be "never" on win32).