This is the mail archive of the
java-discuss@sourceware.cygnus.com
mailing list for the Java project.
Byte-Code Verifier
- To: java-discuss@sourceware.cygnus.com
- Subject: Byte-Code Verifier
- From: Kresten Krab Thorup <krab@gnu.org>
- Date: 22 Jul 1999 21:28:58 +0900
So,
I've spent the last couple of days working on a byte code verifier.
At this point, I'd like to get some early feedback. So far I've only
checked the verifier against some test cases I built myself, and then
the entire JDK. That is, I have no real "negative" test cases. So, if
you have some extra time, please pick it up and do a few test runs.
It's written in Java, but the performance seems to be ok. Running
Sun's Hotspot 1.0.1, checking the entire 4272 classes in the JDK on my
sparc, yields 60 classes/second. If compiled with gcj, it's 1/5th of
that, but at least it works!
I've packaged it as a separate entity (i.e., it does not currently use
the gnu.. package names), but I intend to let it be included in GCJ,
under your licence, if you like.
I've put it up at
http://www.yl.is.s.u-tokyo.ac.jp/~krab/verify-0.1.tar.gz
There's a readme file, which I've also attached below.
Right now I've put it under the LGPL, but I suppose I'm in a position to
change that for new versions if that is the right thing to do.
Just checking quickly, it seems that I found a bug in some
gcj-compiled code; it reports the following (when run in --verbose mode):
checking gnu.gcj.text.BaseBreakIterator method first()
code at [784..804]
signature is () int
branch ==> 0 (0)
--[0]----------
() <#> mode=1 pc=0 aload_0
(#)<#> mode=1 pc=1 getfield
get field java.text.CharacterIterator
(#)<#> mode=1 pc=4 invokeinterface
calling java/text/CharacterIterator.first () char
(C)<#> mode=1 pc=9 nop
(C)<#> mode=1 pc=10 aload_0
*** VERIFICATION FAILURE ***
exception at pc=10 (aload_0)
stack overflow: load(0)
*** state of first ***
stack[0] = char
local[0] = gnu.gcj.text.BaseBreakIterator
******************************
javap reports:
public abstract class gnu.gcj.text.BaseBreakIterator
extends java.text.BreakIterator {
public int first();
/* Stack=1, Locals=1, Args_size=1 */
...
Method int first()
0 aload_0
1 getfield #15 <Field java.text.CharacterIterator iter>
4 invokeinterface (args 87) #28 <InterfaceMethod char first()>
9 nop
10 aload_0
11 getfield #15 <Field java.text.CharacterIterator iter>
14 invokeinterface (args 1) #31 <InterfaceMethod int getBeginIndex()>
19 ireturn
}
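To make the failure concrete, here is a minimal sketch of the stack-depth bookkeeping involved. It is not the verifier's actual code: the class name StackDepthCheck, the method firstOverflow, and the {pc, pops, pushes} encoding are made up for illustration. It only counts stack slots, using the pops and pushes implied by the javap listing above, and compares the depth against the declared Stack=1.

```java
// Hypothetical illustration only; names and encoding are not from the verifier.
final class StackDepthCheck {

    /**
     * Each step is {pc, pops, pushes}. Returns the pc of the first
     * instruction whose pushes exceed maxStack, or -1 if none does.
     */
    static int firstOverflow(int maxStack, int[][] steps) {
        int depth = 0;
        for (int[] step : steps) {
            depth -= step[1];
            if (depth < 0) {
                throw new IllegalStateException("stack underflow at pc=" + step[0]);
            }
            depth += step[2];
            if (depth > maxStack) {
                return step[0];
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // The first five instructions of first(), as listed by javap.
        int[][] first = {
            {0, 0, 1},   // aload_0: pushes "this"
            {1, 1, 1},   // getfield: pops "this", pushes the iterator
            {4, 1, 1},   // invokeinterface first(): pops receiver, pushes char
            {9, 0, 0},   // nop: the char stays on the stack
            {10, 0, 1},  // aload_0: a second slot is needed here
        };
        System.out.println(firstOverflow(1, first)); // prints 10
    }
}
```

In this sketch, an instruction at pc=9 that popped one slot instead of doing nothing would keep the depth within the declared limit for the whole method.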
So hey! Maybe I just found my first real test case. Keep'em coming.
--Kresten
--[README FILE]----------------------------------------------------------
Kresten's Verifier for Java Byte Codes
======================================
The interface to the verifier is currently very crude.
To run it, type
prompt> java verify.Tool class-or-file-name ...
If you give it the name of a class file, it will try to figure out
the corresponding class name (removing a leading "./" and a
terminating ".class", and replacing the path separator with '.').
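As a rough illustration of that name mapping, here is a minimal sketch. It is not taken from the verifier's sources: the class ClassFileNames and the method toClassName are hypothetical, and it assumes the intent is to turn directory separators into dots so that "./verify/Tool.class" becomes "verify.Tool".

```java
// Hypothetical helper, not the verifier's actual code.
final class ClassFileNames {

    /** Maps a class-file name to a dotted class name; other arguments pass through. */
    static String toClassName(String arg) {
        // Assumption: anything not ending in ".class" is already a class name.
        if (!arg.endsWith(".class")) {
            return arg;
        }
        String name = arg;
        // Drop a leading "./", as produced by `find . -name '*.class'`.
        if (name.startsWith("./")) {
            name = name.substring(2);
        }
        // Drop the terminating ".class".
        name = name.substring(0, name.length() - ".class".length());
        // Turn directory separators into package separators.
        return name.replace('/', '.').replace(java.io.File.separatorChar, '.');
    }

    public static void main(String[] args) {
        for (String arg : args) {
            System.out.println(arg + " -> " + toClassName(arg));
        }
    }
}
```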
To start off, you may want to verify the verifier. On unix, you can
write something like this,
prompt> java verify.Tool `find verify/ -name '*.class'`
As it runs, it will print a '.' for each method checked, and a [name]
whenever it loads extra auxiliary classes needed.
It has two interesting options.
--classpath=PATH
Where PATH is a ':'- or ';'-separated path (according to the Java
property "path.separator"). This specifies the "class path" from
which to load classes. The classpath defaults to whatever the Java
Runtime decides to make available in the java.class.path property.
The verifier does not use the Java system's class loader. Rather,
it incorporates its own loading mechanism, which is more
lightweight. Also, you don't really need to "load" classes in order
to verify them.
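The option handling described here could look roughly like the following sketch. The class ClassPathOption and its resolve methods are hypothetical and are not the verifier's actual code; only the "--classpath=" spelling, the path.separator property, and the java.class.path default come from the description above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical sketch of the --classpath handling; not the verifier's code.
final class ClassPathOption {
    static final String PREFIX = "--classpath=";

    /** Returns the class path entries, splitting on the given separator. */
    static List<String> resolve(String[] args, String separator) {
        // Default: whatever the runtime exposes in java.class.path.
        String path = System.getProperty("java.class.path", ".");
        for (String arg : args) {
            if (arg.startsWith(PREFIX)) {
                path = arg.substring(PREFIX.length());
            }
        }
        List<String> entries = new ArrayList<>();
        // Quote the separator so it is not interpreted as a regex.
        for (String entry : path.split(Pattern.quote(separator))) {
            if (!entry.isEmpty()) {
                entries.add(entry);
            }
        }
        return entries;
    }

    /** Uses the platform's separator from the "path.separator" property. */
    static List<String> resolve(String[] args) {
        return resolve(args, System.getProperty("path.separator", ":"));
    }

    public static void main(String[] args) {
        System.out.println(resolve(args));
    }
}
```

Each resulting entry would then be searched for class files directly, without going through a ClassLoader.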
The second interesting option is
--verbose=true
which will make the verifier print a line of information for each
instruction checked. (or perhaps this is just to be considered my
debugging output...) It might look like this:
--[151]----------
(##) <[#JXJX#I#I????????> pc=151 new
(##@) <[#JXJX#I#I????????> pc=154 dup
(##@@) <[#JXJX#I#I????????> pc=155 iconst_1
(##@@I) <[#JXJX#I#I????????> pc=156 invokespecial
calling java/lang/Boolean.<init> (boolean) void
This shows a sequence of four instructions. The --[151]---- line marks
the entry point of a basic block. The part on the left, in
parentheses describes the state of the stack at this particular
instruction. Each character represents one "stack item". The
right-most element is the top of the stack. Generally, these are
the same as the characters encoding types in a field descriptor:
"I" is integer, "J" is long, "D" is double and so on. The special
cases are "X", representing the second half of a long, and "Q", the
same thing for doubles. "#" is a normal object, "@" is an
uninitialized object (one for which a constructor has not yet been
called) and "[" is an array. (See types/TypeTags.java for the full
list.)
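To illustrate the notation, here is a small sketch that renders a stack of field descriptors in this style. The class StackTags and its methods are hypothetical and do not mirror types/TypeTags.java. It assumes that primitive types other than long and double keep their descriptor character, and it leaves out "@", since whether an object is initialized depends on verifier state rather than on the descriptor.

```java
// Hypothetical illustration of the tag notation; not the verifier's code.
final class StackTags {

    /** Tag characters for one value with the given (non-empty) field descriptor. */
    static String tagsFor(String descriptor) {
        char kind = descriptor.charAt(0);
        switch (kind) {
            case 'J': return "JX"; // long occupies two slots
            case 'D': return "DQ"; // double occupies two slots
            case 'L': return "#";  // object reference
            case '[': return "[";  // array reference
            default:  return String.valueOf(kind); // assumed: I, C, F, ...
        }
    }

    /** Renders a stack, bottom first, so the top is the right-most element. */
    static String render(String... descriptors) {
        StringBuilder out = new StringBuilder("(");
        for (String descriptor : descriptors) {
            out.append(tagsFor(descriptor));
        }
        return out.append(')').toString();
    }

    public static void main(String[] args) {
        System.out.println(render("Ljava/lang/Object;", "J", "I")); // prints (#JXI)
    }
}
```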
On the right, in angle brackets, is the "state" of the local
variables. These are represented using the same characters as
before. Locations that have yet to be assigned a value are
marked with "?". One of the jobs of the verifier is to make sure
that such a location is never read.
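The check on unassigned locations can be sketched as follows. This is an illustration under assumptions, not the verifier's implementation: the class LocalsState and its methods are hypothetical, and a real verifier would also have to merge states where control-flow paths join.

```java
import java.util.Arrays;

// Hypothetical model of the local-variable state; not the verifier's code.
final class LocalsState {
    private final char[] locals;

    /** All locations start out unassigned, shown as '?'. */
    LocalsState(int maxLocals) {
        locals = new char[maxLocals];
        Arrays.fill(locals, '?');
    }

    /** Records that the location now holds a value with the given tag. */
    void store(int index, char tag) {
        locals[index] = tag;
    }

    /** Returns the tag at the location, rejecting reads of unassigned ones. */
    char load(int index) {
        if (locals[index] == '?') {
            throw new IllegalStateException("read of unassigned local " + index);
        }
        return locals[index];
    }

    /** Renders the state in angle brackets, as in the verbose output. */
    @Override
    public String toString() {
        return "<" + new String(locals) + ">";
    }

    public static void main(String[] args) {
        LocalsState state = new LocalsState(3);
        state.store(0, '#');
        System.out.println(state); // prints <#??>
    }
}
```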
Finally, on the right, you see the current pc (program counter) and
the symbolic name of the opcode being checked.
When the bytecode verifier detects an error, and you did not choose
the --verbose=true option already, it will turn it on for you, and
re-check the very same method.
Right now, the interface is rather crude. If you get no specific
messages, that means that your code did verify.
As a consequence of the way it works, it may discover unreachable
code in the instruction sequence. I have not seen javac-generated
code with unreachable instructions, but several other compilers
produce them all over the place. In particular, IBM's Jikes compiler does.
When the verifier discovers such a sequence, it will print a
message like
unreachable in fooBar(): 3417+4, 7732+2
Where the +X means how many bytes onward in the byte sequence to
find the unreachable code.