This is the mail archive of the java-discuss@sourceware.cygnus.com mailing list for the Java project.



Re: Interpreter progress...


So today's status...

Now that I have it up and running a bit more, I can run my verifier on
top of the interpreter.  So it can at least run for the ~10 minutes it
takes to check the entire JDK without a fault.

I spent most of my time yesterday hunting down a couple of GC bugs
... things like the storage for static variables disappearing
under my nose after a while.  The Boehm collector really is quite
precise!

As for performance, interpreter-only code runs at approximately 1/5th
the speed of JDK 1.1.7 on my Pentium Linux portable, whatever that
means... It's slow anyway.  I don't know whether that version is running
on a JIT or what.  This time, I'm compiling everything with -O2 :-)

Right now it jumps through a gffi-call for every method called
(including interpreted ones).  This is of course nice and uniform, but
it might well be where the time goes.  It's hard to get a precise
profile, since the interpreter main loop function just eats up most of
the time... :-(  Is there such a thing as basic-block profiling?

Anyway, as an experiment, I replaced some chained
if-else-if-else... sequences on the critical path (of setting up
gffi-calls) with switch statements, and the whole thing became 15%
slower!  I didn't expect that, but nonetheless, there is definitely
a lot of time being spent doing these calls.

I'm beginning to think that maybe interpreted classes should have an
extra vtable that the interpreter can use to call other interpreted
methods in an efficient way.  Ideas?  Can I put it in vtable->method[0],
or why is that slot always empty?
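
A rough sketch of what such an interpreter-side vtable might look like.
Every name here (interp_word, interp_method, interp_class, ivtable,
interp_invoke_virtual) is made up for illustration and is not a libgcj
name; the point is only that an interpreted-to-interpreted call becomes
one table lookup plus one indirect call, with no foreign-call setup:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical types for illustration only; not libgcj definitions. */
typedef union { int i; void *p; } interp_word;

struct interp_method;
typedef void (*interp_entry)(const struct interp_method *m,
                             interp_word *args, interp_word *result);

struct interp_method {
    interp_entry run;            /* direct entry into the interpreter */
    const unsigned char *code;   /* bytecode; unused in this toy */
};

struct interp_class {
    size_t n_methods;
    const struct interp_method *const *ivtable;  /* the "extra vtable" */
};

struct interp_object {
    const struct interp_class *klass;
    int field;
};

/* Virtual call from interpreted code to an interpreted method:
   look up the slot in the receiver's class and call it directly. */
static void interp_invoke_virtual(size_t slot, interp_word *args,
                                  interp_word *result)
{
    struct interp_object *self = args[0].p;   /* args[0] is the receiver */
    const struct interp_class *k = self->klass;
    assert(slot < k->n_methods);
    const struct interp_method *m = k->ivtable[slot];
    m->run(m, args, result);
}

/* Two toy "method bodies" standing in for interpreted bytecode. */
static void get_field(const struct interp_method *m,
                      interp_word *args, interp_word *result)
{
    (void)m;
    result->i = ((struct interp_object *)args[0].p)->field;
}

static void add_to_field(const struct interp_method *m,
                         interp_word *args, interp_word *result)
{
    (void)m;
    result->i = ((struct interp_object *)args[0].p)->field + args[1].i;
}

/* Builds a two-slot class, then dispatches slot 1 on an object. */
static int demo_dispatch(void)
{
    static const struct interp_method m0 = { get_field, NULL };
    static const struct interp_method m1 = { add_to_field, NULL };
    static const struct interp_method *const table[] = { &m0, &m1 };
    static const struct interp_class klass = { 2, table };
    struct interp_object obj = { &klass, 40 };
    interp_word args[2], result;

    args[0].p = &obj;
    args[1].i = 2;
    interp_invoke_virtual(1, args, &result);
    return result.i;
}
```

Calls coming from compiled code would still have to go through the
ordinary vtable and a stub; this table would only be a fast path for
the interpreter itself.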

Today, I think I'll look into adding stub support to libffi for x86.
It seems like that way, stub support is more likely to be integrated
in the mainstream, and be supported in the future, eh?  It works fine
with gffi right now, but using __builtin_apply is not exactly
efficient, since the arguments need to be copied one too many times.

I'm thinking of adding the following public API to libffi:

 typedef struct {
    char tramp[FFI_TRAMPOLINE_SIZE];
    ffi_cif *cif;
    void    *data;
 } ffi_stub;

 ffi_status
 ffi_prep_stub  (ffi_stub *stub,
                 ffi_cif *cif,
                 void *data,
                 void (*fn) (ffi_cif* cif, void *rvalue,
                             void **avalue, void *data));

 STUB is some word-aligned memory allocated by the caller, minimum
 sizeof (ffi_stub) bytes.  After the call to ffi_prep_stub, this
 memory address will be applicable as a function taking the arguments
 described in CIF.

 When the STUB is called, some intermediate code will set up an array
 of pointers to arguments, and allocate space for the return value (if
 necessary), and finally, it will call FN.  FN will receive the
 arguments CIF (the original cif), RVALUE (a pointer to the location
 into which to store the result), AVALUE (an array of pointers to
 locations holding the arguments) and DATA (some user defined data that
 the callee can use to determine what to do about this call.)
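
 To make the calling contract concrete, here is a small simulation in
 plain C.  It does not use libffi and generates no machine code: the
 demo_* names are hypothetical stand-ins, and the "intermediate code"
 is written by hand for the single signature int(int,int).  A real
 trampoline would also locate the stub from its own address, whereas
 this sketch simply passes the stub in as an explicit argument.

```c
#include <assert.h>

/* Stand-in for ffi_cif; the real one describes argument/return types. */
typedef struct { unsigned nargs; } demo_cif;

/* Same shape as the FN parameter in the proposed ffi_prep_stub. */
typedef void (*demo_handler)(demo_cif *cif, void *rvalue,
                             void **avalue, void *data);

/* Stand-in for the proposed ffi_stub, minus the trampoline bytes. */
typedef struct {
    demo_cif    *cif;
    void        *data;
    demo_handler fn;
} demo_stub;

/* Hand-written equivalent of the intermediate code for int(int,int):
   build the array of argument pointers, reserve space for the return
   value, call FN, then return whatever FN stored. */
static int demo_glue_int_int(demo_stub *stub, int a, int b)
{
    void *avalue[2];
    int rvalue = 0;

    avalue[0] = &a;
    avalue[1] = &b;
    stub->fn(stub->cif, &rvalue, avalue, stub->data);
    return rvalue;
}

/* An example FN: DATA selects what the call should do. */
static void demo_fn(demo_cif *cif, void *rvalue,
                    void **avalue, void *data)
{
    int x = *(int *)avalue[0];
    int y = *(int *)avalue[1];
    const char *op = data;

    assert(cif->nargs == 2);
    *(int *)rvalue = (op[0] == '+') ? x + y : x * y;
}

/* Prepare a stub bound to OP and call through it. */
static int demo_call(const char *op, int a, int b)
{
    demo_cif cif = { 2 };
    demo_stub stub = { &cif, (void *)op, demo_fn };
    return demo_glue_int_int(&stub, a, b);
}
```

 In this toy, demo_call("+", 3, 4) and demo_call("*", 3, 4) run the
 same FN with different DATA, which is the role the interpreter's
 method descriptor would play.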

GFFI has basically the above API now, and I found it to be quite
useful.  In particular it is important that the caller is the one to
allocate the stub area.
 
In case of the interpreter, I want to use _Jv_AllocBytes to allocate
the stub, and to store the CIF (including argument types) in the same
memory blob.  I.e., I want it to be

  struct interp_stub {
    ffi_stub the_stub;
    ffi_cif  the_cif;
    ffi_type the_args[n_args];
  } x;

such that if the garbage collector can see the beginning of this
structure, it will keep it all.  (And it will, since I'm marking the
ncode entries...)  At some point, it may be worth having a hash table
of the "cif+args" values, since many will be the same.  In my
interpreter right now, a resolved CONSTANT_MethodRef also includes an
entire "cif+args".  Oops... back on track:
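
Coming back to the single-blob layout for a moment, a minimal sketch of
the allocation, under some loud assumptions: calloc stands in for
_Jv_AllocBytes, the demo_* types stand in for the libffi ones,
DEMO_TRAMPOLINE_SIZE is a placeholder value, and a C99 flexible array
member replaces the variable-length the_args.  The idea is that every
interior pointer points back into the same allocation, so a pointer to
the start keeps the whole thing alive:

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define DEMO_TRAMPOLINE_SIZE 16  /* placeholder; really per-architecture */

typedef struct { int kind; } demo_type;          /* stand-in for ffi_type */
typedef struct {
    unsigned   nargs;
    demo_type *arg_types;
} demo_cif;                                       /* stand-in for ffi_cif */
typedef struct {
    char      tramp[DEMO_TRAMPOLINE_SIZE];
    demo_cif *cif;
    void     *data;
} demo_stub;                                      /* stand-in for ffi_stub */

struct interp_stub {
    demo_stub the_stub;   /* first member, so the blob starts with the stub */
    demo_cif  the_cif;
    demo_type the_args[]; /* n_args entries follow in the same allocation */
};

/* Allocate stub, cif and argument types as one block and wire the
   interior pointers so they all refer back into that block. */
static struct interp_stub *interp_stub_new(unsigned n_args)
{
    size_t size = sizeof(struct interp_stub) + n_args * sizeof(demo_type);
    struct interp_stub *s = calloc(1, size);  /* stand-in for _Jv_AllocBytes */

    if (s == NULL)
        return NULL;
    s->the_cif.nargs = n_args;
    s->the_cif.arg_types = s->the_args;
    s->the_stub.cif = &s->the_cif;
    return s;
}

/* Returns 1 if the stub sits at the start of the block and every
   interior pointer stays inside it, 0 otherwise. */
static int interp_stub_is_one_blob(unsigned n_args)
{
    struct interp_stub *s = interp_stub_new(n_args);
    if (s == NULL)
        return 0;

    char *base = (char *)s;
    char *end  = base + sizeof(struct interp_stub)
                      + n_args * sizeof(demo_type);
    int ok = (char *)&s->the_stub == base
          && (char *)s->the_stub.cif > base
          && (char *)s->the_stub.cif < end
          && (char *)s->the_cif.arg_types >= base
          && (char *)s->the_cif.arg_types
                 + n_args * sizeof(demo_type) <= end;
    free(s);
    return ok;
}
```

Because the stub is the first member, the address handed out as the
callable entry point is also the address of the whole block, which is
what lets a collector that sees the entry point retain the cif and
argument types too.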

 INTERNAL API

 The internal api will involve a macro much like INIT_TRAMPOLINE in
 gcc, and something similar to ffi_prep_args, to set it up for
 incoming args.

 On some architectures, we may need to do special tricks to make the
 data passed in executable, and some machines may need special
 support, a la __enable_execute_stack in libgcc2.

Let me know what you think.

C.U.

-- Kresten

 Kresten Krab Thorup, Ph.D. Candidate
 c/o Yonezawa Laboratory
 Department of Information Science   
 The University of Tokyo             
 7-3-1 Hongo, Bunkyo-ku, Tokyo 113 Japan
 Fax: +81-(0)3-5689-4365	 
 Phone: +81-(0)3-5841-4118
 Mobile: +81-(0)90-3693-5715

