writing custom JVM, wanting people to talk to...

Wed Oct 15 09:21:00 GMT 2008

well, ok, I don't know if this is off-topic, but oh well...
the text below is from an email I wrote, and as noted, my general point is
to hopefully be able to talk to anyone who might be interested in talking.

---

well, I don't know if you would be interested in talking any.

basically, I started writing my own JVM (mostly for my own uses), but would
also like some people to talk to about technical stuff.

in particular, I am considering the possibility of 'extensions' to the
bytecode, since I am actually intending to use it for more non-Java usages,
and felt a need for things like proper dynamic typing, and also support for
pointers.

I am starting from the position of already having an existing VM framework
(in my case, 200+ kloc of mostly C), so it is more like I am adding a "Java
module" than starting clean (the actual JVM part thus far is likely to be
tiny, and even if I add a JIT, I will mostly just be using stuff I already
wrote for more general-purpose dynamic compilation).

I am also writing my own so that I have good integration with my existing
machinery, ... which would not so likely happen if I used a pre-existing VM.

I also hope, if possible, to utilize GNU Classpath and GCJ and similar, which
is why I have not just created my own entirely custom bytecode (actually,
this would have been less effort, since I would not have had to go through
the effort of figuring out how the JVM works, but I have done all this too
often before...).

now, as noted, my existing framework is based mostly around interfacing
directly with C land. it is vaguely like the JVM or .NET, but for the most
part operates more at the level of machine code (x86/x86-64 only at present,
although my dynamic C compiler is not complete for x86-64). it differs from
LLVM in many subtle ways, in particular in its focus and internals: I focus
on flexibility of the generated code and on cleanly interfacing with external
code/libs (in particular, that generic C APIs can be "assimilated"), whereas
LLVM is IMO not as good in these areas (focusing more on generation of
high-performance, more-or-less self-contained code).

it also lacks any central bytecode or interpreter (thus far). I felt there
is a need for some form of bytecode, and Java bytecode seems like the most
plausible option in my case (at least as a baseline or starting point).

it is much like why a person would prefer to use JPEG or PNG over some
custom image format (or write a C compiler as opposed to just making up
their own language).

common format, some common tools, ...

for example, I use JPEG and PNG, but in both cases with my own
implementations (both formats are simple enough that decently good
single-source-file implementations are possible, which saves one from
external dependencies on the likes of libjpeg and libpng...).

there is a risk of an integration issue, namely that Java tends to exist as
its own "island", more than a little different from C-land, which makes
C/Java interfacing more than a little wonky (I think it can be done though).

of course, granted not everything will be visible from Java (unless I feel
like writing my own non-standard compiler). some alternative means of
interfacing is needed (one possibility is a vaguely C#-like language which
can at the same time see both Java-land and C-land). personally, I find both
JNI and JNA offensive, but may use them if needed (idle thoughts for
alternatives exist...).

I also have a partial ECMAScript implementation in the mix, which I also
intend to move over to the same bytecode as the JVM (I have a pre-existing
bytecode here, but will probably abandon it in the conversion, since I think
the use of JBC would probably give superior performance anyway, as well as
providing a clear path for later optimization).

my VM is not particularly "centralized" either; it is better compared to a
good number of cooperating modules interacting through more-or-less defined
APIs.

however, when doing such things it makes sense to have some communication
with any sort of "authorities" in the field (especially since my considered
alterations could go slightly outside of Sun's boundaries, although at this
point in time they should not cause compatibility issues), and also since the
people at Sun have thus far not said anything back to me (either they did not
notice, or they think my thinking is just too damn stupid to waste their time
replying...).

in particular, for my usages it would make sense to have an extension
mechanism other than impdep1 and impdep2 (opcodes 254 and 255, since these
are by definition purely implementation dependent).

instead, I have considered the possibility of using opcodes 248-253 to
represent a secondary "open" extension mechanism, namely one providing for
both extending the instruction set, AND allowing for multiple people/groups
to work independently and add opcodes without so much risk of clash (more
so, to allow an implementation to make use of opcodes developed
independently from multiple sources).
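
as a rough illustration of where this range sits in the single-byte opcode
space, here is a minimal sketch in C. the enum and function names are
hypothetical (made up for this example); the boundaries for the standard
opcodes, breakpoint, impdep1 and impdep2 follow the JVM spec, and 248-253 is
simply the range considered above, not anything the spec defines.

```c
#include <stdint.h>

/* Hypothetical classification of single-byte JVM opcodes (names invented
 * for illustration only). */
typedef enum {
    OPC_STANDARD,    /* 0..201: nop .. jsr_w, the range the JVM spec uses */
    OPC_BREAKPOINT,  /* 202: reserved for debuggers */
    OPC_UNASSIGNED,  /* 203..247: unassigned, left alone in this sketch */
    OPC_OPEN_EXT,    /* 248..253: the proposed "open" extension range */
    OPC_IMPDEP       /* 254, 255: impdep1 / impdep2 */
} opc_class;

/* Map an opcode byte to the class it would fall into. */
opc_class classify_opcode(uint8_t op)
{
    if (op <= 201) return OPC_STANDARD;
    if (op == 202) return OPC_BREAKPOINT;
    if (op <= 247) return OPC_UNASSIGNED;
    if (op <= 253) return OPC_OPEN_EXT;
    return OPC_IMPDEP;
}
```

how the six bytes would actually be shared between independent groups is
left open here; the sketch only shows that the range does not overlap
impdep1/impdep2 or the assigned opcodes.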

this is intended more for "research VMs" than anything else though, with the
likely idea that VMs would reject anything they don't recognize (and any
portable bytecode is likely to be confined to following the
baseline/official JVM spec). but, at the same time, if done well it could
provide a means for features to be "borrowed" between VMs, without each
party incompatibly re-engineering the same stuff and only agreeing as to the
Sun-specified "baseline".
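
the "reject anything they don't recognize" behavior could look something
like the following sketch. everything here (ext_table, ext_register,
ext_is_recognized, demo_handler) is a hypothetical name invented for the
example, and a per-opcode handler table is just one assumed way of doing
it, not a description of any existing VM.

```c
#include <stddef.h>
#include <stdint.h>

#define EXT_FIRST 248   /* first opcode of the proposed open range */
#define EXT_LAST  253   /* last opcode of the proposed open range  */

/* Hypothetical handler type: gets a pointer to the operand bytes,
 * returns 0 on success. */
typedef int (*ext_handler)(const uint8_t *operands);

/* One slot per opcode in the open range; NULL means "not recognized". */
static ext_handler ext_table[EXT_LAST - EXT_FIRST + 1];

/* Register a handler. Returns 0 on success, -1 if the opcode is outside
 * the open range, the handler is NULL, or the slot is already taken. */
int ext_register(uint8_t op, ext_handler h)
{
    if (op < EXT_FIRST || op > EXT_LAST || h == NULL) return -1;
    if (ext_table[op - EXT_FIRST] != NULL) return -1;
    ext_table[op - EXT_FIRST] = h;
    return 0;
}

/* Load-time check: 1 if this VM has a handler for the extension opcode,
 * 0 if code using it should be rejected. */
int ext_is_recognized(uint8_t op)
{
    if (op < EXT_FIRST || op > EXT_LAST) return 0;
    return ext_table[op - EXT_FIRST] != NULL;
}

/* Placeholder handler used only to exercise the table. */
static int demo_handler(const uint8_t *operands)
{
    (void)operands;
    return 0;
}
```

a VM that never registers anything rejects every opcode in the range, which
matches the behavior expected of a baseline implementation.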

now, of course, something like this is only useful if at least some people
can cooperate as to its existence and use.

now, something like this would not likely clash with existing
implementations, since this bytecode range is unused, and it does not impact
any prior use of impdep1/impdep2, which I will presume many VMs already make
use of...

but, it would probably clash with the JVM spec, which likely assumes that
3rd parties do not add any bytecodes in the shared/common space (single-byte
opcodes).

another possibility, if it comes to this, is to actually "branch" my
bytecode, and create something technically distinct from Java's bytecode
(would need to come up with some other name), but have it "just happen" to
be more-or-less JVM compatible...