This is the mail archive of the
gcc@gcc.gnu.org
mailing list for the GCC project.
Re: java bytecode considered bad
- To: Trent Waddington <s337240 at student dot uq dot edu dot au>
- Subject: Re: java bytecode considered bad
- From: Jeff Sturm <jsturm at one-point dot com>
- Date: Wed, 21 Feb 2001 03:54:20 -0500 (EST)
- cc: gcc at gcc dot gnu dot org
On Wed, 21 Feb 2001, Trent Waddington wrote:
> Following is a dialog I have had with RMS over the last few weeks. The
> skinny of it is that RMS thinks having gcc both generate and accept as an
> input java bytecode allows folks to do nasty proprietary things with gcc
> so he's not interested in the backend for the jvm which I wrote 18 months
> ago (and doesn't think anyone else should be). I have tried to explain
> that java bytecode (especially the stuff I generate) is not a good
> intermediate language... I'll let the list handle it.
I've thought about this too. Though I don't speak for anybody but myself,
I'll share a few of my opinions on the matter.
<snip>
> One thing I am not certain of. I think I recall that the Java front
> end for GCC can easily read in Java byte codes and compile them.
> Can you tell me for certain if that is true?
It is true that gcj can both read and write Java bytecode. One difference
from your project is that gcj can only translate Java source (i.e. not
C, C++ or Fortran) to bytecode, so as it stands the implementation in
gcj is not really a general IR; it is very Java-specific (although there
likely exist other free and non-free bytecode compilers for various
languages).
Java bytecode is important to both the gcj compiler and runtime for
several reasons:
[1] class definitions are resolved to bytecode at compile time, when
possible
[2] bytecode can be compiled to object when source is not available
[3] bytecode can be directly interpreted by the runtime when object (.o)
is not available
It is feasible to use gcj without [3] (in fact it is a configure-time
option). [1] is really a requirement, since parsing Java source for each
of the dependent classes tends to be unacceptably slow, and fails in a few
corner cases. [2] is really no different from [1] except that code
generation is also performed.
My point is that [1] is analogous to implementing precompiled headers for
g++, which is a work in progress IIRC. The purpose of PCH is to reduce
compile time via a compact binary header representation, just as Java
bytecode does for classes. One could argue that the binary class
description need not contain any code, if not for inlinable (non-virtual)
member functions. Because of those, the bytecode always contains a
complete class definition, sufficient to translate to object form. I
expect the same holds true of PCH, since a header file can contain any
legal C++ syntax.
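
To make the point about inlinable member functions concrete, here is a
minimal sketch. The classes Point, HeavyPoint and InlineDemo are
hypothetical names made up for this illustration and are not taken from
gcj or libgcj. A final method cannot be overridden, so a compiler that
wants to inline it into callers in other classes needs its body, not just
its signature, from the compiled class description.

```java
// Hypothetical classes, for illustration only.
class Point {
    private final int x, y;

    Point(int x, int y) { this.x = x; this.y = y; }

    // final: cannot be overridden, so a compiler that can see this body
    // may inline it at call sites in other classes.
    final int getX() { return x; }

    // Overridable: the method that actually runs is chosen at run time.
    int weight() { return x + y; }
}

class HeavyPoint extends Point {
    HeavyPoint(int x, int y) { super(x, y); }

    @Override
    int weight() { return 10 * super.weight(); }
}

public class InlineDemo {
    // Every call to getX() here has exactly one possible target.
    static int sumX(Point[] pts) {
        int s = 0;
        for (Point p : pts) s += p.getX();
        return s;
    }

    // Calls to weight() dispatch on the run-time class of each element.
    static int sumWeight(Point[] pts) {
        int s = 0;
        for (Point p : pts) s += p.weight();
        return s;
    }

    public static void main(String[] args) {
        Point[] pts = { new Point(1, 2), new HeavyPoint(3, 4) };
        System.out.println(sumX(pts));      // 1 + 3 = 4
        System.out.println(sumWeight(pts)); // 3 + 70 = 73
    }
}
```

If the class description for Point carried only signatures, sumX could
still be compiled, but only with an out-of-line call to getX().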
It follows that if Java bytecode is considered an IR, then any PCH format
is also an IR. Even though the latter may be unpublished and change from
release to release, why doesn't the capability of reading and writing that
format present the same danger that RMS suggests of a bytecode backend?
BTW, Java bytecode isn't really a good example of an IR because it is
semantically much closer to source form (in fact gcj generates bytecode
directly from syntax trees) and most interesting optimizations (e.g.
method inlining) cannot be performed without breaking verification. It is
probably better described as a form of obfuscated source.
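
One way to see how much source-level structure a compiled class keeps is
to read it back with the standard reflection API. This is only a sketch:
SourceShape, Account and describe() are hypothetical names invented for
this example, and reflection shows only the declarations (field and method
names with their types), not the method bodies.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class SourceShape {
    // Hypothetical class whose compiled form we inspect.
    static class Account {
        private long balance;

        void deposit(long amount) { balance += amount; }

        long getBalance() { return balance; }
    }

    // Lists the declared fields and methods of a class as they are
    // recorded in its compiled form, sorted for a stable order.
    static List<String> describe(Class<?> c) {
        List<String> out = new ArrayList<>();
        for (Field f : c.getDeclaredFields()) {
            if (f.isSynthetic()) continue; // skip compiler-generated members
            out.add(f.getType().getSimpleName() + " " + f.getName());
        }
        for (Method m : c.getDeclaredMethods()) {
            if (m.isSynthetic()) continue;
            StringBuilder params = new StringBuilder();
            for (Class<?> p : m.getParameterTypes()) {
                if (params.length() > 0) params.append(", ");
                params.append(p.getSimpleName());
            }
            out.add(m.getReturnType().getSimpleName() + " "
                    + m.getName() + "(" + params + ")");
        }
        Collections.sort(out);
        return out;
    }

    public static void main(String[] args) {
        // Prints: long balance, long getBalance(), void deposit(long)
        for (String line : describe(Account.class)) {
            System.out.println(line);
        }
    }
}
```

The names and signatures written in the source come back unchanged, which
is part of why bytecode reads more like source than like a typical
compiler IR.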
Jeff