This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.



Reconsidering gcjx


Now that the GPL v3 looks as though it may be EPL-compatible, the time
has come to reconsider using the Eclipse java compiler ("ecj") as our
primary gcj front end.  This has both political and technical
ramifications, which I discuss below.

Steering committee members, please read through if you would.  I think
this requires some resolution at the SC/FSF level.

First, a brief note on gcjx.  I had intended gcjx to serve not only as
a cleanly written replacement for the current gcj, but also as a model
for how GCC front ends should be written in the future; in particular
I think writing it as a library and separating out the tree-generating
code from the bulk of the compiler remain good ideas.  I enjoyed, and
continue to enjoy, the writing of gcjx.  However, in this case I think
that pleasure must give way to the greater needs of efficiency and
cross-community cooperation.


Motivation.

The motivation for this investigation is simple: sharing code is
preferable to working in isolation.  In particular this change would
let us offload much of the front end maintenance onto a different
group.

Ecj has a good front end (much better than the current gcj) and decent
bytecode generation.  It is fully 1.5-compliant and, apparently, is
tested against the TCK by the upstream maintainers (we gcj developers
don't have TCK access).  It also has some improvements for 1.6 (stack
maps).  Upstream is very active.

gcjx by comparison is unfinished and really has just a single
full-time developer, me.


Technical approach.

Historically we've wanted to have a 'native' java-source-code-reading
compiler, that is, one which parses java sources and converts them
directly to trees.  From what I can remember this was based on 3
things:

* In the past the compiler handled loops built with LOOP_EXPR better
  than it handled loops built "by hand" out of GOTO_EXPRs.  My
  understanding is that this has changed since tree-ssa.  The issue
  here was that we made no attempt to rebuild a LOOP_EXPR from java
  bytecode.

* The .java front end could do a "constant array" optimization.  This
  optimization has not worked for quite some time (there's a PR).  In
  any case we could implement this for bytecode if it matters.

* The .java front end could more efficiently handle class literals.
  With the new 1.5 'ldc' bytecode extension, this is no longer a
  problem.

In other words, as far as I can remember, our old reasons for wanting
this are obsolete.
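
For concreteness, here is a small sketch of the three source constructs
the points above refer to.  The class and method names are made up for
illustration, and the comments describe how a typical java compiler
lowers each construct to bytecode, not what gcj itself does today.

```java
public class FrontEndCases {
    // A constant array initializer.  In a class file this becomes a
    // 'newarray' followed by one store per element in the static
    // initializer, so a bytecode-reading compiler would have to
    // recognize that pattern to emit the array as static data.
    static final int[] TABLE = { 1, 2, 3, 5, 8 };

    // A structured loop.  In bytecode only conditional and
    // unconditional branches remain; the loop structure has to be
    // rediscovered by the optimizers.
    static int sum(int[] values) {
        int total = 0;
        for (int i = 0; i < values.length; i++) {
            total += values[i];
        }
        return total;
    }

    // A class literal.  With 1.5 class files the constant can be
    // loaded by a single 'ldc' instruction; older class files went
    // through a synthetic helper that called Class.forName.
    static Class<?> literal() {
        return String.class;
    }

    public static void main(String[] args) {
        System.out.println(sum(TABLE));
        System.out.println(literal().getName());
    }
}
```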

I think our technical approach should be to have ecj emit class files,
which would then be compiled by jc1.  In particular I think we could
change ecj to emit a single .jar file.  This has a few benefits: it
would give -save-temps meaning for gcj, it would let us more easily
drop ecj into the existing specs mechanism, and it would require very
few changes to the upstream compiler.
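
To give a rough idea of how small the "emit a single .jar" step could
be, here is a sketch using only the standard java.util.jar classes.  It
does not touch any ecj internals: the SingleJarSketch class, its method
names, and the placeholder bytes standing in for compiled classes are
all assumptions for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.jar.JarEntry;
import java.util.jar.JarInputStream;
import java.util.jar.JarOutputStream;

public class SingleJarSketch {
    // Write every (entry name -> class file bytes) pair into one jar.
    // Closing the jar stream also closes the underlying output stream.
    static void writeJar(Map<String, byte[]> classFiles, OutputStream out)
            throws IOException {
        try (JarOutputStream jar = new JarOutputStream(out)) {
            for (Map.Entry<String, byte[]> e : classFiles.entrySet()) {
                jar.putNextEntry(new JarEntry(e.getKey()));
                jar.write(e.getValue());
                jar.closeEntry();
            }
        }
    }

    // Read the entry names back, as a later compilation stage might.
    static List<String> listEntries(byte[] jarBytes) throws IOException {
        List<String> names = new ArrayList<>();
        try (JarInputStream in =
                 new JarInputStream(new ByteArrayInputStream(jarBytes))) {
            JarEntry entry;
            while ((entry = in.getNextJarEntry()) != null) {
                names.add(entry.getName());
            }
        }
        return names;
    }

    public static void main(String[] args) throws IOException {
        // Placeholder bytes only; a real compiler would supply the
        // contents of each generated class file here.
        byte[] placeholder = { (byte) 0xCA, (byte) 0xFE, (byte) 0xBA, (byte) 0xBE };
        Map<String, byte[]> classes = new LinkedHashMap<>();
        classes.put("org/example/A.class", placeholder);
        classes.put("org/example/B.class", placeholder);

        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        writeJar(classes, buffer);
        System.out.println(listEntries(buffer.toByteArray()));
    }
}
```

Writing one archive per compilation would leave a single intermediate
file for -save-temps to keep and for the driver to hand to jc1.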

An alternative approach would be to directly link ecj to the gcc back
end.  However, this looks like significantly more work, requiring much
more hacking on the internals of the upstream compiler.  I suspect
that this won't be worth the effort.

In my preferred approach we would simply delete a portion of the
existing gcj and turn jc1 into a purely bytecode-based compiler.  Then
we would proceed to augment it with all the bits needed for proper 1.5
support.

ecj is written in java.  This will complicate the bootstrap process.
However, the situation will not be quite as severe as the Ada
situation, in that it ought to be possible to bootstrap gcj using any
java runtime, including mini ones such as JamVM -- at least, assuming
that the suggested implementation route is taken.


Politics.

I don't know whether the FSF or the GCC SC would let us import ecj,
even assuming it is actually GPL compatible.  SC members, please
discuss.

We don't know how upstream would react.  I think this is a fairly
minor risk.

It is unclear to me whether we must even rely on GPL v3 if we go
with the separate-ecj route.  Any comments here?  In the
exec-via-specs approach we're invoking ecj as a separate executable,
much the same way we exec 'as' or 'ld'.  Comments on this from
license-oriented folks would be appreciated.


Summary.

I think this would be the most efficient way to achieve 1.5 language
compatibility for gcj, and it would also make future language changes
less expensive.  Given the scope of the entire gcj project, especially
when the scarcity of resources devoted to it is taken into account,
this is significant enough to warrant the change.

Tom

