Middle End Lisp Translator (MELT)
This page documents the MELT branch (which I previously called Basilys). See the paper "Multi-stage construction of a global static analyser" by Basile Starynkevitch in the 2007 GCC Summit Proceedings, pages 143-152.
The MELT branch contains several related components. Each can be enabled or disabled at GCC configure time or at GCC run time:
- a compiler probe, which enables an advanced user to display some of the compiler's internal data (but not to change that data or the GCC compiler's behavior).
- a Lisp dialect compiled into C code, in which one can code middle-end passes (currently still a work in progress).
- a runtime which extends the GCC infrastructure to support the previous items, in particular a generational copying garbage collector well suited to the Lisp dialect above, built on top of the existing GGC (which deals with old values).
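To make the idea of a generational copying collector layered over an older heap more concrete, here is a minimal Python sketch. It is not the MELT runtime: the names Obj, Heap, alloc and minor_collect are hypothetical, and the "old" list merely stands in for GGC-managed memory. New values are allocated in a young region; a minor collection copies the young values still reachable from the roots into the old region and drops the rest. The sketch assumes fields are never mutated after allocation, so it ignores pointers from old values to young ones, which a real collector has to track.

```python
class Obj:
    """A heap value with a payload and references to other values."""

    def __init__(self, payload, fields=()):
        self.payload = payload
        self.fields = list(fields)
        self.forward = None  # set to the copy once this value has been evacuated


class Heap:
    def __init__(self):
        self.young = []  # freshly allocated values
        self.old = []    # survivors (stand-in for GGC-managed memory)

    def alloc(self, payload, fields=()):
        obj = Obj(payload, fields)
        self.young.append(obj)
        return obj

    def minor_collect(self, roots):
        """Copy young values reachable from roots into the old region.

        Returns the list of updated roots; unreachable young values are dropped.
        """
        young_ids = {id(o) for o in self.young}
        pending = []  # copies whose fields still point at young originals

        def evacuate(obj):
            if obj is None or id(obj) not in young_ids:
                return obj  # nil, or already in the old region
            if obj.forward is None:
                copy = Obj(obj.payload, obj.fields)
                obj.forward = copy
                self.old.append(copy)
                pending.append(copy)
            return obj.forward

        new_roots = [evacuate(r) for r in roots]
        while pending:
            copy = pending.pop()
            copy.fields = [evacuate(f) for f in copy.fields]
        self.young = []  # the whole young region is reclaimed at once
        return new_roots


heap = Heap()
leaf = heap.alloc("leaf")
root = heap.alloc("root", [leaf])
heap.alloc("garbage")  # never referenced, so it does not survive
(root,) = heap.minor_collect([root])
print([o.payload for o in heap.old])  # ['root', 'leaf']
```

The appeal of this scheme for a Lisp-like dialect is that short-lived temporary values cost almost nothing: they are never copied, and the young region is emptied in one step.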
All this work is (partly) funded by the French Ministry of Economy ... through ITEA within the GlobalGCC project.
The MELT branch was created on February 19th, 2008. You can get a read-only copy with svn checkout svn://gcc.gnu.org/svn/gcc/branches/melt-branch/ and a read-write copy (assuming you have an account) with svn checkout svn+ssh://yourusername@gcc.gnu.org/svn/gcc/branches/melt-branch/ (replacing yourusername with your login).
My (Basile Starynkevitch's) contributions to GCC are covered by the copyright transfer signed by CEA to FSF, reference RT306238 which I have announced here
Since all this was previously (internally) called basilys, many names and identifiers still refer to basilys, for example the basilys.[ch] files and the *.bysl suffix. This should change later.
MELT Lisp dialect
The Lisp dialect is a Lisp1 (so closer to Scheme than to Common Lisp with respect to names and bindings), able to handle both boxed and some unboxed values. You can define primitives (which get translated to C); for example, the (unboxed) integer addition is already defined as (defprimitive +i (:long a b) :long "((" a ") + (" b "))"). The first :long occurrence describes the type of the formal arguments a and b; the second occurrence describes the type of the result.
There is a minimal object system (a single-inheritance hierarchy rooted at CLASS_ROOT). S-exprs are objects. Values can be MELT objects, closures, vectors, lists (a linked list of pairs, with pointers to the first and last pair), pairs, boxed integers, boxed GCC trees, hashtables (of objects, of trees, ...), etc. Every value has a discriminant (a MELT object), which for objects is their class. Adding support for other GCC datatypes is very easy.
Tail recursion is not handled (looping should use the forever keyword, and loops can be exited). The let keyword is like let* in Scheme (it binds sequentially).
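How a defprimitive template such as the one for +i turns into C text can be illustrated with a small Python sketch. This is only a model of the idea, not the MELT translator: Sym and expand_primitive are hypothetical names. The template is treated as a sequence of literal C fragments and formal argument names, and expansion replaces each formal with the C text of the corresponding actual argument.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sym:
    """A formal argument name appearing inside a primitive's template."""
    name: str


def expand_primitive(template, actuals):
    """Concatenate the template, replacing each formal by its actual's C text."""
    pieces = []
    for item in template:
        if isinstance(item, Sym):
            pieces.append(actuals[item.name])  # substitute the actual argument
        else:
            pieces.append(item)                # literal C fragment, copied as-is
    return "".join(pieces)


# Template of +i as given in the text: "((" a ") + (" b "))"
PLUS_I = ["((", Sym("a"), ") + (", Sym("b"), "))"]

print(expand_primitive(PLUS_I, {"a": "x", "b": "y + 1"}))  # ((x) + (y + 1))
```

The parentheses written around each formal in the template keep the generated C correct whatever the precedence of the operators inside the actual arguments.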
Summary of motivations
The motivations are detailed in the GCC Summit 2007 paper; in a few words:
- Coding passes in a Lisp dialect is more fun and easier for the human developer.
- Some of these MELT passes (related to static analysis) are expected to run for a very long time. These particular passes are run very rarely (and only on explicit request).
- It is worthwhile (and in the spirit of Lisp) to generate MELT/Lisp code during such very time-consuming passes, so some passes might benefit from dynamic code generation (at the meta-level) while they run.
- Hence the MELT infrastructure should be able to generate some (specialized) code (as C files), to compile it into a dynamically loadable module (e.g. *.so shared objects on Linux/ELF, or a *.la file with libtool), and to dynamically load it (all this during the same cc1 execution).
The MELT branch should generate C code during cc1 execution (the C code is translated from Lisp internally). It is important that this happens during the execution of the cc1 process, because the whole idea is to be able to generate and then execute code during some very time-consuming MELT passes. This C code is compiled into a dynamically loadable module (usually a *.so or *.dylib file, perhaps wrapped as a *.la file and a .libs directory) and dynamically loaded by lt_dlopenext (see function compile_to_dyl in file gcc/basilys.c in the MELT branch).
MELT compiler implementation
It should be stressed that the MELT compiler (which actually lives inside cc1) translates MELT Lisp dialect S-exprs (either from a file or in memory) into a (huge) C file, which is usually compiled into a dynamically loadable module and then dynamically loaded (all in the same cc1 process). The MELT compiler is written in MELT (file warm-basilys.bysl), and a Common Lisp variant (file contrib/cold-basilys.lisp, for CLISP) was coded to bootstrap it.
The MELT compiler (see function compile_list_sexpr in warm-basilys.bysl) proceeds in several steps.
- A list of S-exprs (these are MELT objects of CLASS_SEXPR) is given, either by parsing a MELT file or because it is already in memory. An initial environment is also given (it could be empty for the particular case of warm-basilys.bysl).
- The first step is macro-expansion. Every S-expr is macro-expanded, usually into an instance of some subclass of CLASS_SRC (a source element); for example, an occurrence of the if MELT keyword expands to an instance of CLASS_SRC_IF, but most S-exprs are just plain function or primitive applications.
- Then a normalization phase occurs. Each source element is normalized (into an instance of a subclass of CLASS_NREP) by adding let bindings. For example (F (G X) 2) becomes something equivalent to (LET ((GG (G X)) (FF (F GG 2))) FF) where FF and GG are fresh.
- Then a compilation phase is called. It transforms the normalized representation into an abstract syntax tree (in subclasses of CLASS_OBJCODE) which mimics a subset of C.
- At last, this forest of OBJCODE-s is pretty-printed as C code.
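The normalization step in the list above, which names every nested application with a fresh let binding, can be sketched in a few lines of Python. This is an illustration under simplifying assumptions, not the code of warm-basilys.bysl: applications are modelled as plain tuples rather than CLASS_NREP instances, and normalize, show and to_let are hypothetical names. Fresh names are generated as T1, T2, ... instead of GG and FF.

```python
import itertools


def normalize(expr):
    """Return (bindings, result): sequential bindings and the name holding expr's value."""
    counter = itertools.count(1)
    bindings = []

    def walk(e):
        if not isinstance(e, tuple):
            return e                     # variables and constants are already simple
        head, *args = e
        simple_args = [walk(a) for a in args]   # name nested applications first
        fresh = f"T{next(counter)}"      # fresh name for this application's result
        bindings.append((fresh, (head, *simple_args)))
        return fresh

    result = walk(expr)
    return bindings, result


def show(e):
    """Print a tuple-based expression back as an S-expr."""
    if isinstance(e, tuple):
        return "(" + " ".join(show(x) for x in e) + ")"
    return str(e)


def to_let(bindings, result):
    """Render the bindings as a single sequential LET form."""
    inner = " ".join(f"({name} {show(value)})" for name, value in bindings)
    return f"(LET ({inner}) {result})"


bindings, result = normalize(("F", ("G", "X"), 2))
print(to_let(bindings, result))  # (LET ((T1 (G X)) (T2 (F T1 2))) T2)
```

Because the MELT let binds sequentially (like let* in Scheme), the second binding may refer to the first, so a flat list of bindings is enough. After this step every application has only simple arguments, which makes the later translation into C-like statements straightforward.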
The generated C code is compiled into a shared object which is dynamically loaded (like any plugin). The C code should be compilable without the GCC source or build directory (once this GCC has been installed), because the files included by the C code plugin would be saved elsewhere (e.g. in some melt-private-include/ subdirectory). This is discussed here
Current state
In addition to all the libraries used by GCC trunk (such as mpfr and gmp), you need:
- The libltdl development library (i.e. with headers) from the libtool dynamic loader. libltdl is a dlopen replacement used to dynamically load code at runtime.
- The Parma Polyhedra Library (i.e. PPL).
- Essentially a fairly recent GNU/Linux system. I have not tried anything else; it might later work on some other systems. I'm using Debian/Sid or Debian/Lenny on AMD64.
- A significant amount of RAM (because the generated warm-basilys.c is huge and contains a big, but simple, routine). I have a machine with 4 GB of RAM.
- Temporarily, GNU CLISP (to manually cold-bootstrap warm-basilys.bysl into warm-basilys.c).
- A fairly recent version of GTK, to compile contrib/simple-probe.c.
Current (February 2008) state (quite messy, notably for building):
* the compiler probe should be usable (files gcc/compiler-probe.c and gcc/compiler-probe.h)
* the configure.ac should be usable, but you have to rebuild all configure files and reconfigure with --with-ppl --with-ltdl --enable-compiler-probe --enable-basilysmelt. Currently I (Basile) use the following configure command:
  $GCCTOP/configure --enable-maintainer-mode --enable-checks=tree,gc --enable-languages=c --disable-bootstrap --disable-multilib \
      --with-ppl=/usr/local/ --enable-compiler-probe --enable-basilysmelt
* the runtime (including a copying generational garbage collector) should be usable, but was painful to debug (files gcc/basilys.c and gcc/basilys.h)
* there is a contrib/simple-probe.c file (a self-contained GTK program), a contrib/cold-basilys.lisp (for the CLISP implementation of Common Lisp), and a melt/README-MELT
* warm-basilys.bysl is still buggy and cannot generate warm-basilys.c as it should. I am still working on warm-basilys.bysl and using contrib/cold-basilys.lisp to translate it into warm-basilys.c -- see the end of gcc/Makefile.in in the MELT branch
* the whole GCC bootstrap procedure should be upgraded so that warm-basilys.c and warm-basilys.so are regenerated and compared between stages 2 and 3; this is not done at all
See also
The GCC_Plugins page.