Middle End Lisp Translator (MELT)
This page documents the MELT branch (which I previously called Basilys) - see the paper in the 2007 GCC Summit Proceedings, Multi-stage construction of a global static analyser by Basile Starynkevitch, pages 143 - 152 and the MELT paper in the GROW09 workshop (inside Hipeac09).
Recent slides are here (MELT slides november 2009).
This work is motived by static code analysis.
The MELT branch contains several (related) stuff. Everything can be enabled or disabled at GCC configure time or at GCC run time:
- a Lisp dialect compiled into C code, with which one can code middle end passes
- a runtime which extends the GCC infrastructure to support the previous items, in particular a generational copying garbage collector well suited for the lisp dialect above, which is build above the existing GGC (which deals with old values).
All this work was (partly) funded by the french Ministery of Economy ... thru ITEA within the GlobalGCC project.
The MELT branch has been created on february 19th 2008 and you can get it for readonly with svn checkout svn://gcc.gnu.org/svn/gcc/branches/melt-branch/ and for read-write (assuming you have an account) with svn checkout svn+ssh://yourusername@gcc.gnu.org/svn/gcc/branches/melt-branch/ (replacing yourusername with appropriate login).
My (Basile Starynkevitch's) contributions to GCC are covered by the copyright transfer signed by CEA to FSF, reference RT306238 which I have announced here
The MELT branch follows quite closely the GCC trunk (which is usually merged more than weekly into MELT).
Since july 15th 2009, MELT can be (perhaps painfully) built and used as a plugin to an unmodified GCC (trunk or future 4.5). The MELT tutorial page explains how to build and use it in that fashion.
MELT motivation
MELT is a high-level Lisp-like language designed to fit very closely in GCC internal representations, thru an automated translation of MELT code into GCC specific C code, compiled by a C compiler (usually some GCC) and then loaded as plugins. This enables easier development of high-level static analysis and transformation, working on GCC middle end representation (GIMPLE tuple).
summary of motivations
The motivations are detailed in the GCC Summit 2007 papers; in a few words
- Coding passes in a LISP dialect is more fun and easier to the human developer.
- Some of these MELT passes (related to static analysis) are expected to run for a very long time. These peculiar passes are very rarely run (and only explicitly).
- It is worthwhile (and in the spirit of Lisp) to generate MELT/Lisp code during such very time consuming passes. So some passes might profit of dynamic code generation (at the meta-level) during them.
- Hence the MELT infrastructure should be able to generate some (specialized) code (as C files), to compile it into a dynamically loadable stuff (e.g. *.so shared objects on Linux/ELF; or *.la file with libtool), and to dynamically load it (all this during the same peculiar cc1 execution).
The MELT branch should generate C code during cc1 execution (the C code is translated from LISP internally) - it is important that it happens during execution of the cc1 process (because the whole idea is to be able to generate and then execute code during some very time consuming MELT passes). This C code is compiled into a dynamically loadable stuff (usually a *.so) and dynamically loaded by dlopen (see function compile_to_dyl in file gcc/melt-runtime.c in the MELT branch).
MELT Lisp dialect
The Lisp dialect is a Lisp1 (so more a Scheme than a Common Lisp w.r.t. names and bindings) dialect able to handle both boxed and some unboxed values. You can define primitives (which get translated to C), for example the (unboxed) integer addition is already defined as (defprimitive +i (:long a b) :long "((" a ") + (" b "))") the first :long occurrence describes the types of the formal arguments a and b, the second occurrence describes the result. There is an minimal object system (single-inheritance hierarchy, rooted at CLASS_ROOT. S-exprs are objects. Values can be MELT objects, closures, vectors, lists (a linked list of pairs, with pointers to first and last pair), pairs, etc... boxed integers or boxed GCC trees or hashtables (of objects, of trees, ..) etc... Every value has a discriminant (a MELT object) which is its class for objects. Adding support for other GCC datatypes is very easy. Tail-recursion is not handled (looping should use the forever keyword, and loops are can be exited). the let keyword is like let* in Scheme (binding sequentially).
MELT compiler implementation
It should be stressed that the MELT compiler (actually inside cc1) is translating MELT Lisp dialect S-exprs (either from a file or in memory) into a (huge) C file, which is usually compiled into a dynamically loadable stuff and then dynamically loaded (all in the same cc1 process). The MELT compiler is written in MELT (files warmelt*.melt), and a CommonLisp variant (file contrib/cold-basilys.lisp for Clisp) was coded to bootstrap it.
The MELT compiler (see function compile_list_sexpr in warmelt-outobj.melt) proceeds in several steps.
a list of S-exprs (these are MELT objects of CLASS_SEXPR) is given, either by parsing a MELT file or because it is in memory. An initial environment is also given (could be empty for the particular case of warmelt-first.melt).
the first step is macro-expansion. Every S-expr is macro expanded, usually into some subclass of CLASS_SRC (it is a source element); for example the if MELT keyword occurrence expands to an instance of CLASS_SRC_IF but most S-expr are just plain function or primitive applications.
Then a normalization phase occur. Each source element is normalized (into a subclass of CLASS_NREP) by adding additional let bindings. For example (F (G X) 2) becomes something equivalent to (LET ( (GG (G X) (FF (F GG 2)) ) FF) where FF and GG are fresh.
Then a compilation phase is called. It transforms the normalized stuff into some abstract syntax tree (in subclasses of CLASS_OBJCODE) which mimics a subset of C.
- At last, this forest of OBJCODE-s is pretty-printed as C code.
The generated C code is compiled into a shared object which is dynamically loaded (as any plugin). The C code should be compilable without the GCC source or build directory (once this GCC has been installed) because included files for the C code plugin would be saved elsewhere (e.g. in some melt-private-include/subdirectory). This is discussed here
current state
You need, in addition of all libraries used by GCC trunk (like mpfr and gmp):
the libtldl development (ie with headers) library from the libtool dynamic loader - libtldl is a dlopen replacement to dynamically load code at runtime.
The gdbm development library. Most Linux distributions have a package for it (take the development pavkage e.g. libgdbm-dev on Debian).
The Parma Polyhedra Library (PPL). It is also needed by Graphite, recently (end of August 2008) merged into trunk. You need at least PPL version 0.10.2 or a very recent GIT snapshot. PPL 0.10 won't work, so you probably would need to build PPL from its source.
- Essentially a fairly recent GNU/Linux system. I don't know anything else. Maybe it might later work on some other systems. I'm using Debian/Sid or Debian/Lenny on AMD64. Some nice guy (named Rob) was able to build MELT on a Solaris x86 system (but I needed to correct some linuxisms in the code).
Some significant amount of RAM (because the generated warmelt*.c (or warmelt*-0.c) are huge and contain a big, but simple, routine). I have a 4Gb RAM machine.
A fairly recent version of GTK to compile the contrib/simple-probe.c
Current (may 2009) state (quite messy, notably for building):
* the configure.ac should be usable, but you have to rebuild all configure files (see the Regenerating_GCC_Configuration page for details) and re configure with --with-ppl . Currently I (Basile) use the following configure command: $GCCTOP/configure --enable-maintainer-mode --enable-checks=tree,gc --enable-languages=c --disable-bootstrap --disable-multilib \
--with-ppl=/usr/local/
Regenerating the configure file is necessary because the configure file is not svn commit-ed on the branch (to avoid conflicts when merging with the trunk).
* the runtime (including a copying generational garbage collector) should be usable (but was painful to debug) files gcc/melt-runtime.c and gcc/melt-runtime.h
- and a melt/README-MELT
* Since rev136289 MELT has bootstrapped: the generated files warmelt*.c are compiled into warmelt*.so plugins which are able to generate the very same warmelt*-0.c from melt/warmelt*.melt. Hence the files warmelt-*-0.c are in the SVN repository but they are quite big. rev136409 corrected a bug (crashing on x86/32 bits).
The generated gcc/warmelt-*-0.c files are quite big, and generated by translating gcc/melt/warmelt-*.melt (Lisp MELT) files. You probably need at least 2Gb of RAM to have them compiled with gcc -fPIC -O2 into gcc/warmelt-*.so files in the build directory. If you have less memory, consider compiling them without optimisations (ie -O0), e.g. by editing some *melt-cc-script shell script, or by compiling or configuring gcc with CFLAGS=-O0.
tiny tutorial about MELT
A very incomplete tutorial about MELT. Working knowledge of Scheme, CommonLisp (or at least EmacsLisp) is supposed.
See also preferably the MELT tutorial page which explains how to use and code in MELT.
MELT data
MELT data come in two flavors.
- unboxed data - this data is not a MELT value; typical examples are:
:long (the C long integer);
:tree (the naked GCC tree structure pointer, as defined in tree.h);
:cstring (constant C strings, outside of any heap or stack);
- boxed data are MELT values allocated in the MELT heap. In particular:
- MELT objects
- MELT integers are boxed integers.
- MELT trees are boxed GCC trees.
- MELT string maps (hash tables associating strings to MELT values)
- MELT object maps (hash tables associating MELT objects to MELT values).
- MELT tree maps (hash tables associating GCC tree-s to MELT values). This is the preferred way to associate information to GCC trees.
- MELT tuples are sequence of MELT values.
- MELT closures.
MELT lists are single-linked lists which keep the first & last node of the list,
- MELT pairs are like cons-es in LISP. MELT lists' nodes are MELT pairs.
etc. See file melt-runtime.h for details.
Each MELT value (i.e. boxed data) is allocated by the MELT generational garbage collector. The MELT object system has single inheritance, a tree (not a forest) of classes, rooted at CLASS_ROOT. Notice that many MELT values (e.g. boxed integers or trees, tuples, closures, object maps ...) are not MELT objects. MELT classes and fields and method selectors are themselves MELT objects, so the object system is somehow reflexive. See CLASS_CLASS for the class of classes and CLASS_FIELD for the class of fields.
Every MELT values have a discriminant (the first field in their C struct). This discriminant is a MELT object; for MELT objects, it is their class. For MELT values which are not objects (like MELT trees, MELT object maps, etc..) it is a MELT object instance of CLASS_DISCR. Every discriminant (including MELT classes) have a dispatch table associating selectors to closures (the method implementation). Hence, method addition is very dynamic and can occur at runtime.
Every MELT object has not only a class, but also a fixed hash code and a magic number (unsigned short). For discriminants (& classes), the magic number is used (even by GGC & MELT garbage collectors) to discriminate the actual structure of the data. In particular, one can have several discriminants for MELT strings sharing the same magic, OBMAG_STRING. Hence, we have DISCR_STRING for most strings, and DISCR_VERBATIMSTRING for some strings handled specially.
The nil MELT value, represented by a NULL C pointer, has DISCR_NULLRECV as its discriminant. So it is possible to invoke (via a MELT selector) a MELT method on the null MELT value.
Be careful about the boxed vs unboxed data.
MELT functions
MELT functions take arguments and produce results. The first argument is a MELT value (it cannot be an unboxed data). Other arguments can be either MELT values or unboxed data. The first result of a MELT function is a MELT value, but additional results can be unboxed. It is expected that each MELT function (and selector) has an implicit, but well defined, signature describing its arguments and results. There is no variable argument facility in MELT.
Here is a sample MELT function repeat-times applying a function f to an argument x some specified n times. f and x are values, n is an unboxed long argument.
(defun repeat-times (f x :long n) ; n is an unboxed formal argument of type long
(forever reploop ; infinite loop called reploop
(if (<=i n 0) ; test if the unboxed n is negative
(exit reploop)) ; exit the loop if yes
(f x) ; call f
(setq n (-i n 1)))) ; decrement nThe syntax of defun is (defun function-name formal-argument-list body... ). The formal-argument-list contains either formal arguments names, or type keywords like :value :long :tree etc. A type keyword applies to all following formal arguments (till end of list or other type keyword). The body of a defun is a sequence of zero, one, or more expressions evaluated sequentially.
MELT functions can be anonymous. For instance, assuming (say s) has the side effect of displaying the value s and evaluates to nil, and (make_stringconst discr_string "hello") evaluates to a boxed string value, the following code would say hello ten times.
(repeat-times (lambda (msg) (say msg) (newline)) (make_stringconst discr_string "hello") 10)
MELT primitives
A primitive is something which expands into C code. Primitives are defined using the defprimitive construct. Their invocation expands to C code. For instance the long integer substraction -i is defined by
(defprimitive -i (:long x y) :long "((" x ") - (" y "))")The syntax is (defprimitive primitive-name formal-argument-list result-type expansion ) without any restriction on the formal argument list (the first formal can be unboxed). Expansions in primitive are strings (verbatim C code fragment) or formals. The expansion can be given more simply as
(defprimitive -i (:long x y) :long #{($x)-($y)}#)The characters sequence between #{ and }# (we call that a macro-string) is expanded by the MELT lexer into an s-expr of symbols (id prefixed with a dollars) and strings (all the rest).
Once defined, a primitive is invoked syntactically as a function.
As with C macros, and for similar reasons, it is advised to add parenthesis in primitive expansions.
MELT citerators
A C-iterator is something expanding into a C loop, (e.g. a for loop). This construct permits description of arbitrary C loop-like instructions. C-iterating expressions (whose operator is a C-iterator) are used for side effects. Look into melt.texi for more.
See also
The GCC_Plugins page.
The MELT tutorial page and then the writing a pass in MELT page, and also the memory management in MELT page.
The MELT branch is often merged with the GCC trunk (usually more than twice a week).
Some examples (I will copy them soon here) are given in this post on Lambda The Ultimate
A reference documentation has been written (in gcc/doc/melt.texi file). Please read it. Tutorial is still missing.
A graphical schema is attached here.
FAQ and bugs
0. required resources You need at least 2.4Gbytes of disk space for the source tree, nearly 2Gbytes of disk space for the build tree (so about 5Gbytes of available disk space to build GCC MELT), an existing GCC compiler to built the GCC MELT [e.g. the GCC from a recent Linux distribution]. It is strongly recommended to have a significant amount of RAM (4 Gbytes is recommended), because the generated C files from MELT are very big and stress considerably the GCC compiler used to compile them. If you have less RAM, consider building with -O0, or edit the gcc/built-melt-cc-script script in the build tree appropriately. I strongly suggest having 4Gb RAM which is cheap today.
1. I am not able to compile the branch? Please check that you have installed all the dependencies and that you did regenerate the configure file before running it.
2. I am getting melt-private-build-include/tm.h:8:35: error: config/i386/biarch64.h: No such file or directory You have been hurt by a bug: reconfigure with --disable-multilib ; thanks to Albert Cohen from INRIA Futurs to spot that. Feel free to suggest improvements. Look for run-melt-deps target in gcc/Makefile.in and the comments above. So multilib (e.g. support of both 32 & 64 bits on a AMD64) does not work yet.
3. You really should configure with --with-ppl explicitly. The --enable-melt configuration option does not exist anymore: the MELT branch has always MELT features enabled.
4. Parallel build using make -j e.g. make -j2 can fail. It may occur that, at some point the make process runs into an infinite loop. If that happens, kill the wild make process, and run it again serially.
5. What is the relation with plugins? MELT is (painfully) usable as a plugin. But the documentation and MELT's own bootstrap still requires the MELT branch.
6. I am getting some PPL related error when building MELT. Check that you have a recent PPL library. You definitely need PPL 0.10.2 or better (i.e. future PPL 0.11 or a git snapshot).