This is the mail archive of the
mailing list for the GCC project.
Decompiler Project and Mailing List
- To: binutils at sourceware dot cygnus dot com
- Subject: Decompiler Project and Mailing List
- From: Lynn Winebarger <owinebar at free-expression dot org>
- Date: Wed, 17 Nov 1999 13:40:53 -0500 (EST)
- cc: gcc at gcc dot gnu dot org, guile at sourceware dot cygnus dot com
Please note that I'm crossposting this announcement to the developers' lists
that I think contain the people most likely to be interested in this project.
My little decompiler project (a sub-project of the Free Expression
Project) now has a mailing list, firstname.lastname@example.org.
Subscribe by mailing to email@example.com a message containing
I have also put the current code in my CVS repository, which you can log into with:
    cvs -d :pserver:firstname.lastname@example.org:/usr/local/cvsroot login
The password is "freeguest", and then you just need to check out the decompiler.
The current state of the code is that I can translate an i386 executable
into a Scheme representation of its symbols, data, and disassembled code
(using the objdump code as a base for the disassembling logic). The
representation uses tagged lists, mainly because they were easy to
implement and are guaranteed to be portable to any Scheme you want (I'm
using GUILE). I also have a converter from i386 assembler to an RTL-like
representation (the main difference is that machine modes are replaced
with a more precise typing system), which works for a reasonable subset of
i386 assembly: integer math, control flow, and regular bitwise operations.
Floating point, MMX, and SIMD instructions are basically not converted.
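To make the assembler-to-RTL step concrete, here is a minimal Python sketch, with tuples standing in for the Scheme tagged lists. The operand encoding, the `("int", 32)` type tag (a stand-in for a GCC machine mode such as SImode), and all function names are hypothetical, not taken from the project's code. Instructions use AT&T operand order (source first), as objdump prints by default, and only a handful of integer and control-flow instructions are covered.

```python
# Hypothetical operand encoding: ("int", 32) plays the role a machine mode
# like SImode plays in GCC's RTL.
INT32 = ("int", 32)
FLAGS = ("reg", "eflags", ("flags",))
PC = ("pc",)


def reg(name):
    return ("reg", name, INT32)


def imm(value):
    return ("const", value, INT32)


# Two-operand ALU instructions (AT&T order: src, dst) and their RTL-like operators.
ALU_OPS = {"addl": "plus", "subl": "minus", "andl": "and", "orl": "ior", "xorl": "xor"}

# Conditional jumps, mapped to the comparison they test on the flags.
JCC = {"je": "eq", "jne": "ne", "jl": "lt", "jg": "gt"}


def translate(insn):
    """Translate one tagged i386 instruction into an RTL-like tagged tuple."""
    op, *args = insn
    if op == "movl":
        # mov copies src into dst and leaves the flags alone.
        src, dst = args
        return ("set", dst, src)
    if op in ALU_OPS:
        # dst = dst OP src; the flags register is overwritten as a side effect.
        src, dst = args
        return ("parallel",
                ("set", dst, (ALU_OPS[op], dst, src)),
                ("clobber", FLAGS))
    if op == "jmp":
        return ("set", PC, ("label_ref", args[0]))
    if op in JCC:
        # Jump if the condition holds on the flags, otherwise fall through.
        return ("set", PC, ("if_then_else", (JCC[op], FLAGS, imm(0)),
                            ("label_ref", args[0]), PC))
    raise NotImplementedError(f"no translation for {op}")


add_esp = translate(("addl", imm(4), reg("esp")))
branch = translate(("jne", ".L2"))
```

As in GCC's i386 RTL, arithmetic instructions are wrapped in a `parallel` that marks the flags register as clobbered, while a conditional jump becomes an `if_then_else` assigned to `pc`.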
I'm now working on an abstract interpreter to build a control flow
graph (allowing for bizarre jumps between function contexts). It is derived
from the compiler theory used for functional languages, since they share
assembler's problem that code is easily treated as data. After that, there
will be loop detection, and any gotos that are left will be translated into
a letrec structure (where tail-recursion optimization is assumed); then an
attempt will be made to "de-tailize" them.
Stack frames won't be explicitly recognized by the analysis; instead,
frames will be treated as a structure passed to each function through the bp
register. I hope (assuming this works) that this general kind of analysis
will also detect explicit stack constructs (for example, the parser stacks
in bison.simple) without any extra work.
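Treating the frame as a structure reached through bp can be approximated by rewriting every bp-relative memory reference into a field access on an opaque `frame` value. The following Python sketch assumes a simplified, hypothetical RTL-like encoding (`("mem", addr)`, `("plus", a, b)`, `("const", k)`) and assumes ebp holds the same frame pointer for the whole function. It records which offsets are used but deliberately does not classify them as arguments or locals.

```python
# Hypothetical encoding; assumes ebp is never reassigned inside the function.
FRAME_BASE = ("reg", "ebp")


def bp_offset(addr):
    """Return k if addr is ebp or (plus ebp (const k)), otherwise None."""
    if addr == FRAME_BASE:
        return 0
    if (isinstance(addr, tuple) and len(addr) == 3 and addr[0] == "plus"
            and addr[1] == FRAME_BASE
            and isinstance(addr[2], tuple) and addr[2][0] == "const"):
        return addr[2][1]
    return None


def rewrite_frame_refs(expr, fields):
    """Replace bp-relative memory references with ("field", "frame", k),
    recording every offset k in the set `fields`."""
    if isinstance(expr, tuple):
        if len(expr) == 2 and expr[0] == "mem":
            offset = bp_offset(expr[1])
            if offset is not None:
                fields.add(offset)
                return ("field", "frame", offset)
        return tuple(rewrite_frame_refs(e, fields) for e in expr)
    return expr


insns = [
    ("set", ("reg", "eax"), ("mem", ("plus", ("reg", "ebp"), ("const", 8)))),
    ("set", ("mem", ("plus", ("reg", "ebp"), ("const", -4))), ("reg", "eax")),
]
fields = set()
rewritten = [rewrite_frame_refs(insn, fields) for insn in insns]
```

The collected offsets describe the layout of the frame structure; interpreting them is left to later analysis.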
PS My apologies to those who've read this before.