This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

using delta


I was having fun with the Google "link:" searching feature looking for
people who point to my web page and I found this conversation (quoted
below) between the two of you regarding delta on the web at
http://gcc.gnu.org/ml/gcc/2004-04/msg00227.html

Geez, if you guys were having so much trouble, why didn't you let me
know!  We have used delta on programs of hundreds of thousands of
lines and it works for us; it always seems to get it down to about a
page or two of code.  I was wondering if perhaps I could offer a few
suggestions, since delta needs a little help from humans, but humans
unfamiliar with the problem sometimes don't quite realize how easily
they can help it.

First of all, the best way to use delta is to have a very strict
definition of what is an "interesting" file.  For example if you have
your own compiler which is crashing on legal C++ input, you don't just
write a test script that tests to see if the input crashes your
compiler; Instead, you write a test script to see if it is 1) likely
to be legal C++ AND 2) it crashes your compiler.  So your script
should look something like this (in bash):

  g++ file.cc && my_compiler | grep 'bad error' &> /dev/null

The first test will reject code that that isn't interesting because it
isn't legal, and the second because it doesn't crash your compiler.
Now, if my_compiler IS g++, then see if you can find an older version
that your file doesn't crash: That is, a file is interesting if the
old version likes it but the new one doesn't.

Second, be sure to use the script bin/multidelta and the '-level'
command line argument rather than just the raw bin/delta.  Internally
multidelta first uses Scott McPeak's topformflat utility and then runs
delta on that.  Topformflat takes an int 'level' as an argument and
basically does 'cat' but doesn't print newlines that are nested deeper
than 'level'.  Since delta works at the line granularity, this feature
prevents delta from digging deeper than the nesting level you set.
So, if you say '-level=0' (which is also the default and highly
recommended for the first run) delta will try to minimize just the
number of top-level forms in your program without bothering with their
internals.  Since delta also starts throwing away things from the end
of the file instead of the start, this usually rapidly gets rid of a
lot of your file.  Combined with a carefully-defined notion of
interestingness this feature keeps delta from getting stuck in the
innumerable crevasses of local optima of the often very jagged and
non-monotonic 'interesting' subset of the very high-dimensional space
of input programs.

Note also that delta also only finds a solution that is 1-minimal: no
one line can be thrown away and produce a file that is still
interesting.  This seems to be a good heuristic, but it can sometimes
get stuck in a local optimum and need a little jiggling to get out;
Therefore run delta, say, twice with '-level=0', then twice with
'-level=1', '2', and '3', and then just set it to '-level=10' one last
time.  Then run the output through 'indent'.  You can then continue to
hand minimize it by removing stuff and running your interesting test
repeatedly yourself.

Delta also has two phases: in the first it works backward from the end
to the start throwing away the trial text and testing if the remainder
is interesting; this is the most useful phase.  Being faithful to the
algorithm, I implemented a second phase where delta works forward from
the start to the end keeping *only* the trial text and throwing
everything else away and seeing if this is interesting.  This phase is
mostly useless, but terminates quickly if you have written your
interesting test script to quickly reject non-legal C++.  You should
still write your interesting test script this way, but you could also
just turn off this forward phase as it is rarely useful except in very
different applications of delta.

A few more tricks: It is easier to follow delta's progress if you send
all the output of your script to /dev/null as I did above.  You can
then get a sense if delta is making progress.  The default multidelta
runs 'cpp' on the input as well, which is a waste if you did it once
up-front, which I recommend.  Therefore, just change it to 'cp'.
Sometimes topformflat adds a line which delta just removes again, so
don't write a script to re-run delta until it isn't removing any lines
at all; you might infinite loop.

Please let me know of any bugs, feature requests, and any successes
you have with delta.

Daniel Wilkerson


> Paolo Carlini wrote:

> > Richard Guenther wrote:
> > > Oh well. If only we had a simple way to kill all unused sourcecode

> > > in a testcase to begin with...

> >  Delta?!?
> > http://www.cs.berkeley.edu/~dsw/
> > Paolo.

> Yes I know this, but it is painful (for C++) and slow. I suppose all
> the information on ever used (for lookup or instantiation) classes,
> methods, functions and variables is available during compilation and
> so extracting this information where it is available (in the
> compiler!) would be easy (at least for someone understanding the C++
> parser). Running delta for two days just to get information the C++
> compiler has in the first place doesnt sound right.

> Of course delta will assist in reducing the testcase after the
> compiler has stripped all unused code.

> Richard.



Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]