This is the mail archive of the mailing list for the GCC project.
Re: Testing GCC & OSDL
On Fri, 17 Sep 2004, Laurent GUERBY wrote:
> On Fri, 2004-09-17 at 10:43, Joseph S. Myers wrote:
> > A regression tester that tests every mainline commit rather than batching
> > them would be feasible on a small cluster of machines (about 30 commits a
> > day to mainline, a bit more if you want to test release branches as well),
> BTW, is there a script that assumes a local CVS repository (rsync'ed)
> and gives you the list of CVS dates in between commits? (so that
> "cvs co -D X" for X in this list gives the interesting list of sources)
Watching gcc-cvs was the most obvious method that occurred to me for such
a tester to identify changesets. Things would be easier with a version
control system that actually had changesets, but I don't think this is a
major obstacle if the machines were there to devote to testing every
commit (covering *every* branch would take about twice as much resources
as just mainline plus release branches); you can approximate changesets
well enough from CVS. For extracting past changesets, scripts that convert
existing repositories to other version control systems (e.g. the one that
converts CVS to Subversion) may be of use, though I think they are too slow
for a real-time tester.
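Approximating changesets from per-file CVS commits can be sketched roughly as
follows. This is purely illustrative Python, not what any existing tester
does: the record fields and the 300-second grouping window are assumptions,
and a real script would parse them out of "cvs log" output or gcc-cvs mails.

```python
from dataclasses import dataclass

# Hypothetical per-file commit record; in CVS each file is committed
# separately, so a "changeset" has to be reconstructed after the fact.
@dataclass
class FileCommit:
    path: str
    author: str
    message: str
    timestamp: float  # seconds since the epoch

def group_changesets(commits, window=300.0):
    """Approximate changesets: per-file commits sharing an author and
    log message, each within `window` seconds of the previous commit
    in the group, are treated as one changeset."""
    changesets = []
    current = []
    for c in sorted(commits, key=lambda c: (c.author, c.message, c.timestamp)):
        if (current
                and c.author == current[-1].author
                and c.message == current[-1].message
                and c.timestamp - current[-1].timestamp <= window):
            current.append(c)
        else:
            if current:
                changesets.append(current)
            current = [c]
    if current:
        changesets.append(current)
    return changesets
```

Grouping on (author, message, time window) is the usual heuristic the
CVS-to-Subversion converters apply; it can misfire when two commits share a
log message, which is why it is only an approximation.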
> 1. keep only one out of N so you still speed up the binary search but
> need to finish by building a few compilers
We do indeed effectively have something as good as this: Phil's Regression
Hunter <http://www.devphil.com/~reghunt/> tracks problems down to a daily
build, and the scripts in contrib/reghunt then do a binary search (on
timestamps rather than changesets) to locate the individual patch that caused
a problem; these methods are used to fill out regression bugs with details of
what caused the regression.
Keeping builds for every changeset would be a minor refinement to speed up
the process of locating the cause of a newly found regression that may
have been there for a while.
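The binary search itself is straightforward. As a rough sketch (illustrative
only: the `is_good` predicate stands in for a real checkout-with-"cvs co -D",
build, and test cycle, which is where all the time actually goes):

```python
def bisect_commits(commit_times, is_good):
    """Binary search over a sorted list of commit timestamps, assuming
    the earliest is known good and the latest known bad; returns the
    first timestamp at which is_good() fails, i.e. the suspect commit."""
    lo, hi = 0, len(commit_times) - 1  # invariant: lo good, hi bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_good(commit_times[mid]):
            lo = mid
        else:
            hi = mid
    return commit_times[hi]
```

With about 30 mainline commits a day, narrowing a regression within one day's
window takes around five build-and-test cycles, which is why keeping builds
for every changeset would only be a refinement rather than a necessity.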
Having regression testers test every changeset as it is made and report
regressions within a few hours would mean that people are told of
regressions their patches have caused while the patches are still fresh in
their minds, rather than getting a regression report covering many patches
at once and suspecting it's probably someone else's patch. (Checkins made
while the tree fails to build cause trouble in this regard, but enough
regressions appear while the tree remains buildable throughout that I think
the principle is still useful.)
> 4. compress and/or use binary deltas (documentation, language, platform
> and library patches will affect only one part of the installed files),
> but you will have to uncompress before use (not much of a problem given
> CPU/disk performance ratio these days).
I think this would be a useful practical approach: e.g. keep one in 32
builds in full (so approximately one a day for mainline), keep those halfway
in between as binary deltas from the builds 16 before, and so on, so a
regression test would finish by copying an endpoint tree and applying a
delta to get a midpoint tree to test (up to five times). The 1.5GB of disk
writing involved could be saved (reduced to 0.3GB of reading for the
startpoint tree) by using a big enough filesystem in RAM.
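Under these assumptions (full trees at every 32nd build, each intermediate
build stored as a binary delta from the build half a step before it), the
sequence of snapshots visited to reconstruct a given build can be sketched as
follows; the function name and layout are hypothetical, and applying the
actual deltas (e.g. with a tool like xdelta) is elided:

```python
def delta_chain(index, period=32):
    """Return the snapshots visited when reconstructing build `index`:
    the nearest full build at or before it, then each intermediate
    delta, halving the step each time. At most log2(period) deltas
    are applied, so 5 for a period of 32."""
    base = index - index % period  # nearest full (non-delta) build
    chain = [base]
    offset = index - base
    step = period // 2
    pos = base
    while step >= 1:
        if offset >= step:
            pos += step
            offset -= step
            chain.append(pos)  # a stored delta at this position
        step //= 2
    return chain
```

For example, build 22 would be reconstructed via the full tree at 0 and the
deltas stored at 16, 20, and 22; the worst case (e.g. build 31) applies five
deltas, matching the "five times" above.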
Experiment would be needed to determine how much disk space is required to
store trees as deltas in this fashion, and how fast regression testing could
be with this approach.
Joseph S. Myers http://www.srcf.ucam.org/~jsm28/gcc/
http://www.srcf.ucam.org/~jsm28/gcc/#c90status - status of C90 for GCC 4.0
email@example.com (personal mail)
firstname.lastname@example.org (Bugzilla assignments and CCs)