This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.
March gcc 3.0 and 3.1 Bootstraps Fail 34% of Time

To: gcc at gcc dot gnu dot org
Subject: March gcc 3.0 and 3.1 Bootstraps Fail 34% of Time
From: Jeffrey Oldham <oldham at codesourcery dot com>
Date: Fri, 30 Mar 2001 10:32:13 -0800
cc: oldham at codesourcery dot com
Reply-to: oldham at codesourcery dot com

INTRODUCTION:
Both CodeSourcery, LLC, and I perform nightly gcc builds, downloading
the most recent gcc 3.0 and 3.1 code about 08:00 GMT daily and
bootstrapping it.  The two charts below record only whether a build
failed to bootstrap and then run the tests; they give no indication of
the number of regression test failures.  (It is rare for bootstrapping
to succeed but testing to fail.)

Builds of the prerelease gcc 3.0 are listed before builds of the
development gcc 3.1.  Three configurations are listed for gcc 3.0, but
only two are listed for gcc 3.1.  A blank entry indicates that
bootstrapping and running the regression tests finished, giving no
indication of how many tests succeeded or failed.  At the bottom of
each chart, the approximate percentage of bootstrapping successes is
listed.

GCC 3.0 i686-pc-linux-gnu	i386-pc-linux-gnu	mips-sgi-irix6.5
Mar01							
Mar02							
Mar03							
Mar04							
Mar05							
Mar06							
Mar07							
Mar08							
Mar09							
Mar10	failure						failure
Mar11							
Mar12							
Mar13							
Mar14				failure			
Mar15				failure			
Mar16							
Mar17							
Mar18							failure
Mar19							failure
Mar20							failure
Mar21							failure
Mar22				failure			failure
Mar23	unknown						failure
Mar24				failure			
Mar25							
Mar26							
Mar27							failure
Mar28	failure			failure?		failure
Mar29	anoncvs.cygnus down	anoncvs.cygnus down	anoncvs.cygnus down
Mar30	failure			failure?		failure
	-------------------------------------------------------------------
	87% success		77% success		63% success

GCC 3.1	i686-pc-linux-gnu				mips-sgi-irix6.5
Mar01							
Mar02	failure						failure
Mar03	failure						
Mar04	failure						
Mar05	failure						
Mar06	failure						
Mar07	failure						
Mar08	failure						
Mar09	failure						failure
Mar10	failure						failure
Mar11	failure						
Mar12	failure						
Mar13							
Mar14							
Mar15							
Mar16							
Mar17							
Mar18	failure						failure
Mar19							failure
Mar20							failure
Mar21							failure
Mar22							failure
Mar23							failure
Mar24	failure						
Mar25							
Mar26							
Mar27	failure						
Mar28	failure						failure
Mar29	anoncvs.cygnus down				anoncvs.cygnus down
Mar30	failure						failure
	-------------------------------------------------------------------
        43% success					60% success
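
The percentages under the charts can be rechecked with a short Python
sketch.  The failure days below are transcribed by hand from the two
charts.  The counting rules are assumptions chosen because they
reproduce the listed figures: the Mar29 "anoncvs.cygnus down" entries
and the "failure?" entries are counted as failures, the Mar23
"unknown" entry is not, and every column covers 30 days.  The names
FAILURE_DAYS, success_percent, and overall_failure_percent are
illustrative only.

```python
# Failure days (day of March 2001) transcribed from the charts above.
# Assumption: "anoncvs.cygnus down" and "failure?" count as failures,
# "unknown" does not.
DAYS_IN_SAMPLE = 30  # Mar01 through Mar30

FAILURE_DAYS = {
    ("3.0", "i686-pc-linux-gnu"): {10, 28, 29, 30},
    ("3.0", "i386-pc-linux-gnu"): {14, 15, 22, 24, 28, 29, 30},
    ("3.0", "mips-sgi-irix6.5"): {10, 18, 19, 20, 21, 22, 23, 27, 28, 29, 30},
    ("3.1", "i686-pc-linux-gnu"): set(range(2, 13)) | {18, 24, 27, 28, 29, 30},
    ("3.1", "mips-sgi-irix6.5"): {2, 9, 10, 18, 19, 20, 21, 22, 23, 28, 29, 30},
}


def success_percent(failed_days, total_days=DAYS_IN_SAMPLE):
    """Percentage of days on which bootstrapping and testing finished."""
    return round(100 * (total_days - len(failed_days)) / total_days)


def overall_failure_percent(failures=FAILURE_DAYS, total_days=DAYS_IN_SAMPLE):
    """Percentage of all (day, configuration) builds that failed."""
    failed = sum(len(days) for days in failures.values())
    return round(100 * failed / (total_days * len(failures)))


if __name__ == "__main__":
    for (release, target), days in FAILURE_DAYS.items():
        print(f"gcc {release} {target}: {success_percent(days)}% success")
    print(f"overall: {overall_failure_percent()}% failure")
```

Under these assumptions the five columns come out at 87%, 77%, 63%,
43%, and 60% success, and the pooled failure rate across all five
columns is the 34% quoted in the subject line.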

INTERPRETATION:
One way to interpret the success percentages is "If I, as a gcc user
or gcc developer, download gcc at some random time in March, what is
the probability that it bootstraps?"  Although it is possible that the
gcc tree is more likely to be broken (or fixed) about 08:00 GMT, I
believe that is implausible.

Since most gcc developers use i686-pc-linux-gnu, I conjecture its
probability of success represents an upper bound on other platforms'
success.  The gcc 3.1 data does not reflect this because the sequence
of i686-pc-linux-gnu failures during the early part of the month
reflects including Java in bootstrapping and testing for the first
time.  This was not turned on for mips-sgi-irix6.5 for some period of
time.  The gcc 3.0 data does reflect this conjecture.

Interestingly, i386-pc-linux-gnu builds fail more frequently than
i686-pc-linux-gnu builds for an unknown reason.

The failures can be grouped into 7 one-day failures and 10 multi-day
failures.  Of the 51 days of failures, 44 days were caused by 10
failures that were allowed to persist for more than one day.  (This
interpretation assumes that a failure causing a multi-day failure
remains until the failures end.  Even if the failure originating a
multi-day failure is fixed but replaced by another failure which
extends the sequence, it can be argued that the subsequent failure
might not have been introduced if the original failure had not masked
it.)
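
The grouping into one-day and multi-day failures can be sketched in
Python by splitting each column's failure days into runs of
consecutive days.  The day sets are a hand transcription of the
charts, with the Mar29 "anoncvs.cygnus down" and "failure?" entries
counted as failures and the Mar23 "unknown" entry not counted; those
rules, and the names failure_runs and summarize, are assumptions made
for illustration.

```python
# Failure days (day of March 2001), transcribed by hand from the charts.
FAILURE_DAYS = {
    ("3.0", "i686-pc-linux-gnu"): {10, 28, 29, 30},
    ("3.0", "i386-pc-linux-gnu"): {14, 15, 22, 24, 28, 29, 30},
    ("3.0", "mips-sgi-irix6.5"): {10, 18, 19, 20, 21, 22, 23, 27, 28, 29, 30},
    ("3.1", "i686-pc-linux-gnu"): set(range(2, 13)) | {18, 24, 27, 28, 29, 30},
    ("3.1", "mips-sgi-irix6.5"): {2, 9, 10, 18, 19, 20, 21, 22, 23, 28, 29, 30},
}


def failure_runs(failed_days):
    """Split a set of day numbers into maximal runs of consecutive days."""
    runs = []
    for day in sorted(failed_days):
        if runs and day == runs[-1][-1] + 1:
            runs[-1].append(day)   # extends the current run
        else:
            runs.append([day])     # starts a new run
    return runs


def summarize(failures):
    """Return (one-day runs, multi-day runs, days inside multi-day runs)."""
    one_day = multi_day = multi_day_total = 0
    for days in failures.values():
        for run in failure_runs(days):
            if len(run) == 1:
                one_day += 1
            else:
                multi_day += 1
                multi_day_total += len(run)
    return one_day, multi_day, multi_day_total


if __name__ == "__main__":
    print(summarize(FAILURE_DAYS))
```

With this transcription, summarize(FAILURE_DAYS) gives 7 one-day
failures and 10 multi-day failures, the latter covering 44 of the 51
failure days.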

MY CONCLUSIONS:
Prerelease gcc 3.0 is supposed to be stable with minor changes.  Thus,
bootstrapping downloads from random times should succeed with 99%
probability.  Changes to this code are supposed to represent small
monotonic improvements that are bootstrapped and tested.  Improving
the code depends on successfully bootstrapping and testing so each
failure delays further improvements.  An 87% success rate means a 13%
failure rate.  One calculation indicates these failures delayed
release by 100%/87% - 100% = 15%, i.e., almost one workweek of delays.
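
The delay estimate above can be written out as a small Python sketch.
It assumes that each failed day is simply lost, so a fixed amount of
work needs 1/0.87 as much calendar time, and it assumes the 30 sampled
days as the base period; delay_fraction is a name made up for this
illustration.

```python
def delay_fraction(success_rate):
    """Extra elapsed time needed when only success_rate of days are usable."""
    return 1.0 / success_rate - 1.0


if __name__ == "__main__":
    delay = delay_fraction(0.87)
    print(f"delay: {delay:.1%}")                    # fraction of extra time
    print(f"days lost over 30 days: {delay * 30:.1f}")
```

Over the 30 sampled days this comes to between four and five days,
which matches the "almost one workweek" figure.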

The high failure rates of 57% and 40% for gcc 3.1 indicate that either
1) bootstrapping and checking of code changes is not being performed or
2) patches that break code are not being removed quickly enough.
The costs of these failures include
a) introduction of other errors that are masked by the initial failures,
b) slowing of development because no bootstrapping can occur,
c) wasting of time searching for these errors, and
d) alienation of GCC customers by broken code.

Notice of code breakages is being lost among other messages in
gcc-bugs@gcc.gnu.org postings.  Also, tracking these breakages is
difficult.  Failure to bootstrap or finish testing because of an
unknown cause could be more effectively tracked by a separate WWW
site.  The site would contain postings of failures sorted according to
gcc 3.[01] x configurations.  It would be important to note when the
failures cease.  Using this information, a developer could easily
discern if her failure is the same as that already found.  When it is
discovered that a patch causes a failure for some configuration, it
would be easy to point the patch submitter to that configuration's
failure.

The GCC Steering Committee should adopt a desired rate of successful
bootstrapping and testing to facilitate code correctness and
development.  If the GCC community agrees with this decision,
processes to ensure the rate is met will evolve and then be adopted by
the Steering Committee as policy.

CAVEATS:
1) Although these tests are automated, humans still interact
with them, occasionally causing problems.
2) I collected the data by hand, further increasing the probability of
errors.
3) These comments reflect my own views, not CodeSourcery's views, and
have not been reviewed by or discussed with anyone else at
CodeSourcery.

SUMMARY:
The GCC community needs to work harder to develop a product that works
first time and every time.  We are a long way from achieving at least
one 9 of reliability, much less five 9's.

Hoping for improvement,
Jeffrey D. Oldham
oldham@codesourcery.com