This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.



Re: [RFC] Our release cycles are getting longer


On Wed, 24 Jan 2007 03:02:19 +0100, Marcin Dalecki <martin@dalecki.de> said:

> Message written on 2007-01-24, at 02:30, by David Carlton:

>> For 4, you should probably spend some time figuring out why bugs are
>> being introduced into the code in the first place.  Is test coverage
>> not good enough?

> It's "too good" to be usable. The time required for a full test
> suite run can be measured by days not hours.

That's largely because individual tests in the test suite are too
long, which in turn is because the tests are testing code at a
per-binary granularity: you have to run all of gcc, or all of one
of the programs invoked by gcc, to do a single test.  (Is that true?
Please correct me if I'm wrong.)

Well-written unit tests take milliseconds to execute: it's quite
possible to run hundreds of unit tests in a second, ten thousand unit
tests in a single minute.  (I will give examples below.)  Of course,
you need many unit tests to get the coverage that a single end-to-end
test gives you; then again, unit tests let you test code with much
more precision than end-to-end tests.

I'm not going to argue against having a good end-to-end test suite
around, but it would be quite doable, over the course of a couple of
years, to move to a model where a commit required about 10 minutes of
testing (including all the unit tests and a few smoke end-to-end
tests), and you had separate, automated runs of nightly end-to-end
tests that caught problems that slipped through the unit tests.  (And,
of course, whenever the nightly tests detected problems, you'd update
the unit tests accordingly.)

>> If so, why - do people not write enough tests, is it
>> hard to write good enough tests, something else?  Is the review
>> process inadequate?  If so, why: are rules insufficiently stringent,
>> are reviewers sloppy, are there not enough reviewers, are patches too
>> hard to review?
>> 
>> My guess is that most or all of those are factors, but some are more
>> important than others.

> No. The problems are entirely technical in nature. It's not a pure
> human resources management issue.

I don't think it's a pure human resources issue, but I don't think
it's a purely technical issue, either, if for no other reason than
that people are involved.

>> My favorite tactic to decrease the number of
>> bugs is to set up a unit test framework for your code base (so you can
>> test changes to individual functions without having to run the whole
>> compiler), and to strongly encourage patches to be accompanied by unit
>> tests.

> That's basically a pipe dream with the autoxxxx based build system.

Why?  What's so difficult about building one more (or a few more) unit
test binaries along with the binaries you're building now?

David Carlton
david.carlton@sun.com


Here are numbers to back up my unit test timing claim; these are all
run on a computer that cost less than a thousand dollars a year ago.

A C++ example, which is probably closest to your situation:

panini$ time ./unittesttest
.....................................................................
Tests finished with 69 passes and 0 failures.

real    0m0.013s
user    0m0.004s
sys     0m0.004s

A Java example:

panini$ time java org.bactrian.dbcdb.AllTests
.........................................
.........................................
.........................................
.........................................
.........
Time: 0.597

OK (173 tests)


real    0m1.109s
user    0m1.064s
sys     0m0.024s

And a Ruby example:

panini$ time ruby -e "require 'dbcdb/test/all'"
Loaded suite -e
Started
...............................................................
Finished in 0.039504 seconds.

63 tests, 110 assertions, 0 failures, 0 errors

real    0m0.150s
user    0m0.128s
sys     0m0.016s

No matter the language, you get between hundreds and thousands of
tests a second; that C++ example works out to over 5000 tests a
second.

