This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.



Re: Performance regression testing?


On Mon, 28 Nov 2005, Joe Buck wrote:

> On Mon, 28 Nov 2005, Mark Mitchell wrote:
>
> > We're collectively putting a lot of energy into performance
> > improvements in GCC. Sometimes, a performance gain from one patch gets
> > undone by another patch -- which is itself often doing something else
> > beneficial. People have mentioned to me that we require people to run
> > regression tests for correctness, but that we don't really have
> > anything equivalent for performance.

> It would be possible to detect performance regressions after the fact,
> but soon enough to look at reverting patches. For example, given
> multiple machines doing SPEC benchmark runs every night, the alarm
> could be raised if a significant performance regression is detected.
> To guard against noise from machine hiccups, two different machines
> would have to report a regression to raise the alarm. But the big
> problem is the non-freeness of SPEC; ideally there would be a
> benchmark that ...
>
> ... everyone can download and run
> ... is reasonably fast
> ... is non-trivial

Yes! This would be very useful for other free software projects.
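
A rough sketch of the confirmation logic Joe describes might look like
the following; the machine count, the 5% threshold, and the data layout
are all made up for illustration, not taken from the thread:

/* Hypothetical two-machine regression check: raise the alarm only if
 * at least two machines independently report a significant slowdown,
 * filtering out noise from one-off machine hiccups.  */
#include <stdio.h>

#define NMACHINES 3
#define THRESHOLD 0.05                /* flag a >5% slowdown */

/* One nightly score per machine, in seconds (lower is better). */
static int confirmed_regression(const double yesterday[],
                                const double today[])
{
    int confirming = 0;
    for (int m = 0; m < NMACHINES; m++)
        if (today[m] > yesterday[m] * (1.0 + THRESHOLD))
            confirming++;
    return confirming >= 2;           /* two machines must agree */
}

int main(void)
{
    double yesterday[NMACHINES] = { 41.2, 40.8, 43.0 };
    double today[NMACHINES]     = { 44.9, 44.1, 43.2 };

    if (confirmed_regression(yesterday, today))
        printf("ALARM: confirmed performance regression\n");
    return 0;
}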


Another possible requirement is that the tests not be too large; it would be nice to be able to ship them in a project's own source tree for easier integration.

> > As a strawman, perhaps we could add a small integer program (bzip?)
> > and a small floating-point program to the testsuite, and have DejaGNU
> > print out the number of iterations of each that run in 10 seconds.
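
For concreteness, a harness along those lines might look like the sketch
below; benchmark_body() is a placeholder for the integer or
floating-point kernel, and nothing here is actual testsuite code:

/* Hypothetical iterations-in-10-seconds harness; DejaGNU would scrape
 * the final line from the test's output.  */
#include <stdio.h>
#include <time.h>

static void benchmark_body(void)
{
    /* Placeholder workload; a real test would run e.g. a bzip2-style
     * compression kernel here.  */
    static volatile unsigned long sink;
    for (unsigned long i = 0; i < 1000000UL; i++)
        sink += i ^ (i >> 3);
}

int main(void)
{
    const double budget = 10.0;       /* seconds, per the proposal */
    unsigned long iterations = 0;
    clock_t start = clock();

    while ((double)(clock() - start) / CLOCKS_PER_SEC < budget) {
        benchmark_body();
        iterations++;
    }
    printf("iterations: %lu\n", iterations);
    return 0;
}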

Would that really catch much?

I've been thinking about this kind of thing recently for Valgrind. I was thinking that a combination of real programs and artificial microbenchmarks would be good. The microbenchmarks would be like the GCC (correctness) torture tests -- a collection of programs, added to over time, each one demonstrating a prior performance bug. You could start it off with a few tests containing things like key inner loops extracted from programs such as bzip2.
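
As an illustration of the kind of extracted-loop microbenchmark meant
here, something like the following would do; the loop only mimics the
flavor of a compressor's inner loop and is not taken from bzip2 or from
any real performance bug report:

/* Illustrative microbenchmark: byte-frequency counting, a common hot
 * loop in compressors.  A codegen change that pessimizes this loop
 * (say, an extra load per iteration) shows up directly in its run time. */
#include <stdio.h>

#define N 65536

static unsigned char buf[N];
static unsigned int freq[256];

static void count_bytes(void)
{
    for (int i = 0; i < N; i++)
        freq[buf[i]]++;
}

int main(void)
{
    for (int i = 0; i < N; i++)
        buf[i] = (unsigned char)(i * 31);
    for (int rep = 0; rep < 10000; rep++)
        count_bytes();
    printf("%u\n", freq[0]);          /* keep the work observable */
    return 0;
}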


Measuring the programs and categorizing regressions are tricky. It's possible that the artificial tests would be small enough that any regression would be obvious (e.g. failing to remove a single redundant instruction would cause a 10% slowdown). And CSiBE-style graphing is very effective for seeing trends.

Nick

