This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.



Re: Performance regression testing?


On Mon, 28 Nov 2005, Mark Mitchell wrote:
> As a strawman, perhaps we could add a small integer program (bzip?) and
> a small floating-point program to the testsuite, and have DejaGNU print
> out the number of iterations of each that run in 10 seconds.

Please make it the other way round (time for a fixed number of
iterations, perhaps with the number being settable); it's
generally easier to adapt that to testing with simulators.
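
As a rough illustration of "time for a fixed number of iterations",
here is a minimal Python sketch.  It is not taken from the attached
csibe.exp; the names time_fixed_iterations and sample_workload are
hypothetical, and the wall-clock timer is only a native-host stand-in
(under a simulator one would read the cycle count instead).

```python
import time


def time_fixed_iterations(workload, iterations=1000):
    """Run workload() exactly `iterations` times; return elapsed seconds.

    The iteration count is a parameter so it can be lowered for slow
    simulators and raised for fast native hosts.
    """
    start = time.perf_counter()
    for _ in range(iterations):
        workload()
    return time.perf_counter() - start


def sample_workload():
    # Hypothetical stand-in for a real benchmark kernel.
    return sum(i * i for i in range(1000))


if __name__ == "__main__":
    elapsed = time_fixed_iterations(sample_workload, iterations=200)
    print("200 iterations took %.6f s" % elapsed)
```

Because the amount of work is fixed, the same run yields a comparable
number whether the unit is seconds or simulated cycles.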

Simulator testing seems generally regarded as a poor cousin
here, but has some distinct advantages: the results carry over
between different test-beds/hosts and are not subject to noise
from e.g. system load, as long as the number of cycles is the
unit of measure for run time.  On the other hand, digging out the
number of cycles is done slightly differently depending on target.
I hope to provide tools for that.

> Again, that's a strawman.  I'm just looking for suggestions about what
> we might to do -- or even feedback that there's no need to do anything.

I'm working on a csibe.exp for use with CSiBE-2.1.1, focusing on
simulator testing.  Native testing works too of course, but I
haven't really solved the problems with system noise and (still)
getting a usable time-scale.

I admit csibe isn't aimed at being an execution time performance
regression tool: I chose csibe rather than something homegrown
mainly so I'd not have to invest time in a discussion regarding
the choice of benchmarks and test input.  (FWIW, I do have a
number of homegrown tests too, but none that work within the gcc
testing framework.)  Anyway, the testing framework isn't
supposed to be tied to CSiBE (lots can and should be extracted
as generic tools), it just seemed sane enough to start with.

I've attached the work-in-progress so I don't have to get into
detail about what it does :-) except noting that you'll see in
gcc.sum something like:

PASS: csibe -O1 runtime zlib-1.1.4:minigzip not slower than best
PASS: csibe -O1 runtime zlib-1.1.4:minigzip not more than .1% slower than best
PASS: csibe -O1 runtime zlib-1.1.4:minigzip not more than 1% slower than best
PASS: csibe -O1 runtime zlib-1.1.4:minigzip not more than 10% slower than best
PASS: csibe -O1 runtime zlib-1.1.4:minigzip not slower than milestone
PASS: csibe -O1 runtime zlib-1.1.4:minigzip not more than .1% slower than milestone
PASS: csibe -O1 runtime zlib-1.1.4:minigzip not more than 1% slower than milestone
PASS: csibe -O1 runtime zlib-1.1.4:minigzip not more than 10% slower than milestone
PASS: csibe -O1 runtime zlib-1.1.4:minigzip not slower than previous
PASS: csibe -O1 runtime zlib-1.1.4:minigzip not more than .1% slower than previous
PASS: csibe -O1 runtime zlib-1.1.4:minigzip not more than 1% slower than previous
PASS: csibe -O1 runtime zlib-1.1.4:minigzip not more than 10% slower than previous
(repeated for each different test and gcc options in the chosen
set.)
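
The result lines above follow a regular pattern: one measurement is
checked against each baseline at several tolerances.  A minimal
Python sketch of that comparison follows.  It is an assumption about
the logic, not the code in csibe.exp; compare_to_baselines,
format_pct, the FAIL status for a missed criterion, and the cycle
counts in the usage example are all made up for illustration.

```python
def format_pct(pct):
    """Render 0.1 as '.1', 1.0 as '1', 10.0 as '10' (as in the lines above)."""
    text = "%g" % pct
    return text[1:] if text.startswith("0.") else text


def compare_to_baselines(label, measured, baselines,
                         tolerances=(0.0, 0.1, 1.0, 10.0)):
    """Return one PASS/FAIL line per (baseline, tolerance) pair.

    `measured` and the baseline values share one unit (e.g. cycles);
    lower is better.  `baselines` maps a name to its reference value.
    """
    lines = []
    for name, reference in baselines.items():
        for pct in tolerances:
            limit = reference * (1 + pct / 100.0)
            status = "PASS" if measured <= limit else "FAIL"
            if pct == 0:
                criterion = "not slower than %s" % name
            else:
                criterion = "not more than %s%% slower than %s" % (
                    format_pct(pct), name)
            lines.append("%s: %s %s" % (status, label, criterion))
    return lines


if __name__ == "__main__":
    # Hypothetical cycle counts, for illustration only.
    baselines = {"best": 1000, "milestone": 1000, "previous": 1010}
    for line in compare_to_baselines(
            "csibe -O1 runtime zlib-1.1.4:minigzip", 1005, baselines):
        print(line)
```

With three baselines and four tolerances this gives twelve lines per
test and option set, which is where the large number of sub-tests
comes from.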

Roughly, the person running the tests decides on (or relies on
defaults for) a number of baselines, like the arbitrary set shown
above: "best", "milestone" and "previous".  The runtime (seen
above), compile time and size of the test programs are each
compared against those baselines using a set of criteria,
iterating over a set of compiler options not unlike the torture
iterations.  Updated baseline data is also output by the tests,
to simplify feedback (currently just for "previous"; "best" is
not yet implemented).

Before you guys hose it completely, let me repeat: this is work
in progress.  I'm not sure what's useful yet; perhaps just one
baseline should be default.  Perhaps some of the test results
should be accumulated, to avoid 43445 different sub-tests.

Note that csibe doesn't have integrity checks for its (few)
runtime tests; patch for that is attached.  (No, I haven't
contacted the csibe people yet.)  One of the tests has lots of
off-by-one bugs causing SEGV on cris-axis-linux-gnu (the
equivalent within the simulator); patch attached for that too.
Another of the programs, "flex", is definitely simulator-
unfriendly: it relies heavily on fork and executing sub-programs
for its final output.  Most of the other programs could do with
some editing to avoid constructs rarely present in simulators,
and perhaps some pruning to cut down the time from ~2h per
iteration to a few minutes (in total, 1h for cris-axis-linux-gnu
+ sim/cris).

In short: I agree we need some performance tests other than SPEC
and I think *something* like the above should be done, and
optionally run as part of the usual testsuite.

brgds, H-P

Attachment: csibe.exp
Description: gcc/testsuite/gcc.performance/csibe.exp

Attachment: csibe112-test-patch4
Description: CSiBE integrity checks for runtime tests

Attachment: csibe112-test-patch4-2
Description: Bugfix for CSiBE jikespg

