GCC Buildbot Update - Definition of regression

Tue Oct 10 21:25:00 GMT 2017

On Tue, 10 Oct 2017, Paulo Matos wrote:

>     ANY -> no test  ; Test disappears

No, that's not a regression.  Simply adding a line to a testcase will 
change the line number that appears in the PASS / FAIL line for an 
individual assertion therein.  Or the names will change when e.g. 
-std=c++2a becomes -std=c++20 and all the tests with a C++ standard 
version in them change their names.  Or if a bogus test is removed.

>     ANY / XPASS -> XPASS    ; Test goes from any status other than XPASS
> to XPASS
>     ANY / KPASS -> KPASS    ; Test goes from any status other than KPASS
> to KPASS

No, that's not a regression.  It's inevitable that XFAILing conditions may 
sometimes be broader than ideal, if it's not possible to describe the 
exact failure conditions to the testsuite, and so sometimes a test may 
reasonably XPASS.  Such tests *may* sometimes be candidates for a more 
precise XFAIL condition, but they aren't regressions.

>     new test -> FAIL        ; New test starts as fail

No, that's not a regression, but you might want to treat it as one (in the 
sense that it's a regression at the higher level of "testsuite run should 
have no unexpected failures", even if the test in question would have 
failed all along if added earlier and so the underlying compiler bug, if 
any, is not a regression).  It should have human attention to classify it 
and either fix the test or XFAIL it (with an issue filed in Bugzilla if 
it's a bug), but it's not a regression.  (Exception: where a test failing 
results in its name changing, e.g. through adding "(internal compiler 
error)".)

>     PASS -> ANY             ; Test moves away from PASS

No, it's only a regression if the destination result is FAIL (if it's 
UNRESOLVED then there might be a separate regression - an execution test 
becoming UNRESOLVED should be accompanied by compilation becoming FAIL).  
If it's XFAIL, it might formally be a regression, but one already being 
tracked in another way (presumably Bugzilla) which should not turn the bot 
red.  If it's XPASS, that simply means the XFAILing conditions are 
slightly wider than necessary in order to mark failure in another 
configuration as expected.

My suggestion is:

PASS -> FAIL is an unambiguous regression.

Anything else -> FAIL and new FAILing tests aren't regressions at the 
individual test level, but may be treated as such at the whole testsuite 
level.

Any transition where the destination result is not FAIL is not a 
regression.
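
The three rules above could be sketched roughly as follows.  This is only 
an illustration: the names Verdict and classify are hypothetical, a test 
absent from a run is assumed to be represented as None, and treating an 
unchanged FAIL as "not a regression" is an assumption of the sketch 
rather than something stated above.

```python
from enum import Enum
from typing import Optional


class Verdict(Enum):
    REGRESSION = "regression"            # PASS -> FAIL
    SUITE_LEVEL = "testsuite-level"      # anything else -> FAIL, new FAILs
    NOT_REGRESSION = "not a regression"  # destination result is not FAIL


def classify(old: Optional[str], new: Optional[str]) -> Verdict:
    """Classify one test's transition between two testsuite runs.

    old and new are result strings such as "PASS", "FAIL" or "XPASS";
    None means the test name does not appear in that run (hypothetical
    convention for this sketch).
    """
    if new != "FAIL":
        # Covers disappearing tests, XPASS, KPASS, XFAIL, UNRESOLVED, ...
        return Verdict.NOT_REGRESSION
    if old == "PASS":
        return Verdict.REGRESSION
    if old == "FAIL":
        # Assumption: an unchanged FAIL is not a transition at all.
        return Verdict.NOT_REGRESSION
    # New FAILing test, or some other status moving to FAIL.
    return Verdict.SUITE_LEVEL
```

With this shape, only Verdict.REGRESSION would unconditionally turn the 
bot red; whether Verdict.SUITE_LEVEL does so is a policy choice.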

However, ERRORs in the .sum or .log files should be watched for as well, 
since they sometimes indicate broken Tcl syntax in the testsuite, which 
may cause many tests not to be run.
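
A minimal sketch of such a check, assuming that the errors of interest 
show up as lines beginning with "ERROR:" (the helper name find_errors is 
hypothetical):

```python
from typing import Iterable, List, Tuple


def find_errors(lines: Iterable[str]) -> List[Tuple[int, str]]:
    """Return (line number, text) for each line starting with "ERROR:".

    Assumes the harness reports errors on lines with that prefix; the
    caller passes the lines of one .sum or .log file.
    """
    return [
        (number, line.rstrip("\n"))
        for number, line in enumerate(lines, start=1)
        if line.startswith("ERROR:")
    ]
```

Any non-empty result probably deserves human attention, since a single 
error of this kind can hide a large number of tests.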

Note that the test names that come after PASS:, FAIL: etc. aren't unique 
between different .sum files, so you need to associate tests with a tuple 
(.sum file, test name) (and even then, sometimes multiple tests in a .sum 
file have the same name, but that's a testsuite bug).  If you're using 
--target_board options that run tests for more than one multilib in the 
same testsuite run, add the multilib to that tuple as well.
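
As a rough sketch of that keying, assuming the multilib can be recovered 
from "Running target ..." lines in the .sum file (parse_sum and the regex 
names are hypothetical, and the last result wins for duplicated names):

```python
import re
from collections import Counter
from typing import Dict, Iterable, List, Tuple

# Result lines look like "PASS: test name"; the status list is an
# assumption about which statuses the bot cares about.
RESULT_RE = re.compile(
    r"^(PASS|FAIL|XPASS|XFAIL|KPASS|KFAIL|UNRESOLVED|UNSUPPORTED|UNTESTED)"
    r": (.*)$"
)
# Assumed marker for the start of each multilib's results.
TARGET_RE = re.compile(r"^Running target (\S+)")

Key = Tuple[str, str, str]  # (.sum file, multilib, test name)


def parse_sum(
    sum_name: str, lines: Iterable[str]
) -> Tuple[Dict[Key, str], List[Key]]:
    """Map (.sum file, multilib, test name) to a result.

    Also returns the keys seen more than once, which point at duplicate
    test names within one .sum file.
    """
    results: Dict[Key, str] = {}
    seen: Counter = Counter()
    multilib = ""
    for raw in lines:
        line = raw.rstrip("\n")
        target = TARGET_RE.match(line)
        if target:
            multilib = target.group(1)
            continue
        result = RESULT_RE.match(line)
        if result:
            key = (sum_name, multilib, result.group(2))
            seen[key] += 1
            results[key] = result.group(1)
    duplicates = [key for key, count in seen.items() if count > 1]
    return results, duplicates
```

Comparing two runs then reduces to comparing the two dictionaries key by 
key, with the duplicates list reported separately as testsuite bugs.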

-- 
Joseph S. Myers
joseph@codesourcery.com