GCC Buildbot Update - Definition of regression

Wed Oct 11 08:36:00 GMT 2017

On 11 October 2017 at 08:34, Paulo Matos <pmatos@linki.tools> wrote:
>
>
> On 10/10/17 23:25, Joseph Myers wrote:
>> On Tue, 10 Oct 2017, Paulo Matos wrote:
>>
>>>     new test -> FAIL        ; New test starts as fail
>>
>> No, that's not a regression, but you might want to treat it as one (in the
>> sense that it's a regression at the higher level of "testsuite run should
>> have no unexpected failures", even if the test in question would have
>> failed all along if added earlier and so the underlying compiler bug, if
>> any, is not a regression).  It should have human attention to classify it
>> and either fix the test or XFAIL it (with issue filed in Bugzilla if a
>> bug), but it's not a regression.  (Exception: where a test failing results
>> in its name changing, e.g. through adding "(internal compiler error)".)
>>
>
> When someone adds a new test to the testsuite, isn't it supposed to not
> FAIL? If it does FAIL, shouldn't this be considered a regression?
>
> Now, the danger is that since regressions are comparisons with the
> previous run, something like this would happen:
>
> run1:
> ...
> FAIL: foo.c ; new test
> ...
>
> run1 fails because new test entered as a FAIL
>
> run2:
> ...
> FAIL: foo.c
> ...
>
> run2 succeeds because there are no changes.
>
> For this reason all of these issues need to be taken care of straight
> away, or they become part of the 'normal' status and no more failures
> are reported... unless of course a more complex regression analysis is
> implemented.
>
Agreed.
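
To make the masking effect concrete, here is a rough Python sketch. It is
only an illustration, not what any buildbot actually does: the helper
new_failures, the dict layout and the test bar.c are made up for the
example, and run0 stands for a hypothetical last green run before foo.c
was added.

```python
def new_failures(reference, current):
    """Tests that FAIL in `current` but did not FAIL in `reference`.

    Both arguments map a test name to its result string; a test that is
    missing from `reference` counts as new.
    """
    return sorted(name for name, result in current.items()
                  if result == "FAIL" and reference.get(name) != "FAIL")


# Hypothetical runs: bar.c is invented, foo.c is the example from above.
run0 = {"bar.c": "PASS"}                   # last green run, before foo.c
run1 = {"bar.c": "PASS", "foo.c": "FAIL"}  # foo.c added, already failing
run2 = {"bar.c": "PASS", "foo.c": "FAIL"}  # nothing changed since run1

print(new_failures(run1, run2))  # against the previous run: []
print(new_failures(run0, run2))  # against the last green run: ['foo.c']
```

Comparing with the previous run hides foo.c from run2 onwards, while
comparing with the last green run keeps reporting it until someone fixes
or XFAILs the test.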

> Also, when I say run1 fails or succeeds, this is just the term I use to
> display red/green in the buildbot interface for a given build, not
> necessarily what I expect the process will do.
>
>>
>> My suggestion is:
>>
>> PASS -> FAIL is an unambiguous regression.
>>
>> Anything else -> FAIL and new FAILing tests aren't regressions at the
>> individual test level, but may be treated as such at the whole testsuite
>> level.
>>
>> Any transition where the destination result is not FAIL is not a
>> regression.
>>

FWIW, we consider the following to be regressions:
* any->FAIL because we don't want such a regression at the whole testsuite level
* any->UNRESOLVED for the same reason
* {PASS,UNSUPPORTED,UNTESTED,UNRESOLVED}->XPASS
* new XPASS
* XFAIL disappears (may mean that a testcase was removed, worth a manual check)
* ERRORS
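
As a rough sketch only (this is not our actual script), the per-test rules
above could be written in Python as follows. The function name
is_regression and the use of None for "test absent from that run" are
assumptions made for the example. It also assumes that an unchanged result
is not a transition and that "any" includes a newly added test. ERRORs are
not tied to a single test, so they would need a separate check.

```python
# Hypothetical helper. None stands for "test absent from that run".
# Including None here covers the "new XPASS" rule.
XPASS_SOURCES = {"PASS", "UNSUPPORTED", "UNTESTED", "UNRESOLVED", None}


def is_regression(old, new):
    """Return True if the transition old -> new is on the list above."""
    if old == new:
        return False  # assumption: an unchanged result is not a transition
    if new in ("FAIL", "UNRESOLVED"):
        return True   # any->FAIL, any->UNRESOLVED (including new tests)
    if new == "XPASS":
        return old in XPASS_SOURCES
    if old == "XFAIL" and new is None:
        return True   # XFAIL disappears: worth a manual check
    return False


print(is_regression("PASS", "FAIL"))   # True
print(is_regression(None, "XPASS"))    # True
print(is_regression("FAIL", "PASS"))   # False
```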

>> ERRORs in the .sum or .log files should be watched out for as well,
>> however, as sometimes they may indicate broken Tcl syntax in the
>> testsuite, which may cause many tests not to be run.
>>
>> Note that the test names that come after PASS:, FAIL: etc. aren't unique
>> between different .sum files, so you need to associate tests with a tuple
>> (.sum file, test name) (and even then, sometimes multiple tests in a .sum
>> file have the same name, but that's a testsuite bug).  If you're using
>> --target_board options that run tests for more than one multilib in the
>> same testsuite run, add the multilib to that tuple as well.
>>
>
> Thanks for all the comments. Sounds sensible.
> By not being unique, you mean between languages?
Yes, but not only between languages, as Joseph mentioned above.

You have the obvious example of c-c++-common/*san tests, which are
common to gcc and g++.

> I assume that two gcc.sum from different builds will always refer to the
> same test/configuration when referring to (for example):
> PASS: gcc.c-torture/compile/20000105-1.c   -O1  (test for excess errors)
>
> In this case, I assume that "gcc.c-torture/compile/20000105-1.c   -O1
> (test for excess errors)" will always be referring to the same thing.
>
In gcc.sum, I can see 4 occurrences of
PASS: gcc.dg/Werror-13.c  (test for errors, line )

Actually, there are quite a few others like that....
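
A minimal Python sketch for spotting such duplicates, keyed by the
(.sum file, test name) tuple Joseph suggested. The function name
duplicate_names and the regular expression are assumptions for
illustration; the pattern covers the usual DejaGnu result prefixes and
ignores every other line. With several multilibs in one run, the multilib
would have to be added to the key as well.

```python
import re
from collections import Counter

# Result lines look like "PASS: <test name>"; other lines are skipped.
RESULT_RE = re.compile(
    r"^(PASS|FAIL|XPASS|XFAIL|KPASS|KFAIL|UNRESOLVED|UNSUPPORTED|UNTESTED): (.*)$"
)


def duplicate_names(sum_name, lines):
    """Map (sum file, test name) -> count for names seen more than once."""
    counts = Counter()
    for line in lines:
        match = RESULT_RE.match(line.rstrip("\n"))
        if match:
            counts[(sum_name, match.group(2))] += 1
    return {key: n for key, n in counts.items() if n > 1}


sample = ["PASS: gcc.dg/Werror-13.c  (test for errors, line )"] * 4
sample.append(
    "PASS: gcc.c-torture/compile/20000105-1.c   -O1  (test for excess errors)"
)
print(duplicate_names("gcc.sum", sample))
```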

Christophe

> --
> Paulo Matos