This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.
Re: GCC Buildbot Update - Definition of regression
- From: Joseph Myers <joseph at codesourcery dot com>
- To: Paulo Matos <pmatos@linki.tools>
- Cc: <gcc at gcc dot gnu dot org>
- Date: Wed, 11 Oct 2017 13:19:00 +0000
- Subject: Re: GCC Buildbot Update - Definition of regression
- References: <3b204c53-7613-ffe2-172b-69ad2a781ceb@linki.tools> <alpine.DEB.2.20.1710102111080.20946@digraph.polyomino.org.uk> <726e85ce-9ded-e9b4-6412-edb1da76c768@linki.tools>
On Wed, 11 Oct 2017, Paulo Matos wrote:
> On 10/10/17 23:25, Joseph Myers wrote:
> > On Tue, 10 Oct 2017, Paulo Matos wrote:
> >
> >> new test -> FAIL ; New test starts as fail
> >
> > No, that's not a regression, but you might want to treat it as one (in the
> > sense that it's a regression at the higher level of "testsuite run should
> > have no unexpected failures", even if the test in question would have
> > failed all along if added earlier and so the underlying compiler bug, if
> > any, is not a regression). It should have human attention to classify it
> > and either fix the test or XFAIL it (with issue filed in Bugzilla if a
> > bug), but it's not a regression. (Exception: where a test failing results
> > in its name changing, e.g. through adding "(internal compiler error)".)
> >
>
> When someone adds a new test to the testsuite, isn't it supposed to not
> FAIL? If it does FAIL, shouldn't this be considered a regression?
Only a regression at the whole-testsuite level (in that "no FAILs" is the
desired state). It is not a regression in the sense of a regression bug in
GCC that might be relevant for release management (something user-visible
that worked in a previous GCC version but no longer works). And if, for
example, someone adds a dg-require-effective-target line near the top of a
testcase, every subsequent line number in that test shifts by one, so
every PASS / FAIL assertion in that test is effectively renamed. Treating
new FAILs as regressions would then produce spurious detections: even at
the whole-testsuite level, an existing FAIL whose line number has merely
increased is not meaningfully a regression.
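To illustrate the renaming problem (the .sum excerpts below are hypothetical, not taken from a real run): a naive set difference of FAIL names flags the shifted assertion as both a new regression and a fixed test.

```python
# Hypothetical .sum excerpts: the same dg-error assertion before and
# after a one-line edit near the top of the testcase shifted its
# reported line number from 10 to 11.
before = {"FAIL: gcc.dg/example.c  (test for errors, line 10)"}
after = {"FAIL: gcc.dg/example.c  (test for errors, line 11)"}

# A purely name-based comparison reports one "new" FAIL and one
# "disappeared" FAIL, even though nothing about the compiler changed.
new_fails = after - before
fixed = before - after
print(sorted(new_fails))
print(sorted(fixed))
```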
> For this reason all of these issues need to be taken care of straight away
Well, I think it *does* make sense to do sufficient analysis on existing
FAILs to decide if they are testsuite issues or compiler bugs, fix if they
are testsuite issues and XFAIL with reference to a bug in Bugzilla if
compiler bugs. That is, try to get to the point where no-FAILs is the
normal expected testsuite state and it's Bugzilla, not
expected-FAILs-not-marked-as-XFAIL, that is used to track regressions and
other bugs.
> By not being unique, you mean between languages?
Yes (e.g. c-c++-common tests in both gcc and g++ tests might have the same
name in both .sum files, but should still be counted as different tests).
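One way a regression-checking script can handle this is to key each result by the tool (i.e. which .sum file the line came from) as well as the test name; the tuple key below is my own sketch, not an established buildbot convention.

```python
# Key each result by (tool, test name) so that a c-c++-common test
# appearing in both gcc.sum and g++.sum counts as two distinct tests.
results = {}
for tool, name, outcome in [
    ("gcc", "c-c++-common/example.c (test for excess errors)", "PASS"),
    ("g++", "c-c++-common/example.c (test for excess errors)", "FAIL"),
]:
    results[(tool, name)] = outcome

print(len(results))  # two entries, not one
```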
> I assume that two gcc.sum from different builds will always refer to the
> same test/configuration when referring to (for example):
> PASS: gcc.c-torture/compile/20000105-1.c -O1 (test for excess errors)
The problem is when e.g. multiple diagnostics are being tested for on the
same line but the "test name" field in the dg-* directive is an empty
string for all of them. One possible approach is to automatically (in
your regression checking scripts) append a serial number to the first,
second, third etc. cases of any given repeated test name in a .sum file.
Or you could count such duplicates as being errors that automatically
result in red test results, and get fixes for them into GCC as soon as
possible.
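The serial-number approach above can be sketched as follows (a minimal illustration for a regression-checking script; the " #N" suffix format is my assumption, not an existing convention):

```python
from collections import Counter

def uniquify(names):
    """Append a serial number to repeated test names so each result
    line from a .sum file gets a distinct key; unique names keep
    their original form."""
    total = Counter(names)   # how many times each name occurs overall
    seen = Counter()         # occurrences encountered so far
    out = []
    for name in names:
        if total[name] > 1:
            seen[name] += 1
            out.append(f"{name} #{seen[name]}")
        else:
            out.append(name)
    return out

names = [
    "FAIL: gcc.dg/diag.c  (test for errors, line 5)",
    "FAIL: gcc.dg/diag.c  (test for errors, line 5)",
    "PASS: gcc.dg/other.c (test for excess errors)",
]
print(uniquify(names))
```

With this, two dg-* directives on the same line that both produce an empty test-name field no longer collide when comparing two .sum files.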
--
Joseph S. Myers
joseph@codesourcery.com