RFC: Monitoring old PRs, new dg directives

Tue Aug 4 21:34:21 GMT 2020

Hi Mike,

thanks for your comments.

On Tue, Jul 28, 2020 at 06:37:26PM -0700, Mike Stump via Gcc-patches wrote:
> I'll punt to the the C++ front-end folks to chime in.  Usually we only check in bugs that are fixed, as they are fixed, this is what makes it a regression suite.  Doing this does have advantages, like, the testsuite is small and doesn't have duplicates and doesn't test anything that is known to fail that isn't an actual regression.

We also add sanity tests for new language features, for example.  I also like
adding (small) tests for DRs that happen to be already fixed to insure that the
compiler continues to behave as expected.  Avoiding duplicates is a good thing,
obviously, but, with this new scheme, if you fix something and while testing
the fix you find that we already have a test, you can choose not to introduce
a new test.  Whether or not a compiler bug is a regression shouldn't, IMHO, be
a criterion for (not) including the test.

tree(1) in testsuite/g++.dg says 70 directories, 13406 files.  If I go wild and
add 200 new C++ tests, that's ~1.5% increase in the number of tests.  That seems
reasonable.  If it still causes grief for some people, and we go with the idea
of using an unfixed/ directory, we could add GCC_TESTSUITE_TEST_UNFIXED envvar
to enable or disable running such tests.

> Ideally, it would be nice to have a way to test bugs out of bugzilla, and report on those fixed bugs as they are fixed.  If people want to keep the test suite a regression suite, then my counter proposal would be to have a branch with the non-regression bugs on it and then people can checkout and test that branch.  Most of the people don't (saving the testing time, which is handy), and then more sporadically, the old bugs branch can be test and BZ state can be moved along as bugs as fixed.  A run once a week or even once a month would seem to be plenty often.

I'm afraid this would defeat the point of this proposal.  If the additional
tests are only available on a branch, no one is going to use it.  Quite
frankly, I'd just stick with my personal repo (where I can do whatever I want,
include test with #includes, unreduced tests, ...) rather than to bother with
rebasing a branch etc.  Even if someone would actually use such a branch from
time to time, we'd lose the benefit of "left shifting" bugs in the developer
workflow -- you would only notice that your patch had changed something days
or weeks after the commit at which point you likely have lost state on it.

> You in general don't need to check in fixes from bug reports that have been fixed in the past, as those fixes generally already have a test case for the fix that went in with the fix.

I agree, but experience shows that's not always the case.  Just the few PRs
I mentioned in my original mail prove that.  There are patches that change
something as a side-effect only, and, as a developer, you want to be aware
of those side-effects.

> As for how old is old, we'd leave that to the contributor of work to decide.  In theory, bugs can be added as soon as they come in, no need to wait.  To the extent that waiting saves work, well, that's a personal choice, for the person doing the work to choose.

I agree.  I'd leave it up to developers.  Just... be reasonable.  No need to
add every broken nonsense test lurking in Bugzilla.

> Why not just use xfail and xpass?  Seems less work than doing a setup_xfail.  Also, why not just use the existing directives instead of adding new directives?  I'm suspicious of expect_ice and accepts_invalid.  You set them to 1 all the time, but almost never set them to 0?  I'm wondering if it should be more like shouldfail?

Good point, I missed that I could use xfail and xpass.  Fixed.

For accepts_invalid I think we *could* use the existing directives, if you know
where to expect an error.  But for ICEs we currently have nothing that would
work well.  I'll probably drop dg-accepts-invalid completely.

I followed shouldfail's suit when it comes to setting them to 1, so that should
be fine.

Thanks,
Marek

> On Jul 28, 2020, at 2:44 PM, Marek Polacek via Gcc-patches <gcc-patches@gcc.gnu.org> wrote:
> > 
> > In Bugzilla, for the c++ component, we currently have over 3200 open bugs.  In
> > my experience, a good amount of them have already been fixed; my periodical
> > sweeps always turn up a bunch of PRs that had already been fixed previously.
> > Sometimes my sweeps are more or less random, but more often than not I'm just
> > looking for duplicates of an existing PR.  Sometimes the reason the already
> > fixed PRs are still open is because a PR that was fixed had duplicates that we
> > didn't catch earlier when confirming the PR.  Sometimes a PR gets fixed as a
> > side-effect of fixing another PR.  Manual sweeps are tedious and time-consuming
> > because often you need to grab the test from the Bugzilla yet again (and
> > sometimes there are multiple tests).  Even if you find a PR that was fixed, you
> > still need to bisect the fix and perhaps add the test to our testsuite.  That's
> > draining and since the number of bugs only increases, never decreases, it is not
> > sustainable.
> > 
> > So I've started a personal repo where I've gathered dozens of tests and wrote a
> > script that just compiles every test in the repo and reports if anything
> > changed.  One line from it:
> > 
> > pr=59798; $cxx $o -c $pr.C 2>&1 | grep -qE 'internal compiler error' || echo -e "$pr: ${msg_ice}"
> > 
> > This has major drawbacks: you have to remember to run this manually, keep
> > updating it, and it's yet another repo that people interested in this would
> > have to clone, but the worst thing is that typically you would only discover
> > that a patch fixed a different PR long after the patch was committed.  And
> > quite likely it wasn't even your patch.  We know that finding problems earlier
> > in the developer workflow reduces costs; if we can catch this before the
> > original developer commits & pushes the changes, it's cheaper, because the
> > developer already understands what the patch does.
> > 
> > A case in point: https://gcc.gnu.org/PR58156 which has been fixed recently
> > by an unrelated (?) patch.  Knowing that the tsubst_pack_expansion hunk in
> > the patch had this effect would probably have been very useful.  More testing
> > will lead to a better compiler.
> > 
> > Another case: https://gcc.gnu.org/35098 which was fixed 12 years (!) after
> > it was reported by a different change.
> > 
> > Or another: https://gcc.gnu.org/91525 where the patch contained a test, but
> > that was ice-on-invalid, whereas the test in PR91525 was ice-on-valid.
> > 
> > To alleviate some of these problems, I propose that we introduce a means to our
> > DejaGNU infrastructure that allows adding tests for old bugs that have not been
> > fixed yet, and re-introduce the keyword monitored (no longer used for anything
> > -- I think Volker stepped away) to the GCC Bugzilla to signal that a PR is
> > tracked in the testsuite.  I don't want any unnecessary moving tests around, so
> > the tests would go where they would normally go; they have to be reduced and
> > have proper targets, etc.  Having such tests in the testsuite means that when
> > something changes, you will know immediately, before you push any changes.
> > 
> > My thinking is that for:
> > 
> > * rejects-valid: use the existing dg-xfail-if
> > * accepts-valid: use the new dg-accepts-invalid
> > * ICEs: use the new dg-ice
> > 
> > dg-ice can be used like this:
> > 
> > // { dg-ice "build_over_call" { target c++11 } }
> > 
> > and it means that if the test still ICEs, you'll get a quiet XFAIL.  If the
> > ICE is fixed, you'll get an XPASS; if the ICE is gone but there are errors,
> > you'll get an XPASS + FAIL.  Then you can close the old PR.
> > 
> > Similarly, dg-accepts-invalid:
> > 
> > // { dg-accepts-invalid "PR86500" }
> > 
> > means that if the test still compiles without errors, you'll get a quiet XFAIL.
> > If we start giving errors, you'll get an XPASS.
> > 
> > If the bug is fixed, simply remove the directive.
> > 
> > The patch implementing these new directives is appended.  Once/if this is
> > accepted, I can start adding the old tests we have in our Bugzilla.  (I'm
> > only concerned about the c++ component, if that wasn't already clear.)
> > 
> > The question is what makes the bug "old": is it one year without it being
> > assigned?  6 months?  3 months?  Note: I *don't* propose to add every test for
> > every new PR, just the reasonably old ones that are useful/important.  Such
> > additions should be done in batches, so that we don't have dozens of commits,
> > each of them merely adding a single test.
> > 
> > We will still have a surfeit of bugs that we've given short shrift to, but
> > let's at least automate what we can.  The initial addition of the relevant
> > old(-ish) tests won't of course happen automagically, but it's a price I'm
> > willing to pay.  My goal here isn't merely to reduce the number of open PRs;
> > it is to improve the testing of the compiler overall.
> > 
> > Thoughts?
> > 
> > Bootstrapped/regtested on x86_64-pc-linux-gnu.
> > 
> > [PATCH] testsuite: Introduce dg-ice and dg-accepts-invalid.
> > 
> > gcc/ChangeLog:
> > 
> > 	* doc/sourcebuild.texi: Document dg-ice and dg-accepts-invalid.
> > 
> > gcc/testsuite/ChangeLog:
> > 
> > 	* lib/gcc-dg.exp (gcc-dg-test-1): Handle dg-ice and dg-accepts-invalid.
> > 	* lib/prune.exp (prune_ices): New.
> > 	* lib/target-supports-dg.exp (dg-accepts-invalid, dg-ice): New.
> > ---
> > gcc/doc/sourcebuild.texi                 | 19 +++++++
> > gcc/testsuite/lib/gcc-dg.exp             | 39 +++++++++++++-
> > gcc/testsuite/lib/prune.exp              |  9 ++++
> > gcc/testsuite/lib/target-supports-dg.exp | 69 ++++++++++++++++++++++++
> > 4 files changed, 134 insertions(+), 2 deletions(-)
> > 
> > diff --git a/gcc/doc/sourcebuild.texi b/gcc/doc/sourcebuild.texi
> > index a7a922d84a2..636d21d30dd 100644
> > --- a/gcc/doc/sourcebuild.texi
> > +++ b/gcc/doc/sourcebuild.texi
> > @@ -1172,6 +1172,16 @@ Expect the execute step of a test to fail if the conditions (which are
> > the same as for @code{dg-skip-if}) are met.
> > @end table
> > 
> > +@subsubsection Expect the compiler to crash
> > +
> > +@table @code
> > +@item  @{ dg-ice @var{comment} [@{ @var{selector} @} [@{ @var{include-opts} @} [@{ @var{exclude-opts} @}]]] @}
> > +Expect the compiler to crash with an internal compiler error and return
> > +a nonzero exit status if the conditions (which are the same as for
> > +@code{dg-skip-if}) are met.  Used for tests that test bugs that have not been
> > +fixed yet.
> > +@end table
> > +
> > @subsubsection Expect the test executable to fail
> > 
> > @table @code
> > @@ -1234,6 +1244,15 @@ has the same effect as @samp{target}.
> > 
> > @item @{ dg-prune-output @var{regexp} @}
> > Prune messages matching @var{regexp} from the test output.
> > +
> > +@table @code
> > +@item  @{ dg-accepts-invalid @var{comment} [@{ @var{selector} @} [@{ @var{include-opts} @} [@{ @var{exclude-opts} @}]]] @}
> > +Expect the compiler to accept the test (even though it should be rejected with
> > +a compile-time error), if the conditions (which are the same as for
> > +@code{dg-skip-if}) are met.  Used for tests that test bugs that have not been
> > +fixed yet.
> > +@end table
> > +
> > @end table
> > 
> > @subsubsection Verify output of the test executable
> > diff --git a/gcc/testsuite/lib/gcc-dg.exp b/gcc/testsuite/lib/gcc-dg.exp
> > index 45d97024883..6478eda283b 100644
> > --- a/gcc/testsuite/lib/gcc-dg.exp
> > +++ b/gcc/testsuite/lib/gcc-dg.exp
> > @@ -308,13 +308,48 @@ proc gcc-dg-test-1 { target_compile prog do_what extra_tool_flags } {
> >     verbose "$target_compile $prog $output_file $compile_type $options" 4
> >     set comp_output [$target_compile "$prog" "$output_file" "$compile_type" $options]
> > 
> > +    global expect_ice
> >     # Look for an internal compiler error, which sometimes masks the fact
> >     # that we didn't get an expected error message.  XFAIL an ICE via
> >     # dg-xfail-if and use { dg-prune-output ".*internal compiler error.*" }
> >     # to avoid a second failure for excess errors.
> > -    if [string match "*internal compiler error*" $comp_output] {
> > +    # "Error reporting routines re-entered" ICE says "Internal" rather than
> > +    # "internal", so match that too.
> > +    if [string match {*[Ii]nternal compiler error*} $comp_output] {
> > 	upvar 2 name name
> > -	fail "$name (internal compiler error)"
> > +	if { $expect_ice == 0 } {
> > +	  fail "$name (internal compiler error)"
> > +	} else {
> > +	  # We expected an ICE and we got it.  Emit an XFAIL.
> > +	  setup_xfail "*-*-*"
> > +	  fail "$name (internal compiler error)"
> > +	  clear_xfail "*-*-*"
> > +	  # Prune the ICE from the output.
> > +	  set comp_output [prune_ices $comp_output]
> > +	}
> > +    } else {
> > +	upvar 2 name name
> > +	global accepts_invalid
> > +	if { $expect_ice == 1 } {
> > +	  # We expected an ICE but we didn't get it.  We want an XPASS, so
> > +	  # call setup_xfail to set xfail_flag.
> > +	  setup_xfail "*-*-*"
> > +	  pass "$name (internal compiler error)"
> > +	  clear_xfail "*-*-*"
> > +	} elseif { $accepts_invalid == 1 } {
> > +	    if [string match {*error: *} $comp_output] {
> > +	      # We expected that this test be (wrongly) accepted, but now we have
> > +	      # seen error(s).  Issue an XPASS to signal that.
> > +	      setup_xfail "*-*-*"
> > +	      pass "$name (accepts invalid)"
> > +	      clear_xfail "*-*-*"
> > +	    } else {
> > +	      # This test is still (wrongly) accepted.  Just emit an XFAIL.
> > +	      setup_xfail "*-*-*"
> > +	      fail "$name (accepts invalid)"
> > +	      clear_xfail "*-*-*"
> > +	    }
> > +	}
> >     }
> > 
> >     if { $do_what == "repo" } {
> > diff --git a/gcc/testsuite/lib/prune.exp b/gcc/testsuite/lib/prune.exp
> > index 1c776249f1a..58a739684a5 100644
> > --- a/gcc/testsuite/lib/prune.exp
> > +++ b/gcc/testsuite/lib/prune.exp
> > @@ -118,6 +118,15 @@ proc prune_file_path { text } {
> >     return $text
> > }
> > 
> > +# Prune internal compiler error messages, including the "Please submit..."
> > +# footnote.
> > +
> > +proc prune_ices { text } {
> > +  regsub -all "(^|\n)\[^\n\]*: internal compiler error:.*for instructions\[^\n\]*" $text "" text
> > +  regsub -all "(^|\n|')*Internal compiler error:.*for instructions\[^\n\]*" $text "" text
> > +  return $text
> > +}
> > +
> > # Provide a definition of this if missing (delete after next dejagnu release).
> > 
> > if { [info procs prune_warnings] == "" } then {
> > diff --git a/gcc/testsuite/lib/target-supports-dg.exp b/gcc/testsuite/lib/target-supports-dg.exp
> > index 2a21424b890..765f3a2e27a 100644
> > --- a/gcc/testsuite/lib/target-supports-dg.exp
> > +++ b/gcc/testsuite/lib/target-supports-dg.exp
> > @@ -495,6 +495,75 @@ proc dg-shouldfail { args } {
> >     }
> > }
> > 
> > +# Record whether the compiler is expected (at the moment) to ICE.
> > +# Used for tests that test bugs that have not been fixed yet.
> > +
> > +set expect_ice 0
> > +set accepts_invalid 0
> > +
> > +proc dg-ice { args } {
> > +    # Don't bother if we're already skipping the test.
> > +    upvar dg-do-what dg-do-what
> > +    if { [lindex ${dg-do-what} 1] == "N" } {
> > +      return
> > +    }
> > +
> > +    global accepts_invalid
> > +    # Can't be combined with dg-accepts-invalid.
> > +    if { $accepts_invalid == 1 } {
> > +      error "dg-ice: cannot be combined with dg-accepts-invalid"
> > +      return
> > +    }
> > +
> > +    global expect_ice
> > +
> > +    set args [lreplace $args 0 0]
> > +    if { [llength $args] > 1 } {
> > +	set selector [list target [lindex $args 1]]
> > +	if { [dg-process-target-1 $selector] == "S" } {
> > +	    # The target matches, now check the flags.
> > +	    if [check-flags $args] {
> > +		set expect_ice 1
> > +	    }
> > +	}
> > +    } else {
> > +	set expect_ice 1
> > +    }
> > +}
> > +
> > +# Record whether the compiler should reject the testcase with an error,
> > +# but currently doesn't do so.  Used for accepts-invalid bugs.
> > +
> > +proc dg-accepts-invalid { args } {
> > +    # Don't bother if we're already skipping the test.
> > +    upvar dg-do-what dg-do-what
> > +    if { [lindex ${dg-do-what} 1] == "N" } {
> > +      return
> > +    }
> > +
> > +    global expect_ice
> > +    # Can't be combined with dg-ice.
> > +    if { $expect_ice == 1 } {
> > +      error "dg-accepts-invalid: cannot be combined with dg-ice"
> > +      return
> > +    }
> > +
> > +    global accepts_invalid
> > +
> > +    set args [lreplace $args 0 0]
> > +    if { [llength $args] > 1 } {
> > +	set selector [list target [lindex $args 1]]
> > +	if { [dg-process-target-1 $selector] == "S" } {
> > +	    # The target matches, now check the flags.
> > +	    if [check-flags $args] {
> > +		set accepts_invalid 1
> > +	    }
> > +	}
> > +    } else {
> > +	set accepts_invalid 1
> > +    }
> > +}
> > +
> > # Intercept the call to the DejaGnu version of dg-process-target to
> > # support use of an effective-target keyword in place of a list of
> > # target triplets to xfail or skip a test.
> > 
> > base-commit: f3665bd1111c1799c0421490b5e655f977570354
> > -- 
> > 2.26.2
> >