This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: Clear basic block flags before using BB_VISITED for OpenACC loops processing


Hi!

On Mon, 17 Oct 2016 13:22:17 +0200, Richard Biener <richard.guenther@gmail.com> wrote:
> On Mon, Oct 17, 2016 at 11:38 AM, Thomas Schwinge
> <thomas@codesourcery.com> wrote:
> > On Fri, 14 Oct 2016 13:06:59 +0200, Richard Biener <richard.guenther@gmail.com> wrote:
> >> On Fri, Oct 14, 2016 at 1:00 PM, Nathan Sidwell <nathan@acm.org> wrote:
> >> > On 10/14/16 05:28, Richard Biener wrote:
> >> >
> >> >> The BB_VISITED flag has indetermined state at the beginning of a pass.
> >> >> You have to ensure it is cleared yourself.
> >> >
> >> >
> >> > In that case the openacc (&nvptx?) passes should be modified to clear the
> >> > flags at their start, rather than at their end.
> >
> > The gcc/config/nvptx/nvptx.c handling seems fine -- it explicitly clears
> > BB_VISITED for all basic block it works on.
> >
> >> Yes.  But as I said, I ran into IRA ICEs (somewhere in the testsuite) when not
> >> cleaning up after tree-ssa-propagate.c.  So somebody has to fix IRA first.
> >
> > Is there a GCC PR for that, or where are you tracking such issues?
> 
> No, just tracking in my head.

Tsk, tsk...  ;-)


> > OK to commit the following?  Is such a test case appropriate (which would
> > have caught this issue right away), in particular the dg-final
> > scan-tree-dump line?
> 
> Ugh.  Not worse to what we do in various dwarf scanning I guess.

;-|

> Doesn't failure lead to a miscompile eventually?  So you could formulate
> this as a dg-do run test with a check for the desired outcome?

No, unfortunately.  In this case the error is "benign" such that the
OpenACC loop processing machinery will decide to not parallelize loops
that ought to be parallelized.  This won't generally cause any problem
(apart from performance regression, obviously); it just caused problems
in a few libgomp test cases that actually at run time test for
parallelized execution -- which will/did trigger only with nvptx
offloading enabled, which not too many people are testing.  The test case
I propose below will trigger also for non-offloading configurations.

> >     Clear basic block flags before using BB_VISITED for OpenACC loops processing
> >
> >         gcc/
> >         * omp-low.c (oacc_loop_discovery): Call clear_bb_flags.
> >
> >         gcc/testsuite/
> >         * gcc.dg/goacc/loop-processing-1.c: New file.
> > ---
> >  gcc/omp-low.c                                  |  9 +++++----
> >  gcc/testsuite/gcc.dg/goacc/loop-processing-1.c | 18 ++++++++++++++++++
> >  2 files changed, 23 insertions(+), 4 deletions(-)
> >
> > --- gcc/omp-low.c
> > +++ gcc/omp-low.c
> > @@ -19340,7 +19340,9 @@ oacc_loop_sibling_nreverse (oacc_loop *loop)
> >  static oacc_loop *
> >  oacc_loop_discovery ()
> >  {
> > -  basic_block bb;
> > +  /* Clear basic block flags, in particular BB_VISITED which we're going to use
> > +     in the following.  */
> > +  clear_bb_flags ();
> >
> >    oacc_loop *top = new_oacc_loop_outer (current_function_decl);
> >    oacc_loop_discover_walk (top, ENTRY_BLOCK_PTR_FOR_FN (cfun));
> > @@ -19349,9 +19351,8 @@ oacc_loop_discovery ()
> >       that diagnostics come out in an unsurprising order.  */
> >    top = oacc_loop_sibling_nreverse (top);
> >
> > -  /* Reset the visited flags.  */
> > -  FOR_ALL_BB_FN (bb, cfun)
> > -    bb->flags &= ~BB_VISITED;
> > +  /* Clear basic block flags again, as otherwise IRA will explode later on.  */
> > +  clear_bb_flags ();
> >
> >    return top;
> >  }
> > --- /dev/null
> > +++ gcc/testsuite/gcc.dg/goacc/loop-processing-1.c
> > @@ -0,0 +1,18 @@
> > +/* Make sure that OpenACC loop processing happens.  */
> > +/* { dg-additional-options "-O2 -fdump-tree-oaccdevlow" } */
> > +
> > +extern int place ();
> > +
> > +void vector_1 (int *ary, int size)
> > +{
> > +#pragma acc parallel num_workers (32) vector_length(32) copy(ary[0:size]) firstprivate (size)
> > +  {
> > +#pragma acc loop gang
> > +    for (int jx = 0; jx < 1; jx++)
> > +#pragma acc loop auto
> > +      for (int ix = 0; ix < size; ix++)
> > +       ary[ix] = place ();
> > +  }
> > +}
> > +
> > +/* { dg-final { scan-tree-dump "OpenACC loops.*Loop 0\\\(0\\\).*Loop 14\\\(1\\\).*\\\.data_dep\\\.\[0-9_\]+ = UNIQUE \\\(OACC_HEAD_MARK, 0, 1, 20\\\);.*Head-0:.*\\\.data_dep\\\.\[0-9_\]+ = UNIQUE \\\(OACC_HEAD_MARK, 0, 1, 20\\\);.*\\\.data_dep\\\.\[0-9_\]+ = UNIQUE \\\(OACC_FORK, \\\.data_dep\\\.\[0-9_\]+, 0\\\);.*Tail-0:.*\\\.data_dep\\\.\[0-9_\]+ = UNIQUE \\\(OACC_TAIL_MARK, \\\.data_dep\\\.\[0-9_\]+, 1\\\);.*\\\.data_dep\\\.\[0-9_\]+ = UNIQUE \\\(OACC_JOIN, \\\.data_dep\\\.\[0-9_\]+, 0\\\);.*Loop 6\\\(4\\\).*\\\.data_dep\\\.\[0-9_\]+ = UNIQUE \\\(OACC_HEAD_MARK, 0, 1, 6\\\);.*Head-0:.*\\\.data_dep\\\.\[0-9_\]+ = UNIQUE \\\(OACC_HEAD_MARK, 0, 1, 6\\\);.*\\\.data_dep\\\.\[0-9_\]+ = UNIQUE \\\(OACC_FORK, \\\.data_dep\\\.\[0-9_\]+, 2\\\);.*Tail-0:.*\\\.data_dep\\\.\[0-9_\]+ = UNIQUE \\\(OACC_TAIL_MARK, \\\.data_dep\\\.\[0-9_\]+, 1\\\);.*\\\.data_dep\\\.\[0-9_\]+ = UNIQUE \\\(OACC_JOIN, \\\.data_dep\\\.\[0-9_\]+, 2\\\);" "oaccdevlow" } } */


Grüße
 Thomas


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]