[PATCH, 10/16] Add pass_oacc_kernels pass group in passes.def

Richard Biener rguenther@suse.de
Mon Nov 16 12:45:00 GMT 2015


On Mon, 16 Nov 2015, Tom de Vries wrote:

> On 11/11/15 12:02, Richard Biener wrote:
> > On Mon, 9 Nov 2015, Tom de Vries wrote:
> > 
> > > On 09/11/15 16:35, Tom de Vries wrote:
> > > > Hi,
> > > > 
> > > > this patch series for stage1 trunk adds support to:
> > > > - parallelize oacc kernels regions using parloops, and
> > > > - map the loops onto the oacc gang dimension.
> > > > 
> > > > The patch series contains these patches:
> > > > 
> > > >        1    Insert new exit block only when needed in
> > > >           transform_to_exit_first_loop_alt
> > > >        2    Make create_parallel_loop return void
> > > >        3    Ignore reduction clause on kernels directive
> > > >        4    Implement -foffload-alias
> > > >        5    Add in_oacc_kernels_region in struct loop
> > > >        6    Add pass_oacc_kernels
> > > >        7    Add pass_dominator_oacc_kernels
> > > >        8    Add pass_ch_oacc_kernels
> > > >        9    Add pass_parallelize_loops_oacc_kernels
> > > >       10    Add pass_oacc_kernels pass group in passes.def
> > > >       11    Update testcases after adding kernels pass group
> > > >       12    Handle acc loop directive
> > > >       13    Add c-c++-common/goacc/kernels-*.c
> > > >       14    Add gfortran.dg/goacc/kernels-*.f95
> > > >       15    Add libgomp.oacc-c-c++-common/kernels-*.c
> > > >       16    Add libgomp.oacc-fortran/kernels-*.f95
> > > > 
> > > > The first 9 patches are more or less independent, but patches 10-16 are
> > > > intended to be committed at the same time.
> > > > 
> > > > Bootstrapped and reg-tested on x86_64.
> > > > 
> > > > Built and reg-tested with nvidia accelerator, in combination with a
> > > > patch that enables accelerator testing (which is submitted at
> > > > https://gcc.gnu.org/ml/gcc-patches/2015-10/msg01771.html ).
> > > > 
> > > > I'll post the individual patches in reply to this message.
> > > > 
> > > 
> > > This patch adds the pass_oacc_kernels pass group to the pass list in
> > > passes.def.
> > > 
> > > Note the repetition of pass_lim/pass_copy_prop. The first pair is for an
> > > inner loop in a loop nest, the second for an outer loop in a loop nest.
> > 
> > @@ -86,6 +86,27 @@ along with GCC; see the file COPYING3.  If not see
> >            /* pass_build_ealias is a dummy pass that ensures that we
> >               execute TODO_rebuild_alias at this point.  */
> >            NEXT_PASS (pass_build_ealias);
> > +         /* Pass group that runs when there are oacc kernels in the
> > +            function.  */
> > +         NEXT_PASS (pass_oacc_kernels);
> > +         PUSH_INSERT_PASSES_WITHIN (pass_oacc_kernels)
> > +             NEXT_PASS (pass_dominator_oacc_kernels);
> > +             NEXT_PASS (pass_ch_oacc_kernels);
> > +             NEXT_PASS (pass_dominator_oacc_kernels);
> > +             NEXT_PASS (pass_tree_loop_init);
> > +             NEXT_PASS (pass_lim);
> > +             NEXT_PASS (pass_copy_prop);
> > +             NEXT_PASS (pass_lim);
> > +             NEXT_PASS (pass_copy_prop);
> > 
> > iterate lim/copyprop twice?!  Why's that needed?
> > 
> 
> I've managed to eliminate the last pass_copy_prop, but not pass_lim. I've
> added a comment:
> ...
>   /* We use pass_lim to rewrite in-memory iteration and reduction
>      variable accesses in loops into local variable accesses.
>      However, a single pass instantiation manages to do this only for
>      one loop level, so we use pass_lim twice to at least be able to
>      handle a loop nest with a depth of two.  */
>   NEXT_PASS (pass_lim);
>   NEXT_PASS (pass_copy_prop);
>   NEXT_PASS (pass_lim);
> ...

Huh.  Testcase?  LIM is perfectly able to handle nests.
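
For illustration only, a reduced testcase for this question could look
like the sketch below.  It is hand-written and not taken from the patch
series; the function names are hypothetical, and the second function is
what one would hope to get once the in-memory reduction variable has been
moved out of both loop levels.  It assumes *sum does not alias a[], which
the restrict qualifiers state explicitly.

```c
/* Before: the reduction variable lives in memory (*sum) and is loaded
   and stored on every iteration of the inner loop.  */
void
sum_nest_mem (const int *restrict a, int n, int m, int *restrict sum)
{
  for (int i = 0; i < n; i++)
    for (int j = 0; j < m; j++)
      *sum += a[i * m + j];
}

/* After (hand-written): the load is hoisted above the outer loop and the
   store is sunk below it, so the whole nest works on a local variable.
   Only valid because *sum cannot alias a[].  */
void
sum_nest_reg (const int *restrict a, int n, int m, int *restrict sum)
{
  int tmp = *sum;
  for (int i = 0; i < n; i++)
    for (int j = 0; j < m; j++)
      tmp += a[i * m + j];
  *sum = tmp;
}
```

If a single LIM invocation only produces the second form for the inner
loop of such a nest, that is the case worth posting.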

> > +             NEXT_PASS (pass_scev_cprop);
> > 
> > What's that for?  It's supposed to help removing loops - I don't
> > expect kernels to vanish.
> 
> I'm using pass_scev_cprop for the "final value replacement" functionality.
> Added comment.

That functionality is intended to enable loop removal.
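
A minimal sketch of what final value replacement means, hand-written
rather than compiler output, with hypothetical function names: the value
a variable has after the loop is rewritten as a closed-form expression of
the iteration count, after which nothing outside the loop depends on the
loop itself.

```c
/* Before: x is only needed after the loop, but its value is computed
   by iterating.  */
int
final_value_loop (int n)
{
  int x = 3;
  for (int i = 0; i < n; i++)
    x += 2;
  return x;
}

/* After (hand-written): the use of x after the loop is replaced by its
   final value, 3 + 2 * n when the loop runs at all.  The loop is now
   dead and can be removed.  */
int
final_value_closed_form (int n)
{
  return n > 0 ? 3 + 2 * n : 3;
}
```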

> > 
> > +             NEXT_PASS (pass_tree_loop_done);
> > +             NEXT_PASS (pass_dominator_oacc_kernels);
> > 
> > Three times DOM?  No please.  I wonder why you don't run oacc_kernels
> > after FRE and drop the initial DOM(s).
> > 
> 
> Done. There's just one pass_dominator_oacc_kernels left now.
> 
> > +             NEXT_PASS (pass_dce);
> > +             NEXT_PASS (pass_tree_loop_init);
> > +             NEXT_PASS (pass_parallelize_loops_oacc_kernels);
> > +             NEXT_PASS (pass_expand_omp_ssa);
> > +             NEXT_PASS (pass_tree_loop_done);
> > 
> > The switches into/out of tree_loop also look odd to me, but well
> > (they'll be controlled by -ftree-loop-optimize).
> > 
> 
> I've eliminated all the uses for pass_tree_loop_init/pass_tree_loop_done in
> the pass group. Instead, I've added conditional loop optimizer setup in:
> - pass_lim and pass_scev_cprop (added in this patch), and
> - pass_parallelize_loops_oacc_kernels (added in patch "Add
>   pass_parallelize_loops_oacc_kernels").

You are missing a call to scev_finalize ().

Much better otherwise.  I still wonder about scev_cprop and about
running LIM twice.

Richard.
