[PATCH, 6/16] Add pass_oacc_kernels

Richard Biener rguenther@suse.de
Wed Nov 11 10:59:00 GMT 2015


On Mon, 9 Nov 2015, Tom de Vries wrote:

> On 09/11/15 16:35, Tom de Vries wrote:
> > Hi,
> > 
> > this patch series for stage1 trunk adds support to:
> > - parallelize oacc kernels regions using parloops, and
> > - map the loops onto the oacc gang dimension.
> > 
> > The patch series contains these patches:
> > 
> >       1    Insert new exit block only when needed in
> >          transform_to_exit_first_loop_alt
> >       2    Make create_parallel_loop return void
> >       3    Ignore reduction clause on kernels directive
> >       4    Implement -foffload-alias
> >       5    Add in_oacc_kernels_region in struct loop
> >       6    Add pass_oacc_kernels
> >       7    Add pass_dominator_oacc_kernels
> >       8    Add pass_ch_oacc_kernels
> >       9    Add pass_parallelize_loops_oacc_kernels
> >      10    Add pass_oacc_kernels pass group in passes.def
> >      11    Update testcases after adding kernels pass group
> >      12    Handle acc loop directive
> >      13    Add c-c++-common/goacc/kernels-*.c
> >      14    Add gfortran.dg/goacc/kernels-*.f95
> >      15    Add libgomp.oacc-c-c++-common/kernels-*.c
> >      16    Add libgomp.oacc-fortran/kernels-*.f95
> > 
> > The first 9 patches are more or less independent, but patches 10-16 are
> > intended to be committed at the same time.
> > 
> > Bootstrapped and reg-tested on x86_64.
> > 
> > Built and reg-tested with nvidia accelerator, in combination with a
> > patch that enables accelerator testing (which is submitted at
> > https://gcc.gnu.org/ml/gcc-patches/2015-10/msg01771.html ).
> > 
> > I'll post the individual patches in reply to this message.
> 
> this patch adds a pass group pass_oacc_kernels (which will be added to the
> pass list as a whole in patch 10).

Just to understand (while also skimming the HSA patches).

You are basically relying on autopar for what the HSA patches call
"gridification"?  That is, OMP lowering produces loopy kernels
and autopar then will basically strip the outermost loop?
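
To make the question concrete, here is a minimal sketch in plain C of what
I mean by "stripping the outermost loop". It is not parloops or HSA code.
The names loopy_kernel, stripped_kernel and launch_stripped, the doubling
loop body, and the block-wise chunking are all assumptions made for
illustration, and the launch is simulated serially on the host.

```c
#include <assert.h>

/* Loopy form: a single kernel instance walks the whole iteration space.  */
static void
loopy_kernel (unsigned int *out, const unsigned int *in, int n)
{
  for (int i = 0; i < n; ++i)
    out[i] = in[i] * 2;
}

/* Stripped form: the outermost loop over the full range is gone.  Each
   instance only handles the chunk selected by its (hypothetical) gang id.  */
static void
stripped_kernel (unsigned int *out, const unsigned int *in, int n,
                 int gang_id, int num_gangs)
{
  assert (num_gangs > 0);

  int chunk = (n + num_gangs - 1) / num_gangs;   /* Round up.  */
  int start = gang_id * chunk;
  int end = start + chunk < n ? start + chunk : n;

  for (int i = start; i < end; ++i)
    out[i] = in[i] * 2;
}

/* Serial stand-in for the accelerator launch: one call per gang.  */
static void
launch_stripped (unsigned int *out, const unsigned int *in, int n,
                 int num_gangs)
{
  for (int g = 0; g < num_gangs; ++g)
    stripped_kernel (out, in, n, g, num_gangs);
}
```

Calling launch_stripped with any positive gang count should fill out with
the same values as one call to loopy_kernel. The only difference is who
owns which slice of the iteration space.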

Richard.

> At the moment, the parallelization behaviour for the kernels region is
> controlled by flag_tree_parallelize_loops, which is also used to control
> generic auto-parallelization by autopar using omp. That is not ideal, and we
> may want a separate flag (or param) to control the behaviour for oacc
> kernels, for instance -foacc-kernels-gang-parallelize=<n>. I'm open to
> suggestions.
> 
> The purpose of the pass group as a whole is to massage the offloaded function
> into a shape that parloops can deal with, and then run parloops on it.
> 
> Consider a testcase with a reduction, and a loop counter declared outside the
> offload region:
> ...
> unsigned int a[n];
> 
> unsigned int
> foo (void)
> {
>   int i;
>   unsigned int sum = 1;
> 
> #pragma acc kernels copyin (a[0:n]) copy (sum)
>   {
>     for (i = 0; i < n; ++i)
>       sum += a[i];
>   }
> 
>   return sum;
> }
> ...
> 
> After ealias, the loop body looks like this:
> ...
>   <bb 5>:
>   _8 = *.omp_data_i_3(D).a;
>   _9 = *.omp_data_i_3(D).i;
>   _10 = *_9;
>   _11 = *_8[_10];
>   _12 = *.omp_data_i_3(D).sum;
>   sum.0_13 = *_12;
>   sum.1_14 = _11 + sum.0_13;
>   _15 = *.omp_data_i_3(D).sum;
>   *_15 = sum.1_14;
>   _17 = *.omp_data_i_3(D).i;
>   _18 = *_17;
>   _19 = *.omp_data_i_3(D).i;
>   _20 = _18 + 1;
>   *_19 = _20;
>   goto <bb 6>;
> ...
> In other words, the iteration variable is in memory, as is the reduction
> variable, and the body contains lots of loop invariant loads.
> 
> At the end of the pass group, just before parloops, the body has been
> rewritten to have a local iteration variable and a local reduction variable,
> and all the loop invariant loads have been moved out of the loop:
> ...
>   <bb 4>:
>   # _27 = PHI <0(2), _20(5)>
>   # D__lsm.7_28 = PHI <D__lsm.7_29(2), sum.1_14(5)>
>   _11 = *_8[_27];
>   sum.1_14 = _11 + D__lsm.7_28;
>   _20 = _27 + 1;
>   if (_20 <= 9999)
>     goto <bb 5>;
>   else
>     goto <bb 3>;
> ...
> 
> Thanks,
> - Tom
> 
> 

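For readers who prefer source over GIMPLE, the two dumps quoted above
correspond roughly to the following C. This is a hand-written sketch, not
compiler output. The struct omp_data layout and the function names
kernel_before and kernel_after are assumptions, and N is taken to be 10000
only because the second dump compares against 9999.

```c
#include <assert.h>

#define N 10000   /* Assumed from the "_20 <= 9999" exit test.  */

/* Hypothetical stand-in for the .omp_data_i record.  */
struct omp_data
{
  unsigned int (*a)[N];
  int *i;
  unsigned int *sum;
};

/* Shape after ealias: the iteration variable and the reduction variable
   live in memory, and every access reloads the pointers from the record.  */
static void
kernel_before (struct omp_data *d)
{
  assert (d->a && d->i && d->sum);

  for (*d->i = 0; *d->i < N; ++*d->i)
    *d->sum += (*d->a)[*d->i];
}

/* Shape just before parloops: invariant loads hoisted, local iteration
   variable, local reduction variable, one store of the sum after the
   loop.  Whether the final value of i is stored back is not visible in
   the quoted dump, so this sketch omits it.  */
static void
kernel_after (struct omp_data *d)
{
  assert (d->a && d->sum);

  unsigned int (*a)[N] = d->a;
  unsigned int sum = *d->sum;

  for (int i = 0; i < N; ++i)
    sum += (*a)[i];

  *d->sum = sum;
}
```

Both functions leave the same value in *d->sum. The second form is the one
a loop parallelizer can analyse: a counted loop with a scalar induction
variable and a scalar reduction.
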
-- 
Richard Biener <rguenther@suse.de>
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg)


