[committed] Add pass_parallelize_loops to pass_oacc_kernels

Tom de Vries Tom_deVries@mentor.com
Mon Jan 18 13:30:00 GMT 2016


[ was: Re: [committed] Add oacc_kernels_p argument to 
pass_parallelize_loops ]

On 18/01/16 14:07, Tom de Vries wrote:
> [was: Re: [PIING][PATCH, 9/16] Add pass_parallelize_loops_oacc_kernels ]
>
> On 14/12/15 16:22, Richard Biener wrote:
>> On Sun, Dec 13, 2015 at 5:58 PM, Tom de Vries <Tom_deVries@mentor.com>
>> wrote:
>>> On 24/11/15 13:24, Tom de Vries wrote:
>>>>
>>>> On 16/11/15 12:59, Tom de Vries wrote:
>>>>>
>>>>> On 09/11/15 20:52, Tom de Vries wrote:
>>>>>>
>>>>>> On 09/11/15 16:35, Tom de Vries wrote:
>>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> this patch series for stage1 trunk adds support to:
>>>>>>> - parallelize oacc kernels regions using parloops, and
>>>>>>> - map the loops onto the oacc gang dimension.
>>>>>>>
>>>>>>> The patch series contains these patches:
>>>>>>>
>>>>>>>        1    Insert new exit block only when needed in
>>>>>>>           transform_to_exit_first_loop_alt
>>>>>>>        2    Make create_parallel_loop return void
>>>>>>>        3    Ignore reduction clause on kernels directive
>>>>>>>        4    Implement -foffload-alias
>>>>>>>        5    Add in_oacc_kernels_region in struct loop
>>>>>>>        6    Add pass_oacc_kernels
>>>>>>>        7    Add pass_dominator_oacc_kernels
>>>>>>>        8    Add pass_ch_oacc_kernels
>>>>>>>        9    Add pass_parallelize_loops_oacc_kernels
>>>>>>>       10    Add pass_oacc_kernels pass group in passes.def
>>>>>>>       11    Update testcases after adding kernels pass group
>>>>>>>       12    Handle acc loop directive
>>>>>>>       13    Add c-c++-common/goacc/kernels-*.c
>>>>>>>       14    Add gfortran.dg/goacc/kernels-*.f95
>>>>>>>       15    Add libgomp.oacc-c-c++-common/kernels-*.c
>>>>>>>       16    Add libgomp.oacc-fortran/kernels-*.f95
>>>>>>>
>>>>>>> The first 9 patches are more or less independent, but patches 10-16
>>>>>>> are intended to be committed at the same time.
>>>>>>>
>>>>>>> Bootstrapped and reg-tested on x86_64.
>>>>>>>
>>>>>>> Built and reg-tested with nvidia accelerator, in combination with a
>>>>>>> patch that enables accelerator testing (which is submitted at
>>>>>>> https://gcc.gnu.org/ml/gcc-patches/2015-10/msg01771.html ).
>>>>>>>
>>>>>>> I'll post the individual patches in reply to this message.
>>>>>>
>>>>>>
>>>>>> This patch adds pass_parallelize_loops_oacc_kernels.
>>>>>>
>>>>>> There are a number of things we do differently in parloops for oacc
>>>>>> kernels:
>>>>>> - in normal parloops, we generate code to choose between a parallel
>>>>>>     version of the loop, and a sequential (low iteration count)
>>>>>>     version. Since the code in an oacc kernels region is supposed to
>>>>>>     run on the accelerator anyway, we skip this check, and don't add
>>>>>>     a low iteration count loop.
>>>>>> - in normal parloops, we generate an #pragma omp parallel /
>>>>>>     GIMPLE_OMP_RETURN pair to delimit the region which we will split
>>>>>>     off into a thread function. Since the oacc kernels region is
>>>>>>     already split off, we don't add this pair.
>>>>>> - we indicate the parallelization factor by setting the oacc function
>>>>>>     attributes.
>>>>>> - we generate an #pragma oacc loop instead of an #pragma omp for, and
>>>>>>     we add the gang clause.
>>>>>> - in normal parloops, we rewrite the variable accesses in the loop
>>>>>>     into accesses relative to a thread function parameter. For the
>>>>>>     oacc kernels region, that rewrite has already been done at
>>>>>>     omp-lower, so we skip this.
>>>>>> - we need to ensure that the entire kernels region can be run in
>>>>>>     parallel. The loop independence check is already present, so for
>>>>>>     oacc kernels we add a check between blocks outside the loop and
>>>>>>     the entire region.
>>>>>> - we guard stores in the blocks outside the loop with gang_pos == 0.
>>>>>>     There's no need for each gang to write to a single location; we
>>>>>>     can do this in just one gang. (Typically this is the write of the
>>>>>>     final value of the iteration variable if that one is copied back
>>>>>>     to the host).
>>>>>>
>>>>>
>>>>> Reposting with loop optimizer init added in
>>>>> pass_parallelize_loops_oacc_kernels::execute.
>>>>>
>>>>
>>>> Reposting with loop_optimizer_finalize, scev_initialize and
>>>> scev_finalize added in pass_parallelize_loops_oacc_kernels::execute.
>>>>
>>>
>>> Ping.
>>>
>>> Anything I can do to facilitate the review?
>>
>> Document new functions.
>
> Done.
>
>> avoid if (1).
>
> Done.
>
>> Ideally some refactoring would avoid some of the if (!oacc_kernels_p)
>> spaghetti
>
> Ack. For now, I've tried to minimize the number of oacc_kernels_p tests
> in the code.
>
> Further suggestions on how to improve here are much appreciated.
>
>> but I'm considering tree-parloops.c (and its bugs) yours.
>
> Ack.
>
>> Can the pass not just use a pass parameter to switch between
>> oacc/non-oacc?
>>
>
> This patch introduces the pass parameter oacc_kernels_p (but does not
> instantiate an oacc_kernels_p == true pass version yet).

This patch adds pass_parallelize_loops to pass_oacc_kernels (using pass
parameter oacc_kernels_p == true).
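
For context, here is a minimal sketch of the kind of input this targets: a kernels region holding a plain loop with no acc loop directive, which the parloops instance inside pass_oacc_kernels is meant to analyze. The array names, the size N and the function names are made up for illustration and are not taken from the patch. Without -fopenacc the pragma is ignored and the loop simply runs sequentially on the host.

```c
#include <assert.h>

#define N 1024

unsigned int a[N], b[N], c[N];

/* Initialize the inputs on the host.  */
void
init_inputs (void)
{
  for (unsigned int i = 0; i < N; i++)
    {
      a[i] = i * 2;
      b[i] = i * 4;
    }
}

/* Kernels region with a plain loop and no acc loop directive.  With
   -fopenacc the compiler is left to decide how to parallelize it; without
   -fopenacc the pragma is ignored and the loop runs sequentially.  */
void
vec_add (void)
{
#pragma acc kernels copyin (a[0:N], b[0:N]) copyout (c[0:N])
  {
    for (unsigned int i = 0; i < N; i++)
      c[i] = a[i] + b[i];
  }
}

/* Host-side check of the result: c[i] should be 2*i + 4*i.  */
void
check_result (void)
{
  for (unsigned int i = 0; i < N; i++)
    assert (c[i] == i * 6);
}
```

Calling init_inputs, vec_add and check_result in that order from a main function exercises the region; whether the loop actually ends up parallelized depends on the compiler's analysis.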

As a consequence, the parloops testcases need to be updated to use dumpfile
parloops2.
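
As an illustration of what such a testcase update looks like, here is a sketch of an autopar-style test that scans the parloops2 dump. My assumption, not stated above, is that the dump name changes because the instance in pass_oacc_kernels becomes the first instance of the pass and the regular one the second. The loop, the option values and the expected match count are hypothetical; only the dump name comes from this patch.

```c
/* { dg-do compile } */
/* { dg-options "-O2 -ftree-parallelize-loops=2 -fdump-tree-parloops2-details" } */

#include <assert.h>

#define N 1000

int x[N];

/* Simple independent loop, a candidate for parloops.  */
void
fill (void)
{
  for (int i = 0; i < N; i++)
    x[i] = i + 3;
}

/* Host-side sanity check, unrelated to what the dump scan verifies.  */
void
check_fill (void)
{
  for (int i = 0; i < N; i++)
    assert (x[i] == i + 3);
}

/* The last argument names the dump file to scan.  The count of 1 is an
   assumption for this sketch, not a verified result.  */
/* { dg-final { scan-tree-dump-times "SUCCESS: may be parallelized" 1 "parloops2" } } */
```

The dg- comments are only interpreted by the testsuite harness; compiled by hand, the file is ordinary C.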

Bootstrapped and reg-tested on x86_64.

Built with nvidia accelerator and tested goacc.exp and libgomp.

Committed to trunk.

Thanks,
- Tom

-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0003-Add-pass_parallelize_loops-to-pass_oacc_kernels.patch
Type: text/x-patch
Size: 41223 bytes
Desc: not available
URL: <http://gcc.gnu.org/pipermail/gcc-patches/attachments/20160118/6c3f9e5a/attachment.bin>

