[committed] Add pass_parallelize_loops to pass_oacc_kernels
Tom de Vries
Tom_deVries@mentor.com
Mon Jan 18 13:30:00 GMT 2016
[ was: Re: [committed] Add oacc_kernels_p argument to
pass_parallelize_loops ]
On 18/01/16 14:07, Tom de Vries wrote:
> [was: Re: [PIING][PATCH, 9/16] Add pass_parallelize_loops_oacc_kernels ]
>
> On 14/12/15 16:22, Richard Biener wrote:
>> On Sun, Dec 13, 2015 at 5:58 PM, Tom de Vries <Tom_deVries@mentor.com>
>> wrote:
>>> On 24/11/15 13:24, Tom de Vries wrote:
>>>>
>>>> On 16/11/15 12:59, Tom de Vries wrote:
>>>>>
>>>>> On 09/11/15 20:52, Tom de Vries wrote:
>>>>>>
>>>>>> On 09/11/15 16:35, Tom de Vries wrote:
>>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> this patch series for stage1 trunk adds support to:
>>>>>>> - parallelize oacc kernels regions using parloops, and
>>>>>>> - map the loops onto the oacc gang dimension.
>>>>>>>
>>>>>>> The patch series contains these patches:
>>>>>>>
>>>>>>> 1 Insert new exit block only when needed in
>>>>>>> transform_to_exit_first_loop_alt
>>>>>>> 2 Make create_parallel_loop return void
>>>>>>> 3 Ignore reduction clause on kernels directive
>>>>>>> 4 Implement -foffload-alias
>>>>>>> 5 Add in_oacc_kernels_region in struct loop
>>>>>>> 6 Add pass_oacc_kernels
>>>>>>> 7 Add pass_dominator_oacc_kernels
>>>>>>> 8 Add pass_ch_oacc_kernels
>>>>>>> 9 Add pass_parallelize_loops_oacc_kernels
>>>>>>> 10 Add pass_oacc_kernels pass group in passes.def
>>>>>>> 11 Update testcases after adding kernels pass group
>>>>>>> 12 Handle acc loop directive
>>>>>>> 13 Add c-c++-common/goacc/kernels-*.c
>>>>>>> 14 Add gfortran.dg/goacc/kernels-*.f95
>>>>>>> 15 Add libgomp.oacc-c-c++-common/kernels-*.c
>>>>>>> 16 Add libgomp.oacc-fortran/kernels-*.f95
>>>>>>>
>>>>>>> The first 9 patches are more or less independent, but patches
>>>>>>> 10-16 are
>>>>>>> intended to be committed at the same time.
>>>>>>>
>>>>>>> Bootstrapped and reg-tested on x86_64.
>>>>>>>
>>>>>>> Build and reg-tested with nvidia accelerator, in combination with a
>>>>>>> patch that enables accelerator testing (which is submitted at
>>>>>>> https://gcc.gnu.org/ml/gcc-patches/2015-10/msg01771.html ).
>>>>>>>
>>>>>>> I'll post the individual patches in reply to this message.
>>>>>>
>>>>>>
>>>>>> This patch adds pass_parallelize_loops_oacc_kernels.
>>>>>>
>>>>>> There's a number of things we do differently in parloops for oacc
>>>>>> kernels:
>>>>>> - in normal parloops, we generate code to choose between a parallel
>>>>>> version of the loop, and a sequential (low iteration count)
>>>>>> version.
>>>>>> Since the code in oacc kernels region is supposed to run on the
>>>>>> accelerator anyway, we skip this check, and don't add a low
>>>>>> iteration
>>>>>> count loop.
>>>>>> - in normal parloops, we generate an #pragma omp parallel /
>>>>>> GIMPLE_OMP_RETURN pair to delimit the region which will we
>>>>>> split off
>>>>>> into a thread function. Since the oacc kernels region is already
>>>>>> split off, we don't add this pair.
>>>>>> - we indicate the parallelization factor by setting the oacc function
>>>>>> attributes
>>>>>> - we generate an #pragma oacc loop instead of an #pragma omp for, and
>>>>>> we add the gang clause
>>>>>> - in normal parloops, we rewrite the variable accesses in the loop in
>>>>>> terms into accesses relative to a thread function parameter.
>>>>>> For the
>>>>>> oacc kernels region, that rewrite has already been done at
>>>>>> omp-lower,
>>>>>> so we skip this.
>>>>>> - we need to ensure that the entire kernels region can be run in
>>>>>> parallel. The loop independence check is already present, so
>>>>>> for oacc
>>>>>> kernels we add a check between blocks outside the loop and the
>>>>>> entire
>>>>>> region.
>>>>>> - we guard stores in the blocks outside the loop with gang_pos == 0.
>>>>>> There's no need for each gang to write to a single location,
>>>>>> we can
>>>>>> do this in just one gang. (Typically this is the write of the
>>>>>> final
>>>>>> value of the iteration variable if that one is copied back to the
>>>>>> host).
>>>>>>
>>>>>
>>>>> Reposting with loop optimizer init added in
>>>>> pass_parallelize_loops_oacc_kernels::execute.
>>>>>
>>>>
>>>> Reposting with loop_optimizer_finalize,scev_initialize and
>>>> scev_finalize
>>>> added in pass_parallelize_loops_oacc_kernels::execute.
>>>>
>>>
>>> Ping.
>>>
>>> Anything I can do to facilitate the review?
>>
>> Document new functions.
>
> Done.
>
> avoid if (1).
>
> Done.
>
>> Ideally some refactoring would avoid some of the if (!oacc_kernels_p)
>> spaghetti
>
> Ack. For now, i've tried to minimize the number of oacc_kernels_p tests
> in the code.
>
> Further suggestions on how to improve here are much appreciated.
>
>> but I'm considering tree-parloops.c (and its bugs) yours.
>
> Ack.
>
>> Can the pass not just use a pass parameter to switch between
>> oacc/non-oacc?
>>
>
> This patch introduces the pass parameter oacc_kernels_p (but does not
> instantiate an oacc_kernels_p == true pass version yet).
This patch add pass_parallelize_loops to pass_oacc_kernels (using pass
parameter oacc_kernels_p == true).
As a consequence, it needs to update parloops testcases to use dumpfile
parloops2.
Bootstrapped and reg-tested on x86_64.
Build with nvidia accelerator and tested goacc.exp and libgomp.
Committed to trunk.
Thanks,
- Tom
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0003-Add-pass_parallelize_loops-to-pass_oacc_kernels.patch
Type: text/x-patch
Size: 41223 bytes
Desc: not available
URL: <http://gcc.gnu.org/pipermail/gcc-patches/attachments/20160118/6c3f9e5a/attachment.bin>
More information about the Gcc-patches
mailing list