This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
Re: [PATCH, PR66846] Mark inner loop for fixup in parloops
- From: Richard Biener <richard dot guenther at gmail dot com>
- To: Tom de Vries <Tom_deVries at mentor dot com>
- Cc: "gcc-patches at gnu dot org" <gcc-patches at gnu dot org>
- Date: Tue, 28 Jul 2015 12:11:29 +0200
- Subject: Re: [PATCH, PR66846] Mark inner loop for fixup in parloops
- Authentication-results: sourceware.org; auth=none
- References: <55A6B656 dot 8080701 at mentor dot com> <CAFiYyc0tmNaQoxTWH5DV1fkCz2h3hDerCM13pUWNti6A+5XCEA at mail dot gmail dot com> <55A77BEE dot 1050906 at mentor dot com> <CAFiYyc1DXSmrHELnsNkPMmyaN+XuXhFksOwXmXR-GVvc-wfyhg at mail dot gmail dot com> <55ACF1DA dot 7000102 at mentor dot com> <55B20F2D dot 3060403 at mentor dot com>
On Fri, Jul 24, 2015 at 12:10 PM, Tom de Vries <Tom_deVries@mentor.com> wrote:
> On 20/07/15 15:04, Tom de Vries wrote:
>>
>> On 16/07/15 12:15, Richard Biener wrote:
>>>
>>> On Thu, Jul 16, 2015 at 11:39 AM, Tom de Vries
>>> <Tom_deVries@mentor.com> wrote:
>>>>
>>>> On 16/07/15 10:44, Richard Biener wrote:
>>>>>
>>>>>
>>>>> On Wed, Jul 15, 2015 at 9:36 PM, Tom de Vries <Tom_deVries@mentor.com>
>>>>> wrote:
>>>>>>
>>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I.
>>>>>>
>>>>>> In openmp expansion of loops, we do some effort to try to create
>>>>>> matching
>>>>>> loops in the loop state of the child function, f.i.in
>>>>>> expand_omp_for_generic:
>>>>>> ...
>>>>>> struct loop *outer_loop;
>>>>>> if (seq_loop)
>>>>>> outer_loop = l0_bb->loop_father;
>>>>>> else
>>>>>> {
>>>>>> outer_loop = alloc_loop ();
>>>>>> outer_loop->header = l0_bb;
>>>>>> outer_loop->latch = l2_bb;
>>>>>> add_loop (outer_loop, l0_bb->loop_father);
>>>>>> }
>>>>>>
>>>>>> if (!gimple_omp_for_combined_p (fd->for_stmt))
>>>>>> {
>>>>>> struct loop *loop = alloc_loop ();
>>>>>> loop->header = l1_bb;
>>>>>> /* The loop may have multiple latches. */
>>>>>> add_loop (loop, outer_loop);
>>>>>> }
>>>>>> ...
>>>>>>
>>>>>> And if that doesn't work out, we try to mark the loop state for
>>>>>> fixup, in
>>>>>> expand_omp_taskreg and expand_omp_target:
>>>>>> ...
>>>>>> /* When the OMP expansion process cannot guarantee an
>>>>>> up-to-date
>>>>>> loop tree arrange for the child function to fixup
>>>>>> loops. */
>>>>>> if (loops_state_satisfies_p (LOOPS_NEED_FIXUP))
>>>>>> child_cfun->x_current_loops->state |= LOOPS_NEED_FIXUP;
>>>>>> ...
>>>>>>
>>>>>> and expand_omp_for:
>>>>>> ...
>>>>>> else
>>>>>> /* If there isn't a continue then this is a degerate case where
>>>>>> the introduction of abnormal edges during lowering will
>>>>>> prevent
>>>>>> original loops from being detected. Fix that up. */
>>>>>> loops_state_set (LOOPS_NEED_FIXUP);
>>>>>> ...
>>>>>>
>>>>>> However, loops are fixed up anyway, because the first pass we execute
>>>>>> with
>>>>>> the new child function is pass_fixup_cfg.
>>>>>>
>>>>>> The new child function contains a function call to
>>>>>> __builtin_omp_get_num_threads, which is marked with ECF_CONST, so
>>>>>> execute_fixup_cfg marks the function for TODO_cleanup_cfg, and
>>>>>> subsequently
>>>>>> the loops with LOOPS_NEED_FIXUP.
>>>>>>
>>>>>>
>>>>>> II.
>>>>>>
>>>>>> This patch adds a verification that at the end of the omp-expand
>>>>>> processing
>>>>>> of the child function, either the loop structure is ok, or marked for
>>>>>> fixup.
>>>>>>
>>>>>> This verfication triggered a failure in parloops. When an outer
>>>>>> loop is
>>>>>> being parallelized, both the outer and inner loop are cancelled. Then
>>>>>> during
>>>>>> omp-expansion, we create a loop in the loop state for the outer
>>>>>> loop (the
>>>>>> one that is transformed), but not for the inner, which causes the
>>>>>> verification failure:
>>>>>> ...
>>>>>> outer-1.c:11:3: error: loop with header 5 not in loop tree
>>>>>> ...
>>>>>>
>>>>>> [ I ran into this verification failure with an openacc kernels
>>>>>> testcase
>>>>>> on
>>>>>> the gomp-4_0-branch, where parloops is called additionally from a
>>>>>> different
>>>>>> location, and pass_fixup_cfg is not the first pass that the child
>>>>>> function
>>>>>> is processed by. ]
>>>>>>
>>>>>> The patch contains a bit that makes sure that the loop state of the
>>>>>> child
>>>>>> function is marked for fixup in parloops. The bit is non-trival
>>>>>> since it
>>>>>> create a loop state and sets the fixup flag on the loop state, but
>>>>>> postpones
>>>>>> the init_loops_structure call till move_sese_region_to_fn, where it
>>>>>> can
>>>>>> succeed.
>>>>>>
>>>>>>
>>
>> <SNIP>
>>
>>> Can we fix the root-cause of the issue instead? That is, build a
>>> valid loop
>>> structure in the first place?
>>>
>>
>> This patch manages to keep the loop structure, that is, to not cancel
>> the loop tree in parloops, and guarantee a valid loop structure at the
>> end of parloops.
>>
>> The transformation to insert the omp_for invalidates the loop state
>> properties LOOPS_HAVE_RECORDED_EXITS and LOOPS_HAVE_SIMPLE_LATCHES, so
>> we drop those in parloops.
>>
>> In expand_omp_for_static_nochunk, we detect the existing loop struct of
>> the omp_for, and keep it.
>>
>> Then by calling pass_tree_loop_init after pass_expand_omp_ssa, we get
>> the loop state properties LOOPS_HAVE_RECORDED_EXITS and
>> LOOPS_HAVE_SIMPLE_LATCHES back.
>>
>
> This updated patch tries a more minimal approach.
>
> Rather than dropping property LOOPS_HAVE_RECORDED_EXITS, we record the new
> exit instead.
>
> And rather than adding pass_tree_loop_init after pass_expand_omp_ssa, we
> just set LOOPS_HAVE_SIMPLE_LATCHES back at the end of pass_expand_omp_ssa.
>
> Bootstrapped and reg-tested on x86_64.
>
> OK for trunk?
I wonder about the need to clear LOOPS_HAVE_SIMPLE_LATCHES (and esp. turning
that back on in execute_expand_omp). The parloops code lacks comments and
the /* Prepare cfg. */ part looks twisty to me - but I don't see why
it should be difficult
to preserve simple latches as well - is this just because we insert
the GIMPLE_OMP_CONTINUE
in it?
If execute_expand_omp is not performed in a loop pipeline where things
expect simple latches
(well, obviously it isn't) then why set LOOPS_HAVE_SIMPLE_LATCHES here
at all? If somebody
needs it it will just request simple latches.
So if the GIMPLE_OMP_CONTINUE part is correct then I'd prefer to keep
LOOPS_HAVE_SIMPLE_LATCHES
unset in execute_expand_omp.
Ok with that change.
Thanks,
Richard.
> Thanks,
> - Tom
>