This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [PATCH, PR66846] Mark inner loop for fixup in parloops


On 16/07/15 10:44, Richard Biener wrote:
On Wed, Jul 15, 2015 at 9:36 PM, Tom de Vries <Tom_deVries@mentor.com> wrote:
Hi,

I.

In openmp expansion of loops, we do some effort to try to create matching
loops in the loop state of the child function, f.i.in
expand_omp_for_generic:
...
       struct loop *outer_loop;
       if (seq_loop)
         outer_loop = l0_bb->loop_father;
       else
         {
           outer_loop = alloc_loop ();
           outer_loop->header = l0_bb;
           outer_loop->latch = l2_bb;
           add_loop (outer_loop, l0_bb->loop_father);
         }

       if (!gimple_omp_for_combined_p (fd->for_stmt))
         {
           struct loop *loop = alloc_loop ();
           loop->header = l1_bb;
           /* The loop may have multiple latches.  */
           add_loop (loop, outer_loop);
         }
...

And if that doesn't work out, we try to mark the loop state for fixup, in
expand_omp_taskreg and expand_omp_target:
...
       /* When the OMP expansion process cannot guarantee an up-to-date
          loop tree arrange for the child function to fixup loops.  */
       if (loops_state_satisfies_p (LOOPS_NEED_FIXUP))
         child_cfun->x_current_loops->state |= LOOPS_NEED_FIXUP;
...

and expand_omp_for:
...
   else
     /* If there isn't a continue then this is a degerate case where
        the introduction of abnormal edges during lowering will prevent
        original loops from being detected.  Fix that up.  */
     loops_state_set (LOOPS_NEED_FIXUP);
...

However, loops are fixed up anyway, because the first pass we execute with
the new child function is pass_fixup_cfg.

The new child function contains a function call to
__builtin_omp_get_num_threads, which is marked with ECF_CONST, so
execute_fixup_cfg marks the function for TODO_cleanup_cfg, and subsequently
the loops with LOOPS_NEED_FIXUP.


II.

This patch adds a verification that at the end of the omp-expand processing
of the child function, either the loop structure is ok, or marked for fixup.

This verfication triggered a failure in parloops. When an outer loop is
being parallelized, both the outer and inner loop are cancelled. Then during
omp-expansion, we create a loop in the loop state for the outer loop (the
one that is transformed), but not for the inner, which causes the
verification failure:
...
outer-1.c:11:3: error: loop with header 5 not in loop tree
...

[ I ran into this verification failure with an openacc kernels testcase on
the gomp-4_0-branch, where parloops is called additionally from a different
location, and pass_fixup_cfg is not the first pass that the child function
is processed by. ]

The patch contains a bit that makes sure that the loop state of the child
function is marked for fixup in parloops. The bit is non-trival since it
create a loop state and sets the fixup flag on the loop state, but postpones
the init_loops_structure call till move_sese_region_to_fn, where it can
succeed.


III.

Bootstrapped and reg-tested on x86_64.

OK for trunk?

+  struct loops *loops;
+  int loop_state_flags = 0;
+  if (dest_cfun->x_current_loops == NULL)
+    {
+      /* Initialize an empty loop tree.  */
+      loops = ggc_cleared_alloc<struct loops> ();
+      set_loops_for_fn (dest_cfun, loops);
+    }
+  else
+    {
+      loops = dest_cfun->x_current_loops;
+      loop_state_flags = loops->state;
+    }
+
+  if (loops->tree_root == NULL)
+    {
+      init_loops_structure (dest_cfun, loops, 1);
+      loops->state |= loop_state_flags;
+    }
+
+  loops->state |= LOOPS_MAY_HAVE_MULTIPLE_LATCHES;

this looks twisted just because you do a half-way init of the loop tree here:

+  if (loop->inner)
+    {
+      struct function *child_cfun = DECL_STRUCT_FUNCTION (child_fn);
+      struct loops *loops = ggc_cleared_alloc<struct loops> ();
+      set_loops_for_fn (child_cfun, loops);
+      child_cfun->x_current_loops->state = LOOPS_NEED_FIXUP;
+    }

why not unconditionally initialize the loop tree properly in autopar?


Because init_loops_structure accesses f.i. n_basic_blocks_for_fn (child_cfun), in other words child_cfun->cfg->x_n_basic_blocks. At this point in parloops, there's no child_cfun->cfg yet, that field is set by the following pass_expand_omp_ssa.

Normally, the solution is to do loops_state_set (LOOPS_NEED_FIXUP) for the parent function, which will get propagated to the child. But I'm trying to be more precise than that, by only setting LOOPS_NEED_FIXUP for the child, not the parent.

Thanks,
- Tom


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]