[PATCH] Reduction support for parloop, OMP_ATOMIC Changes

Zdenek Dvorak rakdver@kam.mff.cuni.cz
Sun Sep 30 22:51:00 GMT 2007


Hello,

Overall, the patch looks fine to me.  You will of course need a separate
review for the changes outside of loop optimizer.  It might make things
easier to submit the OMP expansion changes for review separately.

The patch does not seem to reflect the changes made to the
autoparallelization when it was merged to mainline (it reverts us to set
TREE_THIS_VOLATILE (*arg_struct), and few more things I point out
below); please go over the patch and check for the similar merge
conflicts that I may have overlooked.

> 	* testsite/gcc.dg/autopar: New directory.
> 	* testsite/gcc.dg/autopar/autopar.exp: New driver.
> 	* testsite/gcc.dg/autopar/reduc-1.c: New test.

I am not sure whether we want to create a new directory for the
parallelization testcases -- we certainly do not do that for every
optimization.

> *************** Software Foundation, 51 Franklin Street,
> *** 61,70 ****
> --- 62,296 ----
>      -- handling of common scalar dependence patterns (accumulation, ...)
>      -- handling of non-innermost loops  */
>   
> + /*  
> +   Reduction handling:
> +   currently we use vect_is_simple_reduction() to detect reduction patterns.
> +   The code transformation will be introduced by an example.

could you please make the example easier to understand?  The code dump
you provided is very hard to follow, and contains a lot of irrelevant
information.  Show the resulting code before OMP is expanded (instead of
the current fully expanded form).

> + #two new variables are created for each reduction: 
> + "reduction" is the variable holding the proper init
> + value related to this reduction operation.

Explain what you mean by "proper init value" (I guess you mean
neutral element for the particular operation, e.g. 0 for PLUS_EXPR,
1 for MULT_EXPR, etc).

> +   # Adding this reduction phi is done at 
> +     create_phi_for_local_result() #

Reorganize the example here so that the exit basic block is after the
loop body (in fact, we should generate the code that way, I will try to
fix that).

> +       if (simple_loop_info)
> +         reduc_stmt = vect_is_simple_reduction (simple_loop_info, phi);
> + 	
> +       /* ??? Handle only PLUS reductions for now.  

Remove the bogus comment.

> + 	  new_reduction =
> + 	    (struct reduction_info *) xcalloc (1,
> + 					       sizeof (struct
> + 						       reduction_info));

new_reduction = XCNEW (struct reduction_info);

> *************** loop_parallel_p (struct loop *loop, stru
> *** 149,158 ****
>         if (is_gimple_reg (def)
>   	  && !simple_iv (loop, phi, def, &iv, true))
>   	{
> ! 	  if (dump_file && (dump_flags & TDF_DETAILS))
> ! 	    fprintf (dump_file,
> ! 		     "  FAILED: scalar dependency between iterations\n");
> ! 	  return false;
>   	}
>       }
>   
> --- 470,492 ----
>         if (is_gimple_reg (def)
>   	  && !simple_iv (loop, phi, def, &iv, true))
>   	{
> ! 	  struct reduction_info tmpred, *red;
> ! 
> ! 	  if (!reduction_list)
> ! 	    {
> ! 	      if (dump_file && (dump_flags & TDF_DETAILS))
> ! 		fprintf (dump_file, "  FAILED: non-biv phi node\n");
> ! 	      return false;
> ! 	    }
> ! 	  /* There is recution list.  Check if this is a reduction_phi.  */
> ! 	  tmpred.reduc_phi = phi;
> ! 	  red = htab_find (reduction_list, &tmpred);
> ! 	  if (red == NULL)
> ! 	    {
> ! 	      if (dump_file && (dump_flags & TDF_DETAILS))
> ! 		fprintf (dump_file, "  FAILED: non-biv phi node\n");

Please keep the form of the comment (scalar dependency between
iterations, instead of "non-biv phi node")

>     bool changed;
>   };
>   
> + /* Callback for htab_traverse.  Create the initialization statement
> +    for reduction described in SLOT, and place it at the preheader of 
> +    the loop described in DATA.  */
> + 
> + static int
> + initialize_reductions (void **slot, void *data)

Move this function before the declaration of struct elv_data (and the
comment describing eliminate_local_variables_1).

> +   /* Create a new variable to initialize the reduction.  */
> +   type = TREE_TYPE (PHI_RESULT (reduc->reduc_phi));
> +   bvar = create_tmp_var (type, "reduction");
> +   add_referenced_var (bvar);
> +   /*bvar = SSA_NAME_VAR (TREE_OPERAND (reduc->reduc_stmt,0)); */

Remove this commented out code.

> +   c = build_omp_clause (OMP_CLAUSE_REDUCTION);
> +   OMP_CLAUSE_REDUCTION_CODE (c) = /*PLUS_EXPR */ reduc->reduction_code;

Remove the commented out PLUS_EXPR.

> +   for (i = 0, n = PHI_NUM_ARGS (reduc->reduc_phi); i < n; ++i)
> +     {
> +       tree arg;
> + 
> +       if (!flow_bb_inside_loop_p
> + 	  (loop, PHI_ARG_EDGE (reduc->reduc_phi, i)->src))
> + 	{
> + 	  arg = PHI_ARG_DEF (reduc->reduc_phi, i);

Use

e = loop_preheader_edge (loop);
arg = PHI_ARG_DEF_FROM_EDGE ((reduc->reduc_phi, e);
...

instead of this loop.

> + 	  /* Create new  variable to hold the initial value.  */
> + 	  type = TREE_TYPE (bvar);
> + 	  bvar = create_tmp_var (type, "reduction_initial");
> + 	  add_referenced_var (bvar);
> + 
> +           stmt = build_gimple_modify_stmt (bvar, arg);

Indentation.

> + 	  name1 = make_ssa_name (bvar, stmt);
> + 	  GIMPLE_STMT_OPERAND (stmt, 0) = name1;
> + 	  SSA_NAME_DEF_STMT (name1) = stmt;
> + 	  bb = loop_preheader_edge (loop)->src;
> + 	  bsi = bsi_after_labels (bb);
> + 	  bsi_insert_before (&bsi, stmt, BSI_NEW_STMT);
> + 	  bsi_insert_before (&bsi, t, BSI_NEW_STMT);

It is not safe to insert stmt here, as ARG could be defined in the
preheader.  Use

bsi_insert_on_edge_immediate (e, stmt);
bsi_insert_on_edge_immediate (e, t);

instead.

> *************** add_field_for_name (void **slot, void *d
> *** 474,479 ****
> --- 890,911 ----
>   
>     insert_field_into_struct (type, field);
>     elt->field = field;
> + 
> +   /* Find uses of name to locate if this name is related to 
> +      the reduction phi, and if so, record the filed in the reduction struct.  */

the reduction phi -> a reduction phi
filed -> field

> + /* Since thread computes its own result (local result),
> +    this function creates a phi node whose result is the 
> +    thread's local result.  */

Mention this is a callback for htab_traverse, called for members of the
reduction_list.  Remove the "Since thread computes its own result (local
result)," part of the comment and explain what you mean by "local
result".

> + static int
> + create_phi_for_local_result (void **slot, void *data)
> + {
> +   struct reduction_info *reduc = *slot;
> +   struct loop *loop = data;
> +   edge e;
> +   tree new_phi;
> +   basic_block store_bb;
> +   tree local_res;
> + 
> +   store_bb = FALLTHRU_EDGE (loop->latch)->dest;

This looks wrong.  Under normal circumstances, FALLTHRU_EDGE
(loop->latch)->dest == loop->header, which does not seem to be the case
here.  I assume that we are now in some temporary state where we
inserted the exit condition to the loop latch.  But in that case,
loop latch will have two outgoing edges (to the loop header and
the exit), and neither of them will be marked with EDGE_FALLTHRU flag.

> +   if (EDGE_PRED (store_bb, 0) == FALLTHRU_EDGE (loop->latch))
> +     e = EDGE_PRED (store_bb, 1);
> +   else
> +     e = EDGE_PRED (store_bb, 0);

In any case, add comment explaining what STORE_BB and E is.

> +   e = split_block (bb, t);
> +   new_bb = e->dest;
> +   single_succ_edge (new_bb)->flags = EDGE_FALLTHRU;

No need to set the EDGE_FALLTHRU flag, it is there anyway
(as it is a single successor of NEW_BB).

> +   /*if (current_loops)
> + 	{
> +           remove_bb_from_loops (e->dest);
> + 	  add_bb_to_loop (e->dest, single_succ_edge (new_bb)->dest->loop_father);
> + 	}*/

Remove the commented out code.

> +   bsi_insert_after (&bsi, load,BSI_NEW_STMT);

Space after load,

> +   e = split_block (bsi.bb, load);

Do not access fields of bsi (bsi.bb).

> +   new_bb = e->dest;
> +   single_succ_edge (new_bb)->flags = EDGE_FALLTHRU;

No need to set EDGE_FALLTHRU flag.

> +   /*if (current_loops)
> +         {
> +           remove_bb_from_loops (e->dest);
> +           add_bb_to_loop (e->dest, single_succ_edge (new_bb)->dest->loop_father);
> +         }*/

Remove the commented out code.

> +   bsi = bsi_start (new_bb); 
> +   ref = tmp_load;
> +   x =
> +     fold_build2 (reduc->reduction_code, TREE_TYPE (PHI_RESULT (reduc->new_phi)), ref, PHI_RESULT (reduc->new_phi));

Too long line.

> +   tmp_store = create_tmp_var (TREE_TYPE (PHI_RESULT (reduc->new_phi)), NULL);
> +   add_referenced_var (tmp_store);
> +   stmt = build_gimple_modify_stmt (tmp_store, x);
> +   name = make_ssa_name (tmp_store, stmt);
> +   GIMPLE_STMT_OPERAND (stmt, 0) = name;
> +   SSA_NAME_DEF_STMT (name) = stmt;
> +   bsi_insert_after (&bsi, stmt, BSI_NEW_STMT);

name = force_gimple_operand_bsi (&bsi, x, ...);

would be simpler.

> +   x= build1 (OMP_ATOMIC_STORE, void_type_node, name);

Space before =.

> + /* Create the atomic operation at the join point of the threads.  */
> + static void
> + create_call_for_reduction (struct loop *loop, struct clsn_data *clsn_data)
> + {
> +   htab_traverse (reduction_list, create_phi_for_local_result, loop);
> +   clsn_data->load_bb = FALLTHRU_EDGE (loop->latch)->dest;

Again, FALLTHRU_EDGE (loop->latch) seems wrong.

> +   x =
> +     fold_build2 (red->reduction_code, TREE_TYPE (load_struct),
> + 		 name, red->initial_value);
> +   stmt = build_gimple_modify_stmt (bvar, x);
> +   name = make_ssa_name (bvar, stmt);

name = PHI_RESULT (red->keep_res);

> +   FOR_EACH_IMM_USE_STMT (use_stmt, imm_iter, PHI_RESULT (red->keep_res))
> +     FOR_EACH_IMM_USE_ON_STMT (use_p, imm_iter) 
> +       SET_USE (use_p, name);

And remove this loop.

> +   SET_PHI_RESULT (red->keep_res, NULL_TREE);

This statement is not necessary.

> + static void
> + create_final_loads_for_reduction (struct clsn_data *clsn_data)
> + {

Add comment describing this function.

> +   block_stmt_iterator bsi;
> +   tree t;
> + 
> +   bsi = bsi_after_labels (clsn_data->load_bb);
> +   t = build_fold_addr_expr (clsn_data->store);
> +   t = build_gimple_modify_stmt (clsn_data->load,build_fold_addr_expr (clsn_data->store));

space after comma

>   static int
>   create_loads_and_stores_for_name (void **slot, void *data)
>   {
> *************** create_loads_and_stores_for_name (void *
> *** 552,559 ****
>      */
>   
>   static void
> ! separate_decls_in_loop (struct loop *loop, tree *arg_struct,
> ! 			tree *new_arg_struct)
>   {
>     basic_block bb1 = split_edge (loop_preheader_edge (loop));
>     basic_block bb0 = single_pred (bb1);
> --- 1177,1184 ----
>      */
>   
>   static void
> ! separate_decls_in_loop (struct loop *loop, tree * arg_struct,
> ! 			tree * new_arg_struct, struct clsn_data *clsn_data)
>   {
>     basic_block bb1 = split_edge (loop_preheader_edge (loop));
>     basic_block bb0 = single_pred (bb1);

Why are you passing clsn_data do separate_decls_in_loop?  Explain
it in the comments before the function.  Also, give the argument some
senseful name, CLSN_DATA just does not mean anything here.

> *************** separate_decls_in_loop (struct loop *loo
> *** 565,571 ****
>     unsigned i;
>     tree phi, type, type_name, nvar;
>     block_stmt_iterator bsi;
> !   struct clsn_data clsn_data;
>   
>     /* Find and rename the ssa names defined outside of loop.  */
>     for (i = 0; i < loop->num_nodes; i++)
> --- 1190,1196 ----
>     unsigned i;
>     tree phi, type, type_name, nvar;
>     block_stmt_iterator bsi;
> !   struct clsn_data clsn_data1;

You will also avoid the need to rename this variable.

>     /* Find and rename the ssa names defined outside of loop.  */
>     for (i = 0; i < loop->num_nodes; i++)
> *************** separate_decls_in_loop (struct loop *loo
> *** 606,617 ****
>         add_referenced_var (nvar);
>         *new_arg_struct = make_ssa_name (nvar, NULL_TREE);
>   
> !       clsn_data.store = *arg_struct;
> !       clsn_data.load = *new_arg_struct;
> !       clsn_data.store_bb = bb0;
> !       clsn_data.load_bb = bb1;
>         htab_traverse (name_copies, create_loads_and_stores_for_name,
> ! 		     &clsn_data);
>       }
>   
>     htab_delete (decl_copies);
> --- 1231,1257 ----
>         add_referenced_var (nvar);
>         *new_arg_struct = make_ssa_name (nvar, NULL_TREE);
>   
> !       /* We should mark *arg_struct call clobbered.  However, this means
> ! 		  we would need to update virtual operands for every function call.
> ! 		  To avoid this, we pretend this variable is volatile.  */
> !       TREE_THIS_VOLATILE (*arg_struct) = 1;
> ! 

Remove this statement and comment.


>         res = PHI_RESULT (phi);
>   
>         if (!is_gimple_reg (res)
> ! 		   || res == var_before)
> ! 		 {
> ! 		   prev = phi;
> ! 		   continue;
> ! 		 }

Indentation.

> !       ok = simple_iv (loop, phi, res, &iv, true);
> !       if (!ok)

I would prefer to check here whether phi is in the reductions_list
(and add comment that we preserve the reduction phi nodes),
and keep the original code (asserting that all the remaining
phis are induction variables) unchanged.

> *************** transform_to_exit_first_loop (struct loo
> *** 834,845 ****
>         if (!is_gimple_reg (res))
>   	continue;

Update the comment before this section of the code stating
that "The only gimple reg that should be copied out of the loop is the
control variable." to mention that also reductions are copied out.

> !       gcc_assert (control_name == NULL_TREE
> ! 		  && SSA_NAME_VAR (res) == SSA_NAME_VAR (control));
>         control_name = res;
>       }
>     gcc_assert (control_name != NULL_TREE);
>     phi = SSA_NAME_DEF_STMT (control_name);
>     remove_phi_node (phi, NULL_TREE, false);
>   
>     /* Initialize the control variable to NIT.  */
> --- 1477,1509 ----
>         if (!is_gimple_reg (res))
>   	continue;
>   
> !       /* check if it is a part of reduction.  If it is,
> !          keep the phi at the reduction's keep_res field.  The  
> !          PHI_RESULT of this phi is the resulting value of the reduction 
> !          variable when exiting the loop.  */

Capital C at the start of the sentence.

>     phi = SSA_NAME_DEF_STMT (control_name);
> +   SET_PHI_RESULT (phi, NULL_TREE);

Do not add this (unnecessary) statement.

> *************** gen_parallel_loop (struct loop *loop, un
> *** 985,990 ****
> --- 1649,1655 ----
>     tree many_iterations_cond, type, nit;
>     tree stmts, arg_struct, new_arg_struct;
>     basic_block parallel_head;
> +   struct clsn_data clsn_data;

Give this variable some senseful name (possibly the same as the argument to
separate_decls_in_loop).

> *************** parallelize_loops (void)
> *** 1125,1130 ****
> --- 1797,1803 ----
>   
>     FOR_EACH_LOOP (li, loop, 0)
>       {
> +       reduction_list = NULL;

This leaks reduction_list (you need to htab_delete the hash tables).
However, it would anyway simplify the code a lot if you allocated
reduction_list in parallelize_loops, just htab_cleared it here,
and deallocated it at the end of parallelize_loops.  On many places,
you will no longer need to special-case reduction_list==NULL
(and you may check whether the hash table is empty on those few places
where you really need it).

Also, it would be more consistent with the way tree-parloops is
structured now to pass reduction_list in arguments, rather than
making it a global variable.

Zdenek



More information about the Gcc-patches mailing list