struct-reorg optimization

Thu Sep 6 21:04:00 GMT 2007

On 8/31/07, Olga Golovanevsky <OLGA@il.ibm.com> wrote:

>    Contributed by Olga Golovanevsky <olga@il.ibm.com>
>    (initial version of this code was developed
>                                   by Caroline Tice and Mostafa Hagog)
  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Spacing.

> and str_t_1 as it was shown above, then array A will be replaced
> by two arrays as foolows:

s/foolows/follows/

> The field access of fields a of ellement i of array A: A[i].a will be
> replace by access to field a of ellement i of array A_0: A_0[i].a.

s/ellement/element/
s/replace/replaced/
s/by access/by an access/

> The decision how to peel structure gains to utilize spatial locality.

"The goal of structure peeling is to improve spatial locality."

> For example, if one of fields of structure has intensive use in the loop:

s/if one of/if one of the/
s/of structure/of a structure/
s/has intensive use in the loop/is accessed frequently in the loop:/

>    the allocation of field a of str_t in the memory in contiguous manner will

s/in the memory in contiguous manner/contiguously in memory/

>    increase probablity of relevant data to be fatched into cache.

"increase the chances of fetching the field from cache."

>    got peeled out of the structure:

s/got/get/

>    freq(f) > C * max_field_freq_in_struct
>
>    where max_field_freq_in_struct is maximum field frequency in structure.

s/is maximum/is the maximum/
s/in structure/in the structure/

>    The C is a constant defining which portion of max_field_freq_in_struct
>    the filds should have in order to be peeled.

s/The C/C/
s/the filds/the fields/

>    If profiling information is provided, its is used to calculate the

s/its is/it is/

>    frequency of field accesses. Otherwise, structure is fully peeled.

/structure is/the structure is/

Oh?  Don't we use estimated execution frequencies to compute this?

>
>    The ipa-type-escape analysis are used to provide safety of the peeling.

"IPA type-escape analysis is used to determine when it is safe to peel
a structure."

>    The optimization is activated by flag -fipa-struct-reorg.
> */

Closing comment on previous line.

> /* Ratio of the structure count to the hottest
>    structure count to consider the structure cold.  */
> #define COLD_STRUCTURE_RATIO 10

Better convert into a --param.

> struct new_var_data {
> [ ... ]
>   struct new_var_data *next;
> };

Why not use a VEC and avoid having to manage the single-linked list
yourself?  Similarly with structures malloc_call, malloc_struct and
struct_list.

> static new_var is_in_new_vars_list (tree, new_var);
> static void add_struct_to_data_struct_list (tree);
> static void add_call_to_malloc_list (tree, tree, d_str);
> static inline void update_fields_mapping (struct field_cluster *, tree,
> 					  struct data_field_entry *, int);
> static void create_new_var (tree, new_var *);
> static tree get_final_malloc_stmt (tree malloc_stmt);
> static tree_list add_tree_to_tree_list (tree, tree_list);
> static void free_tree_list (tree_list);
> static malloc_d get_malloc_list_for_func (tree);
> static struct field_access_site *
> add_field_acc_to_acc_sites (struct field_access_site *,
> 			    struct field_access_site *);
> static struct access_site * add_access_to_acc_sites (tree, tree,
> 						     struct access_site *);
> static bool is_result_of_mult (tree, tree *, tree);
> static void remove_struct_from_data_struct_list (list);
> static tree gen_size (tree, tree, tree *);
> static tree gen_cast_stmt (tree, tree, tree, tree *);
> static void create_new_stmts_for_general_acc (struct access_site *, d_str);
> static void create_new_stmts_for_cond_expr (tree);
> static bool is_equal_types (tree, tree);
> static tree * find_pos_in_stmt (tree, tree);
> static void free_data_structure_list (void);

I'd rather see the functions defined in such a way that you don't need
forward declarations.  This forces the main functions to be grouped at
the bottom of the file, making it easier to find them later.  It also
simplifies changing function signatures in the future.

Not a requirement, though.  That's mostly personal preference.

>   if (TREE_CODE (var) == PARM_DECL)
>       return DECL_ARG_TYPE (var);
>
>   else
>     return TREE_TYPE (var);
> }

Spacing.

> /* Check whether the type of VAR is potential candidate for peeling.
>    Returns true if yes, false otherwise.  If yes, TYPE_P contains
>    candidate type.  */
> static bool
> is_candidate_for_data_struct_list (tree var, tree *type_p,
> 				   bitmap non_suitable_types)

No documentation for NON_SUITABLE_TYPES.

> static const char *
> get_type_name (tree type_node)
< [ ... ]
>   if (TREE_CODE (TYPE_NAME (type_node)) == IDENTIFIER_NODE)
>     return (char *) IDENTIFIER_POINTER (TYPE_NAME (type_node));

s/char *//

>   else if (TREE_CODE (TYPE_NAME (type_node)) == TYPE_DECL
> 	   && DECL_NAME (TYPE_NAME (type_node)))
>     return (char *) IDENTIFIER_POINTER (DECL_NAME (TYPE_NAME (type_node)));

> /* Print structure type , its name, if it exists, and body.  */
> static void
> print_struct_type (tree type, char *indent)
> {

s/type , /TYPE, /

No documentation for INDENT.

> /* Given structure type SRT_TYPE and field FIELD,
>    this funciton is looking for a field with the same name

s/funciton/function/

>       char * str_field_name;
>       char * field_name;

const char *

>       if (!strcmp (str_field_name, field_name))
> 	{

Why?  Wouldn't is_equal_types() be enough here?

> /* This function is a hack to overcome the types problem.
>    When number of compilation units are compiled together

s/number of/several/

>    under -combine flag, the TYPE_MAIN_VARIANT cannot

s/under -combine flag/with -combine/

>    be trusted. First we comparing names,  but they do not
>    always appear, like for example, in array_refs.

I'm not sure what this means.  Could you re-phrase?

>    Only then we are going inside structures.
> */

Closing comment on previous line.

> static inline struct field_access_site *
> make_field_acc_node (void)
> {
>     return (struct field_access_site *)
>       xcalloc (1, sizeof (struct field_access_site));
> }

Indentation.

>       *walk_subtrees=1;
> [ ... ]
>       *walk_subtrees=0;

Spacing between '='.  Happens elsewhere too.

>       }
>       break;
>     case VAR_DECL:
>     case SSA_NAME:

Spacing after 'break;'.

>       /* In asm stmt we cannot always track the agruments,
>        so we just give up.  */
>       if (TREE_CODE (stmt) == ASM_EXPR)
>       free_data_structure_list ();

s/agruments/arguments/

If we are giving up, why does the loop keep going?

> static tree
> build_basic_struct (tree fields, char * name, tree orig_struct)
> {
>
>   tree attributes = NULL_TREE;

No spacing after '{'.

> gen_cluster_name (tree decl, int clust_num, int str_num)
> {
>   char * old_name = (char *) get_type_name (decl);

Better declare old_name as 'const char *'.

> /* Generate new structure name.  */
> static inline char *
> gen_cluster_name (tree decl, int clust_num, int str_num)
> {
>   char * old_name = (char *) get_type_name (decl);
>   char * new_name;
>
>   if (!old_name)
>     {
>       old_name = (char *) xmalloc ((10) * sizeof (char));
>       sprintf (old_name, "struct_%d", str_num);
>     }
>
>   new_name = (char *) xmalloc ((strlen (old_name) + 11) * sizeof (char));
>   sprintf (new_name, "%s_sub_%d", old_name, clust_num);
>
>   /* Check that this name was not used before.  */
>   while (maybe_get_identifier (new_name))
>     {
>       int len = strlen (new_name) + 3;
>       char * tmp_name = (char *) xmalloc (len * sizeof (char));
>       sprintf (tmp_name, "%s.0", new_name);
>       new_name = tmp_name;
>     }
>
>   return new_name;
> }

No.  Better use alloca and make sure there are no possibility of
buffer overflow.  This function leaks memory as well.  See
create_omp_child_function_name and create_tmp_var_name.

Similarly, in gen_var_name.  Why not use create_tmp_var_name?

> /* Update field_mapping of fields in CLUSTER with NEW_TYPE.  */
> static inline void
> update_fields_mapping (struct field_cluster *cluster, tree new_type,
>                      struct data_field_entry * fields, int num_fields)

No documentation for FIELDS and NUM_FIELDS.

> static struct field_access_site *
> is_in_field_acc_list (tree stmt, struct field_access_site *list_p)
> {
>   struct field_access_site *curr;
>
>   for (curr = list_p; curr; curr = curr->next)
>     if (curr->stmt == stmt)
>       return curr;

Would it make sense to keep a pointer-map or a hashtable to avoid
these linear searches?

> static struct access_site *
> is_in_acc_list (tree stmt, struct access_site *list_p)
> {
>   struct access_site *curr;
>
>   for (curr = list_p; curr; curr = curr->next)
>     if (curr->stmt == stmt)
>       return curr;

Likewise here.

> struct type_wrapper
> {
>   /* 0 stand for pointer wrapper,
>      and 1 for array wrapper.  */
>   bool wrap;
>   tree domain; /* Relevant for arrays as domain or index.  */
>   struct type_wrapper * next;
> };

Similar comment as the other structures.  It seems easier to put these
inside VECs instead of having to manage the single-linked lists on
your own.

> static void
> create_new_local_vars (void)
> {

This function looks like it's doing too much work.  It's going over
the same DECLs over and over again.  Why not just traverse
referenced_vars?

>    is converted into followinf set of stmts:

s/into followinf/into the following/

> static void
> add_bb_info (basic_block bb, tree stmt_list)
> {
>   if (TREE_CODE (stmt_list) == STATEMENT_LIST)
>     {
>       tree_stmt_iterator tsi;
>       for (tsi = tsi_start (stmt_list); !tsi_end_p (tsi);
> 	   tsi_next (&tsi))

tsi_next() fits in the previous line.

> /* This function returns a tree representing
>    the number of structure instances allocated by MALLOC_STMT.  */
> gen_num_of_structs_in_malloc (tree stmt, tree str_decl,
> 			      tree *new_stmts_p)

new_stmts_p fits in the previous line.

Missing documentation for NEW_STMTS_P and STR_DECL.  I suppose that
MALLOC_STMT is STMT?

>   /* Get malloc agrument.  */

s/agrument/argument/

>   /* Generate argument to malloc as multiplication of num and size of new_type.  */

Line's too long.

> /* This function build an edge between BB and E->dest and updates

s/build/builds/

> /* If MALLOC_STMT is D.2225_7 = malloc (D.2224_6);
>    it can be casting stmt as for example:
>    p_8 = (struct str_t *) D.2225_7;
>    which is returned by this function.  */

It's not clear what this means.  Could you rephrase?  Perhaps
something like "Return the first statement that uses the result of the
call to MALLOC_STMT."

>   return build3 (COMPONENT_REF, TREE_TYPE (field),
> 		 base, field, NULL_TREE);

No need to wrap this line.

>   struct type_wrapper * wrapper = NULL;
>   struct type_wrapper * wr = NULL;

No space after '*'.

>   if (TREE_CODE (acc->stmt) == GIMPLE_MODIFY_STMT)
>     {
>
>       lhs = GIMPLE_STMT_OPERAND (acc->stmt, 0);

No space after '{'

> /* In this function we create new stmts that has the same
>    form as ORIG_STMT, but of the type NET_TYPE. The stmts

s/stmts/statements/
s/that has/that have/
s/of the type/of type/

>    treated by this function are simple assignments,
>    like assignments:  p.8_7 = p; or stmts with rhs of code
>    PLUS_EXPR or MINUS_EXPR.  */
> static tree
> create_base_plus_offset (tree orig_stmt, tree new_type,
> 			 tree offset)
> {
> [ ... ]
> 	gcc_assert ( tmp0 || tmp1);

No space after '('.

> 	if (!new_op0)
> 	  new_op0 = offset;
> 	if (!new_op1)
> 	  new_op1 = offset;
>
> 	new_rhs = build2 (TREE_CODE (rhs), TREE_TYPE (new_op0),
> 			  new_op0, new_op1);

If RHS is a POINTER_PLUS_EXPR and new_op0 is OFFSET, you'll likely
create an invalid expression here.

>   new_stmt = build2 (GIMPLE_MODIFY_STMT, TREE_TYPE (new_lhs),
> 		     new_lhs, new_rhs);

Statements have no type.  That should be void_type_node.  Similarly in
other places where assignments are created.

> static tree
> gen_cast_stmt (tree before_cast, tree new_type, tree old_cast_stmt, tree *res_p)

This line is too long.

> static void
> create_new_field_access (struct field_access_site *acc, struct data_field_entry field)

Likewise.

> /* This function creates new non-field accesses for
>    each new type in STR.  */
> static void
> create_new_stmts_for_general_acc (struct access_site * acc, d_str str)
> {

Missing documentation for ACC.

> /* This function creates new condition stmt and
>    redirect edges to correspondent BBs.
>    NEW_VAR position in stmt is POS.  */
> static void
> create_new_stmts_for_cond_expr_1 (tree new_var, tree cond_stmt, bool pos)
> {

Missing documentation for POS and COND_STMT.

s/correspondent/corresponding/

>   new_cond = copy_node (COND_EXPR_COND (cond_stmt));
>

Better call unshare_expr() here.  Similarly in other calls to
copy_node that copy a statement.

>   new_stmt = build3 (COND_EXPR,TREE_TYPE (cond_stmt),
> 		     new_cond, NULL_TREE, NULL_TREE);

Space after first comma.

>   if (!list_p)
>     return NULL;
>   else
>     {
>       node = list_p;
>       while (node)
> 	{
> 	  if (node->orig_var == decl)
> 	    return node;
> 	  node = node->next;
> 	}
>     }
>   return NULL;

As before.  Wouldn't it be better to implement this on a pointer-map
or hash table?

> /* Stage 3.  */
> static void
> do_reorg (void)

Stage 3?  What does that mean?

>   /* Do reorg for each funciton separately.  */

s/funciton/function/

Several functions have documentation missing for arguments.  Also,
there should be a blank line between function comments and the start
of the function.

All in all, the patch seems fine.  The major things I noticed were
the, perhaps unnecessary, use of manually-maintained single-linked
lists and the O(N) searching.  Those could be solved with VECs and
pointer-maps/hashtables.