[PATCH 1/2] Add an no_reorder attribute for LTO

Jan Hubicka hubicka@ucw.cz
Sun Sep 14 19:20:00 GMT 2014


> From: Andi Kleen <ak@linux.intel.com>
> 
> Some projects need to prevent reordering of specific top level
> declarations with LTO, in particular declarations defining init calls.

Thanks for working on it!
> 
> The only way to do that with LTO was to use -fno-toplevel-reorder,
> which stops reordering for all declarations and makes LTO partitioning
> less efficient.

Yep, -fno-toplevel-reorder also disables some optimizations (as unreachable
function removal)
> 
> This patch adds a new no_reorder attribute that stops reordering only
> for the marked declaration. The program can then only mark e.g. the
> initcalls and leave all the other declarations alone.
> 
> The patch does:
> 
> - Adds the new no_reorder attribute for the C family.
> - Initializes a new no_reorder flag in the symtab_nodes in the
> function visibility flag.
> - Maintains the no_reorder flag when creating new nodes.
> - Changes the partition code to always keep a separate
> sorted queue of ordered nodes and flush them in order with the other
> nodes. This is used by all nodes with -fno-toplevel-reorder,
> and only the marked ones without it.
> Parts of the old -fno-toplevel-reorder code paths are reused.
> - Adds various checks throughout the tree to make no_reorder
> marked functions behave the same as with -fno-toplevel-reorder
> - Changes the LTO streamer to serialize the no_reorder attribute.
> 
> Bootstrapped and tested with LTO + -fno-toplevel-reorder, plain LTO
> and bootstrap w/o LTO on x86_64-linux. Also fixes the reordering in
> the other large project.
> 
> gcc/c-family/:
> 
> 2014-09-14  Andi Kleen  <ak@linux.intel.com>
> 
> 	* c-common.c (handle_no_reorder_attribute): New function.
> 	(c_common_attribute_table): Add no_reorder attribute.
> 
> gcc/:
> 
> 2014-09-14  Andi Kleen  <ak@linux.intel.com>
> 
> 	* cgraph.h (symtab_node): Add no_reorder attribute.
> 	* cgraphclones.c (cgraph_node::create_clone): Copy no_reorder.
> 	(cgraph_node::create_version_clone): Dito.
> 	* trans-mem.c (ipa_tm_create_version_alias): Dito.
> 	* cgraphunit.c (varpool_node::finalize_decl): Check no_reorder.
> 	(output_in_order): Add no_reorder flag. Only handle no_reorder
> 	nodes when set.
> 	(symbol_table::compile): Add separate pass for no_reorder nodes.
> 	* doc/extend.texi (no_reorder): Document no_reorder attribute.
> 	* ipa-visibility.c (function_and_variable_visibility): Set
> 	no_reorder flag in symtab_node from declaration.
> 	* lto-cgraph.c (lto_output_node): Serialize no_reorder.
> 	(lto_output_varpool_node): Dito.
> 	(input_overwrite_node): Dito.
> 	(input_varpool_node): Dito.
> 	* varpool.c (varpool_node::add): Set no_reorder attribute.
> 	(symbol_table::remove_unreferenced_decls): Handle no_reorder.
> 	(symbol_table::output_variables): Dito.
> 	* symtab.c (symtab_node::dump_base): Print no_reorder.
> 
> gcc/lto/:
> 
> 2014-09-13  Andi Kleen  <ak@linux.intel.com>
> 
> 	* lto-partition.c (node_cmp): Update comment.
> 	(varpool_node_cmp): Use symtab_node for comparison.
> 	(add_sorted_nodes): New function.
> 	(lto_balanced_map): Change to keep ordered queue
> 	of ordered node. Handle no_reorder attribute.
> ---
>  gcc/c-family/c-common.c |  17 ++++++++
>  gcc/cgraph.h            |   2 +
>  gcc/cgraphclones.c      |   2 +
>  gcc/cgraphunit.c        |  36 ++++++++++------
>  gcc/doc/extend.texi     |   9 +++-
>  gcc/ipa-visibility.c    |   6 +++
>  gcc/lto-cgraph.c        |   4 ++
>  gcc/lto/lto-partition.c | 112 ++++++++++++++++++++++++++++++------------------
>  gcc/symtab.c            |   2 +
>  gcc/trans-mem.c         |   1 +
>  gcc/varpool.c           |  21 +++++++--
>  11 files changed, 153 insertions(+), 59 deletions(-)
> 
> diff --git a/gcc/c-family/c-common.c b/gcc/c-family/c-common.c
> index 39be956..74bda6b 100644
> --- a/gcc/c-family/c-common.c
> +++ b/gcc/c-family/c-common.c
> @@ -328,6 +328,8 @@ static tree handle_used_attribute (tree *, tree, tree, int, bool *);
>  static tree handle_unused_attribute (tree *, tree, tree, int, bool *);
>  static tree handle_externally_visible_attribute (tree *, tree, tree, int,
>  						 bool *);
> +static tree handle_no_reorder_attribute (tree *, tree, tree, int,
> +						 bool *);
>  static tree handle_const_attribute (tree *, tree, tree, int, bool *);
>  static tree handle_transparent_union_attribute (tree *, tree, tree,
>  						int, bool *);
> @@ -652,6 +654,8 @@ const struct attribute_spec c_common_attribute_table[] =
>  			      handle_unused_attribute, false },
>    { "externally_visible",     0, 0, true,  false, false,
>  			      handle_externally_visible_attribute, false },
> +  { "no_reorder",	      0, 0, true, false, false,
> +                              handle_no_reorder_attribute, false },
>    /* The same comments as for noreturn attributes apply to const ones.  */
>    { "const",                  0, 0, true,  false, false,
>  			      handle_const_attribute, false },
> @@ -6953,6 +6957,19 @@ handle_externally_visible_attribute (tree *pnode, tree name,
>    return NULL_TREE;
>  }
>  
> +/* Handle the "no_reorder" attribute. Arguments as in
> +   struct attribute_spec.handler. */
> +
> +static tree
> +handle_no_reorder_attribute (tree *,
> +			     tree,
> +			     tree,
> +			     int,
> +			     bool *)
> +{
> +  return NULL_TREE;
> +}

I think you want to check here that it is not automatic variable.
I.e. TREE_STATIC (perhaps || DECL_EXTERNAL) and output error otherwise.

I wonder if we want to support this on DECL_ONE_ONLY.

>    /* Set when function is visible by other units.  */
>    unsigned externally_visible : 1;
> +  /* Don't reorder to other symbols having this set.  */
> +  unsigned no_reorder : 1;

Is it necessary to introduce extra bit for this? It is quite rarely chcecked
property, what it buys over lookup_attribute calls everywhere?

>    /* The symbol will be assumed to be used in an invisible way (like
>       by an toplevel asm statement).  */
>    unsigned force_output : 1;
> diff --git a/gcc/cgraphclones.c b/gcc/cgraphclones.c
> index 224bb55..38d92f5 100644
> --- a/gcc/cgraphclones.c
> +++ b/gcc/cgraphclones.c
> @@ -437,6 +437,7 @@ cgraph_node::create_clone (tree decl, gcov_type gcov_count, int freq,
>    new_node->definition = definition;
>    new_node->local = local;
>    new_node->externally_visible = false;
> +  new_node->no_reorder = no_reorder;

In kernel case I suppose you don't need to stick no-reorder flag on clones?
If we produce static clone, in what case keeping noreorderness help?
> -      || (!flag_toplevel_reorder && !DECL_COMDAT (node->decl)
> -	  && !DECL_ARTIFICIAL (node->decl)))
> +      || node->no_reorder
> +      || ((!flag_toplevel_reorder
> +          && !DECL_COMDAT (node->decl)
> +	   && !DECL_ARTIFICIAL (node->decl))))

This makes no-reorder variables unremovable by unreachable function/variable removal.
This is not documented so in extend.texi and moreover perhaps we do not want this,
as user can use no_reorder && used attribute in that case?
> @@ -2203,13 +2212,14 @@ symbol_table::compile (void)
>    state = EXPANSION;
>  
>    if (!flag_toplevel_reorder)
> -    output_in_order ();
> +    output_in_order (false);
>    else
>      {
>        output_asm_statements ();
>  
>        expand_all_functions ();
>        output_variables ();
> +      output_in_order (true);

I would expect output_in_order to come first and perhaps replace output_asm_statements
(so relative order of asm statements and output in order stuff is preserved)
> @@ -492,6 +492,9 @@ function_and_variable_visibility (bool whole_program)
>  	  node->externally_visible = false;
>  	  node->forced_by_abi = false;
>  	}
> +      if (lookup_attribute ("no_reorder",
> +			    DECL_ATTRIBUTES (node->decl)))
> +	node->no_reorder = 1;
>        if (!node->externally_visible
>  	  && node->definition && !node->weakref
>  	  && !DECL_EXTERNAL (node->decl))
> @@ -633,6 +636,9 @@ function_and_variable_visibility (bool whole_program)
>            vnode->externally_visible = false;
>  	  vnode->forced_by_abi = false;
>  	}
> +      if (lookup_attribute ("no_reorder",
> +			    DECL_ATTRIBUTES (vnode->decl)))
> +	vnode->no_reorder = 1;

If we end up with flag - I think I would preffer just explicit lookups to avoid
need to stream and maintain the duplicated info, I think this should go into
process_function_and_variable_attributes in cgraphunit.c

We could hide implementation detail into no_reorder () method and setter.

> diff --git a/gcc/lto/lto-partition.c b/gcc/lto/lto-partition.c
> index 5bd089b..0451a66 100644
> --- a/gcc/lto/lto-partition.c
> +++ b/gcc/lto/lto-partition.c
> @@ -333,7 +333,8 @@ lto_max_map (void)
>      new_partition ("empty");
>  }
>  
> -/* Helper function for qsort; sort nodes by order.  */
> +/* Helper function for qsort; sort nodes by order. noreorder functions must have
> +   been removed earlier.  */
>  static int
>  node_cmp (const void *pa, const void *pb)
>  {
> @@ -365,11 +366,26 @@ node_cmp (const void *pa, const void *pb)
>  static int
>  varpool_node_cmp (const void *pa, const void *pb)
>  {
> -  const varpool_node *a = *(const varpool_node * const *) pa;
> -  const varpool_node *b = *(const varpool_node * const *) pb;
> +  const symtab_node *a = *static_cast<const symtab_node * const *> (pa);
> +  const symtab_node *b = *static_cast<const symtab_node * const *> (pb);
>    return b->order - a->order;
>  }
>  
> +/* Add all symtab nodes from NEXT_NODE to PARTITION in order.  */
> +
> +static void
> +add_sorted_nodes (vec<symtab_node *> &next_nodes, ltrans_partition partition)
> +{
> +  unsigned i;
> +  symtab_node *node;
> +
> +  next_nodes.qsort (varpool_node_cmp);
> +  FOR_EACH_VEC_ELT (next_nodes, i, node)
> +    if (!symbol_partitioned_p (node))
> +      add_symbol_to_partition (partition, node);
> +}
> +
> +
>  /* Group cgraph nodes into equally-sized partitions.
>  
>     The partitioning algorithm is simple: nodes are taken in predefined order.
> @@ -414,7 +430,8 @@ lto_balanced_map (int n_lto_partitions)
>    int n_nodes = 0;
>    int n_varpool_nodes = 0, varpool_pos = 0, best_varpool_pos = 0;
>    struct cgraph_node **order = XNEWVEC (cgraph_node *, symtab->cgraph_max_uid);
> -  varpool_node **varpool_order = NULL;
> +  auto_vec<cgraph_node *> noreorder;
> +  auto_vec<varpool_node *> varpool_order;
>    int i;
>    struct cgraph_node *node;
>    int total_size = 0, best_total_size = 0;
> @@ -427,6 +444,7 @@ lto_balanced_map (int n_lto_partitions)
>      INT_MAX, best_internal = 0;
>    int npartitions;
>    int current_order = -1;
> +  int noreorder_pos = 0;
>  
>    FOR_EACH_VARIABLE (vnode)
>      gcc_assert (!vnode->aux);

Hmm, why this is not a simple pass over the nodes array that goes first and inserts all noreorder
symbols into the first partition before the actual balancing starts?

The rest of patch seems sane, lets clean up these few details. Thanks!

Honza

> @@ -434,7 +452,10 @@ lto_balanced_map (int n_lto_partitions)
>    FOR_EACH_DEFINED_FUNCTION (node)
>      if (node->get_partitioning_class () == SYMBOL_PARTITION)
>        {
> -	order[n_nodes++] = node;
> +	if (node->no_reorder)
> +	  noreorder.safe_push (node);
> +	else
> +	  order[n_nodes++] = node;
>  	if (!node->alias)
>  	  total_size += inline_summary (node)->size;
>        }
> @@ -445,27 +466,26 @@ lto_balanced_map (int n_lto_partitions)
>       get better about minimizing the function bounday, but until that
>       things works smoother if we order in source order.  */
>    qsort (order, n_nodes, sizeof (struct cgraph_node *), node_cmp);
> +  noreorder.qsort (node_cmp);
>  
>    if (symtab->dump_file)
> -    for(i = 0; i < n_nodes; i++)
> -      fprintf (symtab->dump_file, "Balanced map symbol order:%s:%u\n",
> -	       order[i]->name (), order[i]->tp_first_run);
> -
> -  if (!flag_toplevel_reorder)
>      {
> -      FOR_EACH_VARIABLE (vnode)
> -	if (vnode->get_partitioning_class () == SYMBOL_PARTITION)
> -	  n_varpool_nodes++;
> -      varpool_order = XNEWVEC (varpool_node *, n_varpool_nodes);
> -
> -      n_varpool_nodes = 0;
> -      FOR_EACH_VARIABLE (vnode)
> -	if (vnode->get_partitioning_class () == SYMBOL_PARTITION)
> -	  varpool_order[n_varpool_nodes++] = vnode;
> -      qsort (varpool_order, n_varpool_nodes, sizeof (varpool_node *),
> -	     varpool_node_cmp);
> +      for(i = 0; i < n_nodes; i++)
> +	fprintf (symtab->dump_file, "Balanced map symbol order:%s:%u\n",
> +		 order[i]->name (), order[i]->tp_first_run);
> +      for(i = 0; i < (int)noreorder.length(); i++)
> +	fprintf (symtab->dump_file, "Balanced map symbol no_reorder:%s:%u\n",
> +		 noreorder[i]->name (), noreorder[i]->tp_first_run);
>      }
>  
> +  /* Collect all variables that should not be reordered.  */
> +  FOR_EACH_VARIABLE (vnode)
> +    if (vnode->get_partitioning_class () == SYMBOL_PARTITION
> +	&& (!flag_toplevel_reorder || vnode->no_reorder))
> +      varpool_order.safe_push (vnode);
> +  n_varpool_nodes = varpool_order.length ();
> +  varpool_order.qsort (varpool_node_cmp);
> +
>    /* Compute partition size and create the first partition.  */
>    partition_size = total_size / n_lto_partitions;
>    if (partition_size < PARAM_VALUE (MIN_PARTITION_SIZE))
> @@ -476,6 +496,8 @@ lto_balanced_map (int n_lto_partitions)
>      fprintf (symtab->dump_file, "Total unit size: %i, partition size: %i\n",
>  	     total_size, partition_size);
>  
> +  auto_vec<symtab_node *> next_nodes;
> +
>    for (i = 0; i < n_nodes; i++)
>      {
>        if (symbol_partitioned_p (order[i]))
> @@ -483,14 +505,19 @@ lto_balanced_map (int n_lto_partitions)
>  
>        current_order = order[i]->order;
>  
> -      if (!flag_toplevel_reorder)
> -	while (varpool_pos < n_varpool_nodes
> -	       && varpool_order[varpool_pos]->order < current_order)
> -	  {
> -	    if (!symbol_partitioned_p (varpool_order[varpool_pos]))
> -	      add_symbol_to_partition (partition, varpool_order[varpool_pos]);
> -	    varpool_pos++;
> -	  }
> +      /* Output noreorder and varpool in program order first.  */
> +      next_nodes.truncate (0);
> +      while (varpool_pos < n_varpool_nodes
> +	     && varpool_order[varpool_pos]->order < current_order)
> +	next_nodes.safe_push (varpool_order[varpool_pos++]);
> +      while (noreorder_pos < (int)noreorder.length ()
> +	     && noreorder[noreorder_pos]->order < current_order)
> +	{
> +	  if (!noreorder[noreorder_pos]->alias)
> +	    total_size -= inline_summary (noreorder[noreorder_pos])->size;
> +	  next_nodes.safe_push (noreorder[noreorder_pos++]);
> +	}
> +      add_sorted_nodes (next_nodes, partition);
>  
>        add_symbol_to_partition (partition, order[i]);
>        if (!order[i]->alias)
> @@ -580,6 +607,7 @@ lto_balanced_map (int n_lto_partitions)
>  		if (!vnode->definition)
>  		  continue;
>  		if (!symbol_partitioned_p (vnode) && flag_toplevel_reorder
> +		    && !vnode->no_reorder
>  		    && vnode->get_partitioning_class () == SYMBOL_PARTITION)
>  		  add_symbol_to_partition (partition, vnode);
>  		index = lto_symtab_encoder_lookup (partition->encoder,
> @@ -616,6 +644,7 @@ lto_balanced_map (int n_lto_partitions)
>  		   to be removed.  Coupling with objects they refer to only helps to reduce
>  		   number of symbols promoted to hidden.  */
>  		if (!symbol_partitioned_p (vnode) && flag_toplevel_reorder
> +		    && !vnode->no_reorder
>  		    && !vnode->can_remove_if_no_refs_p ()
>  		    && vnode->get_partitioning_class () == SYMBOL_PARTITION)
>  		  add_symbol_to_partition (partition, vnode);
> @@ -705,24 +734,25 @@ lto_balanced_map (int n_lto_partitions)
>  	}
>      }
>  
> +  next_nodes.truncate (0);
> +
>    /* Varables that are not reachable from the code go into last partition.  */
>    if (flag_toplevel_reorder)
>      {
>        FOR_EACH_VARIABLE (vnode)
>  	if (vnode->get_partitioning_class () == SYMBOL_PARTITION
> -	    && !symbol_partitioned_p (vnode))
> -	  add_symbol_to_partition (partition, vnode);
> -    }
> -  else
> -    {
> -      while (varpool_pos < n_varpool_nodes)
> -	{
> -	  if (!symbol_partitioned_p (varpool_order[varpool_pos]))
> -	    add_symbol_to_partition (partition, varpool_order[varpool_pos]);
> -	  varpool_pos++;
> -	}
> -      free (varpool_order);
> +	    && !symbol_partitioned_p (vnode)
> +	    && !vnode->no_reorder)
> +	  next_nodes.safe_push (vnode);
>      }
> +
> +  /* Output remaining ordered symbols.  */
> +  while (varpool_pos < n_varpool_nodes)
> +    next_nodes.safe_push (varpool_order[varpool_pos++]);
> +  while (noreorder_pos < (int)noreorder.length ())
> +    next_nodes.safe_push (noreorder[noreorder_pos++]);
> +  add_sorted_nodes (next_nodes, partition);
> +
>    free (order);
>  }
>  
> diff --git a/gcc/symtab.c b/gcc/symtab.c
> index 792b3b5..ff1bdf7 100644
> --- a/gcc/symtab.c
> +++ b/gcc/symtab.c
> @@ -831,6 +831,8 @@ symtab_node::dump_base (FILE *f)
>      fprintf (f, " forced_by_abi");
>    if (externally_visible)
>      fprintf (f, " externally_visible");
> +  if (no_reorder)
> +    fprintf (f, " no_reorder");
>    if (resolution != LDPR_UNKNOWN)
>      fprintf (f, " %s",
>   	     ld_plugin_symbol_resolution_names[(int)resolution]);
> diff --git a/gcc/trans-mem.c b/gcc/trans-mem.c
> index 8285796..94d896d 100644
> --- a/gcc/trans-mem.c
> +++ b/gcc/trans-mem.c
> @@ -4849,6 +4849,7 @@ ipa_tm_create_version_alias (struct cgraph_node *node, void *data)
>    new_node = cgraph_node::create_same_body_alias (new_decl, info->new_decl);
>    new_node->tm_clone = true;
>    new_node->externally_visible = info->old_node->externally_visible;
> +  new_node->no_reorder = info->old_node->no_reorder;
>    /* ?? Do not traverse aliases here.  */
>    get_cg_data (&node, false)->clone = new_node;
>  
> diff --git a/gcc/varpool.c b/gcc/varpool.c
> index 14ef089..8001c93 100644
> --- a/gcc/varpool.c
> +++ b/gcc/varpool.c
> @@ -449,6 +449,8 @@ varpool_node::add (tree decl)
>    symtab->call_varpool_insertion_hooks (node);
>    if (node->externally_visible_p ())
>      node->externally_visible = true;
> +  if (lookup_attribute ("no_reorder", decl))
> +    node->no_reorder = 1;
>  }
>  
>  /* Return variable availability.  See cgraph.h for description of individual
> @@ -640,7 +642,7 @@ symbol_table::remove_unreferenced_decls (void)
>    for (node = first_defined_variable (); node; node = next)
>      {
>        next = next_defined_variable (node);
> -      if (!node->aux)
> +      if (!node->aux && !node->no_reorder)
>  	{
>  	  if (dump_file)
>  	    fprintf (dump_file, " %s", node->asm_name ());
> @@ -687,11 +689,22 @@ symbol_table::output_variables (void)
>    timevar_push (TV_VAROUT);
>  
>    FOR_EACH_DEFINED_VARIABLE (node)
> -    node->finalize_named_section_flags ();
> +    {
> +      /* Handled in output_in_order.  */
> +      if (node->no_reorder)
> +	continue;
> +
> +      node->finalize_named_section_flags ();
> +    }
>  
>    FOR_EACH_DEFINED_VARIABLE (node)
> -    if (node->assemble_decl ())
> -      changed = true;
> +    {
> +      /* Handled in output_in_order.  */
> +      if (node->no_reorder)
> +	continue;
> +      if (node->assemble_decl ())
> +        changed = true;
> +    }
>    timevar_pop (TV_VAROUT);
>    return changed;
>  }
> -- 
> 2.1.0



More information about the Gcc-patches mailing list