This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.
Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]
Other format:	[Raw text]
Re: [PATCH] New IPA-CP with real function cloning

From: Jan Hubicka <hubicka at ucw dot cz>
To: GCC Patches <gcc-patches at gcc dot gnu dot org>, Jan Hubicka <hubicka at ucw dot cz>, mjambor at suse dot cz
Date: Thu, 7 Jul 2011 18:03:07 +0200
Subject: Re: [PATCH] New IPA-CP with real function cloning
References: <20110623192404.GC2736@virgil.arch.suse.de>
Hi,
patch is long, so let me review it in more passes.
> 
> 
> 2011-06-22  Martin Jambor  <mjambor@suse.cz>
> 
> 	* ipa-prop.h: Include alloc-pool.h.
> 	(ipa_lattice_type): Removed.
> 	(ipcp_value_source): New type.
> 	(ipcp_value): Likewise.
> 	(ipcp_values_pool): Declare.
> 	(ipcp_sources_pool): Likewise.
> 	(ipa_param_descriptor): Removed.
> 	(ipcp_lattice): Removed fileds type and constant. Added fields decl,
> 	values, values_count, contains_variable, bottom, used and virt_call.
> 	(ipa_node_params): New fields lattices, known_vals,
> 	clone_for_all_contexts and noe dead, removed fields params and
> 	count_scale.
> 	(ipa_get_param): Updated.
> 	(ipa_param_cannot_devirtualize_p): Removed.
> 	(ipa_param_types_vec_empty): Likewise.
> 	(ipa_edge_args): New field next_edge_clone.
> 	(ipa_func_list): Removed.
> 	(ipa_init_func_list): Removed declaration.
> 	(ipa_push_func_to_list_1): Likewise.
> 	(ipa_pop_func_from_list): Likewise.
> 	(ipa_push_func_to_list): Removed.
> 	(ipa_lattice_from_jfunc): Remove declaration.
> 	(ipa_get_jf_pass_through_result): Declare.
> 	(ipa_get_jf_ancestor_result): Likewise.
> 	(ipa_value_from_jfunc): Likewise.
> 	(ipa_get_lattice): Update.
> 	(ipa_lat_is_single_const): New function.
> 	* ipa-prop.c (ipa_push_func_to_list_1): Removed.
> 	(ipa_init_func_list): Likewise.
> 	(ipa_pop_func_from_list): Likewise.
> 	(ipa_get_param_decl_index): Fix coding style.
> 	(ipa_populate_param_decls): Update to use new lattices.
> 	(ipa_initialize_node_params): Likewise.
> 	(visit_ref_for_mod_analysis): Likewise.
> 	(ipa_analyze_params_uses): Likewise.
> 	(ipa_free_node_params_substructures): Likewise.
> 	(ipa_edge_duplication_hook): Add the new edge to the list of edge
> 	clones.
> 	(ipa_node_duplication_hook): Update to use new lattices.
> 	(ipa_free_all_structures_after_ipa_cp): Free alloc pools.
> 	(ipa_free_all_structures_after_iinln): Likewise.
> 	(ipa_write_node_info): Update to use new lattices.
> 	(ipa_read_node_info): Likewise.
> 	(ipa_get_jf_pass_through_result): New function.
> 	(ipa_get_jf_ancestor_result): Likewise.
> 	(ipa_value_from_jfunc): Likewise.
> 	(ipa_cst_from_jfunc): Reimplemented using ipa_value_from_jfunc.
> 	* ipa-cp.c: Reimplemented.
> 	* params.def (PARAM_DEVIRT_TYPE_LIST_SIZE): Removed.
> 	(PARAM_IPA_CP_VALUE_LIST_SIZE): New parameter.
> 	* Makefile.in (IPA_PROP_H): Added alloc-pool.h to dependencies.
> 
> 	* doc/invoke.texi (devirt-type-list-size): Removed description.
> 	(ipa-cp-value-list-size): Added description.
> 
> 	* testsuite/gcc.dg/ipa/ipa-1.c: Updated testcase dump scan.
> 	* testsuite/gcc.dg/ipa/ipa-2.c: Likewise.
> 	* testsuite/gcc.dg/ipa/ipa-3.c: Likewise and made functions static.
> 	* testsuite/gcc.dg/ipa/ipa-4.c: Updated testcase dump scan.
> 	* testsuite/gcc.dg/ipa/ipa-5.c: Likewise.
> 	* testsuite/gcc.dg/ipa/ipa-7.c: Xfail test.
> 	* testsuite/gcc.dg/ipa/ipa-8.c: Updated testcase dump scan.
> 	* testsuite/gcc.dg/ipa/ipacost-1.c: Likewise.
> 	* testsuite/gcc.dg/ipa/ipacost-2.c: Likewise.
> 	* testsuite/gcc.dg/ipa/ipcp-1.c: New test.
> 	* testsuite/gcc.dg/ipa/ipcp-2.c: Likewise.
> 	* testsuite/gcc.dg/tree-ssa/ipa-cp-1.c: Updated testcase.

> /* Interprocedural analyses.
>    Copyright (C) 2005, 2007, 2008, 2009, 2010
2011
>    Free Software Foundation, Inc.
> 
> 
> /* The following definitions and interfaces are used by
>    interprocedural analyses or parameters.  */
> 
> /* ipa-prop.c stuff (ipa-cp, indirect inlining):  */

I was bit thinking about it and probably we could make ipa-prop
and ipa-inline-analysis to be stand alone analysis passes, instead of
something called either from inliner or ipa-cp analysis stage. But
that could be done incrementally.

> 
> /* A jump function for a callsite represents the values passed as actual
>    arguments of the callsite. There are three main types of values :
> 
>    Pass-through - the caller's formal parameter is passed as an actual
>                   argument, possibly one simple operation performed on it.
>    Constant     - a constant (is_gimple_ip_invariant)is passed as an actual
>                   argument.
>    Unknown      - neither of the above.
> 
>    IPA_JF_CONST_MEMBER_PTR stands for C++ member pointers, it is a special
>    constant in this regard.  Other constants are represented with IPA_JF_CONST.

While we are at docs, I would bit expand. It seems to me that for someone not familiar
with the concept is not clear at all why member pointers are special.
(i.e. explain that they are non-gimple-regs etc.)

> 
>    IPA_JF_ANCESTOR is a special pass-through jump function, which means that
>    the result is an address of a part of the object pointed to by the formal
>    parameter to which the function refers.  It is mainly intended to represent
>    getting addresses of of ancestor fields in C++
>    (e.g. &this_1(D)->D.1766.D.1756).  Note that if the original pointer is
>    NULL, ancestor jump function must behave like a simple pass-through.
> 
>    Other pass-through functions can either simply pass on an unchanged formal
>    parameter or can apply one simple binary operation to it (such jump
>    functions are called polynomial).
> 
>    IPA_JF_KNOWN_TYPE is a special type of an "unknown" function that applies
>    only to pointer parameters.  It means that even though we cannot prove that
>    the passed value is an interprocedural constant, we still know the exact
>    type of the containing object which may be valuable for devirtualization.
> 
>    Jump functions are computed in ipa-prop.c by function
>    update_call_notes_after_inlining.  Some information can be lost and jump
>    functions degraded accordingly when inlining, see
>    update_call_notes_after_inlining in the same file.  */

I would add pointer to the original Callahan at al ipa-cp paper.
I think it is cited from ipa-cp, so just copy it from there.

> 
> /* Structure holding data required to describe a pass-through jump function.  */
> 
> struct GTY(()) ipa_pass_through_data
> {
>   /* If an operation is to be performed on the original parameter, this is the
>      second (constant) operand.  */
>   tree operand;
>   /* Number of the caller's formal parameter being passed.  */
>   int formal_id;

I probably should use this in ipa-inline-analsysi where I call it for some reason operand_num :)

> 
> struct ipcp_value;

I wonder if the jump functions used by several passes and ipcp internal types both has o go into
the same header?

>   /* Time benefit and size cost that specializing the function for this value
>      can bring about in it's callees (transitively).  */
>   int prop_time_benefit, prop_size_cost;
>   /* True if this valye is currently on the topo-sort stack.  */
value ;)
> /* Lattice describing potential values of a formal parameter of a function and
>    some of their other properties.  */
> 
> struct ipcp_lattice
> {
>   /* PARAM_DECL of this parameter.  */
>   tree decl;
Why we need param de cl here, when almost everything seems to be using those formal_ids?

Please add comment how the latice looks like, it is not quite ovious i.e. what is top etc.
> 
>   /* The list of known values and types in this lattice.  */
>   struct ipcp_value *values;
>   /* Number of known values and types in this lattice.  */
>   int values_count;
>   /* The lattice contains a variable component  (in addition to values).  */
>   bool contains_variable;
>   /* The value of the lattice is bottom (i.e. variable and unusable for any
>      propagation).  */
>   bool bottom;
>   /* The parameter is used.  */
>   bool used;
>   /* There is a virtual call based on this parameter.  */
>   bool virt_call;
> };
> 
> /* ipa_node_params stores information related to formal parameters of functions
>    and some other information for interprocedural passes that operate on
>    parameters (such as ipa-cp).  */
> struct ipa_node_params

Hmm, I see, this is why we need to mix the ipacp and ipa-prop together.  Probably it would be
easier if we had ipa_prop_node_params and ipacp_node-params
(and I call it summary in ipa-inline that I think is sor of common way to call this)
> {
>   /* Pointer to an array of structures describing individual formal
>      parameters.  */
>   struct ipcp_lattice *lattices;

Hmm, how we get here around the need to mark this GTY(). I.e are we sure that all the known_vals
must be referneced from elsewhere at ggc time?
I would also slowly switch those things to VECtors..

>   /* Only for versioned nodes this field would not be NULL,
>      it points to the node that IPA cp cloned from.  */
>   struct cgraph_node *ipcp_orig_node;
Why not use node->clone_of here?
> 
> /* ipa_node_params access functions.  Please use these to access fields that
>    are or will be shared among various passes.  */
> 
> /* Set the number of formal parameters. */
> 
> static inline void
> ipa_set_param_count (struct ipa_node_params *info, int count)
> {
>   info->param_count = count;
> }
> 
> /* Return the number of formal parameters. */
> 
> static inline int
> ipa_get_param_count (struct ipa_node_params *info)
> {
>   return info->param_count;
> }
I see this is the need for DECL pointer in lattice.  I think the ipcp latice is separate thing
from the parameter summary (that is also used by ipa-inline-analysis and friends) so I would
make this separate VECtor with ipa_set_param_count growing the vector to proper size
and get_param_count using VEC_length
> 
> /* Return the declaration of Ith formal parameter of the function corresponding
>    to INFO.  Note there is no setter function as this array is built just once
>    using ipa_initialize_node_params. */
> 
> static inline tree
> ipa_get_param (struct ipa_node_params *info, int i)
> {
>   gcc_assert (i >= 0 && i <= info->param_count);
>   return info->lattices[i].decl;
> }
> 
> /* Return the used flag corresponding to the Ith formal parameter of
>    the function associated with INFO.  */
> 
> static inline bool
> ipa_is_param_used (struct ipa_node_params *info, int i)
> {
>   gcc_assert (i >= 0 && i <= info->param_count);
>   return info->lattices[i].used;
> }
Similarly for the USED flag, should probably go into VECtor along with the DECL pointer.
> 
> /* ipa_edge_args stores information related to a callsite and particularly its
>    arguments.  It can be accessed by the IPA_EDGE_REF macro.  */
> typedef struct GTY(()) ipa_edge_args

probably edge_summary would be my preferred name.

> {
>   /* Next pointer in a linked list of clones of the same function.  */
>   struct cgraph_edge *next_edge_clone;

What this is needed for?
> 
>   /* Number of actual arguments in this callsite.  When set to 0,
>      this callsite's parameters would not be analyzed by the different
>      stages of IPA CP.  */
>   int argument_count;
>   /* Array of the callsite's jump function of each parameter.  */
>   struct ipa_jump_func GTY ((length ("%h.argument_count"))) *jump_functions;

Would be prettier as VECtor.

> } ipa_edge_args_t;
> 
> /* This function ensures the array of node param infos is big enough to
>    accommodate a structure for all nodes and reallocates it if not.  */
> 
> static inline void
> ipa_check_create_node_params (void)

this name never realy worked for me (i.e. what is it checking?).
What about ipa_alloc_node_params or ipa_alloc_resize_node_params?


I realize that most of changes I sugested above are to the existing ipa-prop
headers.  I would like to see them done, it depends what seems least painful
for you - i.e. do them as part of your patch, do them beforehand or
incrementally after the new ipa-cp is comitted to mainline.

In particular I would like to see the ipa-cp data (used at propagation stage)
separated from function summaries (i.e. jump functions, param counts and decl/used
info).

Things will be more moudlar that way and we save some memory, too.

> /* Interprocedural constant propagation
>    Copyright (C) 2005, 2006, 2007, 2008, 2009, 2010, 2011
>    Free Software Foundation, Inc.
>    Contributed by Razya Ladelsky <RAZYA@il.ibm.com>
Since it is complette rewrite, I would add you there ;)
> /* Interprocedural constant propagation (IPA-CP).
> 
>    The goal of this transformation is to
> 
>    1) discover functions which are always invoked with some arguments with the
>       same known constant values and modify the functions so that the
>       subsequent optimizations can take advantage of the knowledge, and
> 
>    2) create specialized versions of functions transformed in this way if
>       some parameters are known constants only in certain contexts but the
>       estimated tradeoff between speedup and cost size is deemed good.
> 
>    The algorithm also propagates types and attempts to perform type based
>    devirtualization.  Types are propagated much like constants.

I would add that the algorithm is not doing only the traditional ipa-cp propagation,
but also handles partial specialization and propagate known types for devirtualization.
>    First stage - intraprocedural analysis
>    =======================================
> 
>    This phase computes jump_function and modification flags.
> 
>    A jump function for a call-site represents the values passed as an actual
>    arguments of a given call-site. In principle, there are three types of
>    values:
> 
>    Pass through - the caller's formal parameter is passed as an actual
>                   argument, plus an operation on it can be performed.
>    Constant - a constant is passed as an actual argument.
>    Unknown - neither of the above.

Propbably should document the new fancy types of jump functions.
Also mention that return functions are not handled, yet.
> 
>    Second stage - interprocedural analysis
>    ========================================
> 
>    This stage is itself divided into two phases.  In the first, we propagate
>    known values over the call graph, in the second, we make cloning decisions.

I would also explicitely mention that this stage is done differently from the
classical ipa-cp paper.
> /* Return true iff the CS is an edge within a strongly connected component as
>    computed by ipa_reduced_postorder.  */
> 
> static inline bool
> edge_within_scc (struct cgraph_edge *cs)
> {
>   struct ipa_dfs_info *caller_dfs = (struct ipa_dfs_info *) cs->caller->aux;
>   struct ipa_dfs_info *callee_dfs;
>   struct cgraph_node *callee = cgraph_function_node (cs->callee, NULL);

since thunks are not transparent for you (and eventually we will want to have
jump functions for those; passthrough in all but first argument and jump
function representing the thunk adjustment), you want cgraph_function_or_thunk
node.
> 
> /* Print V which is extracted from a value in a lattice to F.  */
> 
> static void
> print_ipcp_constant_value (FILE * f, tree v)
> {
>   if (TREE_CODE (v) == TREE_BINFO)
>     {
>       fprintf (f, "BINFO ");
>       print_generic_expr (f, BINFO_TYPE (v), 0);

I wish we had some resonable way to visualise the BINFOs...
It was always completely unclear what really is in them. But this is independent problem.
> 
> /* Determine whether it is at all technically possible to create clones of NODE
>    and store this information in the ipa_node_params structure associated
>    with NODE.  */
> 
> static void
> determine_versionability (struct cgraph_node *node)
> {
>   struct cgraph_edge *edge;
>   bool res = true;
> 
>   /* There are a number of generic reasons functions cannot be versioned.  We
>      also cannot remove parameters if there are type attributes such as fnspec
>      present.  */
>   if (node->alias || node->thunk.thunk_p

We can version alias if its destination is versionable (i.e. get function_or_thunk node)
also thunks should be versionable but they are not at the moment, inline summary will
however have versionable flag cleared.

>       || !inline_summary (node)->versionable
>       || TYPE_ATTRIBUTES (TREE_TYPE (node->decl))

This is getting an issue for Fortran that attach the function arguments quite commonly,
perhaps we should start moving ahead on this and ruling out only the arguments that
gets can't be preserved.
Also you need to test attributes of DECL itself.

I think this all should be part of can_change_signature_p code, since we can version
with all attributes I can thunk of when we don't change signature.

>       || cgraph_function_body_availability (node) <= AVAIL_OVERWRITABLE)
>     res =  false;
>   else
>     /* Removing arguments doesn't work if the function takes varargs
>        or use __builtin_apply_args.
>        FIXME: handle this together with can_change_signature flag.  */
>     for (edge = node->callees; edge; edge = edge->next_callee)
>       {
> 	tree t = edge->callee->decl;
> 	if (DECL_BUILT_IN_CLASS (t) == BUILT_IN_NORMAL
> 	    && (DECL_FUNCTION_CODE (t) == BUILT_IN_APPLY_ARGS
> 		|| DECL_FUNCTION_CODE (t) == BUILT_IN_VA_START))

can_change_sigunature will also handle apply_args.
Add VA_START code there, too.  For the other use of this flag (in i386) VA_START
rules out the change, too, but so far we explicitely tested that in the backend.

Iguess with these changes, inline_summary (node)->versionable should coincide
with IPA_NODE_REF (node)->node_versionable making this whole code unnecesary
(it was supposed to work this way, I am not sure why we now have to versionable
flags).
> 
> /* Worker callback of cgraph_for_node_and_aliases accumulating statistics of
>    non-thunk incoming edges to NODE.  */
> 
> static bool
> gather_caller_stats (struct cgraph_node *node, void *data)
> {
>   struct caller_statistics *stats = (struct caller_statistics *) data;
>   struct cgraph_edge *cs;
> 
>   for (cs = node->callers; cs; cs = cs->next_caller)
>     if (cs->caller->thunk.thunk_p)
>       gather_caller_stats (cs->caller, stats);
>     else
>       {
> 	stats->count_sum += cs->count;
> 	stats->freq_sum += cs->frequency;
> 	stats->n_calls++;
> 	if (cgraph_maybe_hot_edge_p (cs))
> 	  stats->n_hot_calls ++;
>       }
>   return false;
Why you compute count_sum instead of using node->count?  Is it because you are
interested in local calls only?
>   if (!optimize_function_for_speed_p (DECL_STRUCT_FUNCTION (node->decl)))

When optimizing for size, we could still consider clonning the function if amount
of code saved at call sites is bigger than the function body..
Not really important however.
>     {
>       if (dump_file)
>         fprintf (dump_file, "Not considering %s for cloning; "
> 		 "optimizing it for size.\n",
>  	         cgraph_node_name (node));
>       return false;
>     }
> 
>   init_caller_stats (&stats);
>   cgraph_for_node_and_aliases (node, gather_caller_stats, &stats, false);
If you use for_node_thunk_and_aliases you can remove the recursion you do above
and it will work for aliases of thunk correctly, too ;)
> /* Allocate the arrays in TOPO and topologically sort the nodes into order.  */
> 
> static void
> build_topo_info (struct topo_info *topo)
toporder is probably more clear about what it means ;)
> /* Initialize ipcp_lattices.  */
> 
> static void
> initialize_node_lattices (struct cgraph_node *node)
> {
>   struct ipa_node_params *info = IPA_NODE_REF (node);
>   struct cgraph_edge *ie;
>   bool disable = false, variable = false;
>   int i;
> 
>   /* FIXME: Can we clobber only the first argument of thunks?  */
>   if (node->alias || node->thunk.thunk_p

I think sane thing to do here is to not have latices for aliases, since you skip
them during propagatoin, but have latices for thinks, since they should have their
own jump functions.

You can probably add sanity check into lattice accesor function that is is never
given an alias and skip the code disabling ipa-cp bellow leaving the data
uninitialized.

>       || ipa_is_called_with_var_arguments (info))

I always wondered if we need this, given that we check for use of stdarg.  Sure
writing vararg function without actually using the arguments is sort of stupid,
but it seems that the code should just work.

(at least with one bounds check in propagation for case function is called with
too few argumetns)

Also naturally propagating into stdarg functions is possible.

OK, rest later.

Honza
Follow-Ups:
- Re: [PATCH] New IPA-CP with real function cloning
  - From: Martin Jambor
Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]