This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
Re: [PATCH 3/5] IPA ICF pass
- From: Jeff Law <law at redhat dot com>
- To: mliska <mliska at suse dot cz>, gcc-patches at gcc dot gnu dot org
- Cc: hubicka at ucw dot cz
- Date: Tue, 24 Jun 2014 14:31:29 -0600
- Subject: Re: [PATCH 3/5] IPA ICF pass
- Authentication-results: sourceware.org; auth=none
- References: <c5c2463c07186b4ba35b10f3063ecdd8f8d46d63 dot 1402913001 dot git dot mliska at suse dot cz> <ac1da49f0ee78643bc4521580862fa92e1051764 dot 1402913001 dot git dot mliska at suse dot cz>
On 06/13/14 04:44, mliska wrote:
Hello,
this is core of IPA ICF patchset. It adds new pass and registers all needed stuff related to a newly introduced interprocedural optimization.
Algorithm description:
In LGEN, we visit all read-only variables and functions. For each symbol, a hash value based on e.g. number of arguments,
number of BB, GIMPLE CODES is computed (similar hash is computed for read-only variables). This kind of information is streamed
for LTO.
In WPA, we build congruence classes for all symbols having a same hash value. For functions, these classes are subdivided in WPA by argument type comparison. Each reference (a call or a variable reference) to another semantic item candidate is marked and stored for further congruence class reduction (similar algorithm as Value Numbering: www.cs.ucr.edu/~gupta/teaching/553-07/Papers/value.pdf).
For every congruence class of functions with more than one semantic function, we load function body. Having this information, we can
process complete semantic function equality and subdivide such congruence class. Read-only variable class members are also deeply compared.
After that, we process Value numbering algorithm to do a final subdivision. Finally, all items belonging to a congruence class with more than one
item are merged.
Martin
Changelog:
2014-06-13 Martin Liska <mliska@suse.cz>
Jan Hubicka <hubicka@ucw.cz>
* Makefile.in: New pass object file added.
* common.opt: New -fipa-icf flag introduced.
* doc/invoke.texi: Documentation enhanced for the pass.
* lto-section-in.c: New LTO section for a summary created by IPA-ICF.
* lto-streamer.h: New section name introduced.
* opts.c: Optimization is added to -O2.
* passes.def: New pass added.
* timevar.def: New time var for IPA-ICF.
* tree-pass.h: Pass construction function.
* ipa-icf.h: New pass header file added.
* ipa-icf.c: New pass source file added.
You'll note many of my comments are "do you need to ...". You may in
fact be handling that stuff correctly, they're just things I'd like you
to verify are properly handled. If they're properly handled just say so :-)
At a high level, I think this needs to be broken down a bit more. We've
got two high level concepts in ipa-icf. One is all the equivalence
testing the other is using that information for the icf optimization.
Splitting out the equivalence testing seems like a good thing to do as
there's other contexts where it would be useful.
Overall I think you're on the right path and we just need to iterate a
bit on this part of the patchset.
@@ -7862,6 +7863,14 @@ it may significantly increase code size
(see @option{--param ipcp-unit-growth=@var{value}}).
This flag is enabled by default at @option{-O3}.
+@item -fipa-icf
+@opindex fipa-icf
+Perform Identical Code Folding for functions and read-only variables.
+Behavior is similar to Gold Linker ICF optimization. Symbols proved
+as semantically equivalent are redirected to corresponding symbol. The pass
+sensitively decides for usage of alias, thunk or local redirection.
+This flag is enabled by default at @option{-O2}.
So you've added this at -O2, what is the general compile-time impact?
Would it make more sense to instead have it be part of -O3, particularly
since ICF is rarely going to improve performance (sans icache issues).
+
+/* Interprocedural Identical Code Folding for functions and
+ read-only variables.
+
+ The goal of this transformation is to discover functions and read-only
+ variables which do have exactly the same semantics.
+
+ In case of functions,
+ we could either create a virtual clone or do a simple function wrapper
+ that will call equivalent function. If the function is just locally visible,
+ all function calls can be redirected. For read-only variables, we create
+ aliases if possible.
+
+ Optimization pass arranges as follows:
+ 1) All functions and read-only variables are visited and internal
+ data structure, either sem_function or sem_variables is created.
+ 2) For every symbol from the previoues step, VAR_DECL and FUNCTION_DECL are
+ saved and matched to corresponding sem_items.
s/previoues/previous/
+ 3) These declaration are ignored for equality check and are solved
+ by Value Numbering algorithm published by Alpert, Zadeck in 1992.
+ 4) We compute hash value for each symbol.
+ 5) Congruence classes are created based on hash value. If hash value are
+ equal, equals function is called and symbols are deeply compared.
+ We must prove that all SSA names, declarations and other items
+ correspond.
+ 6) Value Numbering is executed for these classes. At the end of the process
+ all symbol members in remaining classes can be mrged.
s/mrged/merged.
+ 7) Merge operation creates alias in case of read-only variables. For
+ callgraph node, we must decide if we can redirect local calls,
+ create an alias or a thunk.
Presumably that's the order in which we try to resolve identical
functions (first by redirection if it's local, then an alias, then a thunk)?
Is the code conditionalized so that it works properly on targets where
we can't create aliases? Do we have any such targets that can be easily
tested these days? Has this been tested on anything other than x86-64
linux? AIX immediately comes to mind as an interesting testing target.
+
+namespace {
+
+func_checker::func_checker (): initialized (false)
+{
+}
+
+/* Itializes internal structures according to given number of
s/Itializes/Initializes/
+ source and target SSA names. The number of source names is SSA_SOURCE,
+ respectively SSA_TARGET. */
+
+void
+func_checker::initialize (unsigned ssa_source, unsigned ssa_target)
+{
+ release ();
+ initialized = true;
+
+ source_ssa_names.create (ssa_source);
+ target_ssa_names.create (ssa_target);
+
+ for (unsigned int i = 0; i < ssa_source; i++)
+ source_ssa_names.safe_push (-1);
+
+ for (unsigned int i = 0; i < ssa_target; i++)
+ target_ssa_names.safe_push (-1);
+
+ edge_map = new pointer_map <edge> ();
+
+ decl_map = new pointer_map <tree> ();
+}
+
+/* Memory release routine. */
+
+void
+func_checker::release (void)
+{
+ if (!initialized)
+ return;
+
+ delete edge_map;
+ delete decl_map;
+ source_ssa_names.release();
+ target_ssa_names.release();
+}
So I'm a big fan of RAII style of writing code. While I primarily like
it for locking, I also find it's a good model for memory management.
Is there any reason these two functions aren't a suitable ctor/dtor?
+
+/* Verifies that trees T1 and T2 do correspond. */
+
+bool
+func_checker::compare_ssa_name (tree t1, tree t2)
+{
+ unsigned i1 = SSA_NAME_VERSION (t1);
+ unsigned i2 = SSA_NAME_VERSION (t2);
+
+ if (source_ssa_names[i1] == -1)
+ source_ssa_names[i1] = i2;
+ else if (source_ssa_names[i1] != (int) i2)
+ return false;
+
+ if(target_ssa_names[i2] == -1)
+ target_ssa_names[i2] = i1;
+ else if (target_ssa_names[i2] != (int) i1)
+ return false;
+
+ return true;
+}
Isn't this really checking for equivalence? "do correspond" seems
awkward here.
+
+/* Verification function for edges E1 and E2. */
+
+bool
+func_checker::compare_edge (edge e1, edge e2)
+{
+ if (e1->flags != e2->flags)
+ return false;
Presumably there's no flags we can safely ignore. So absolute equality
seems reasonable here.
+/* Congruence class constructor for a new class with _ID. */
+
+congruence_class::congruence_class (unsigned int _id): id(_id)
+{
+ members.create (2);
+}
Is there a dtor which releases this object?
+
+void
+sem_item::setup (bitmap_obstack *stack)
+{
+ gcc_checking_assert (node);
+
+ refs.create (0);
+ tree_refs.create (0);
+ usages.create (0);
+ tree_refs_set = pointer_set_create ();
+ usage_index_bitmap = BITMAP_ALLOC (stack);
+}
+
+sem_item::~sem_item ()
+{
+ if (tree_refs_set)
+ pointer_set_destroy (tree_refs_set);
+
+ for (unsigned i = 0; i < usages.length (); i++)
+ delete usages[i];
+}
Do you need to release refs, tree_refs or the bitmap?
+
+/* Compare two types if are same aliases in case of strict aliasing
+ is enabled. */
+bool
+sem_item::compare_for_aliasing (tree t1, tree t2)
+{
+ if (flag_strict_aliasing)
+ {
+ alias_set_type s1 = get_deref_alias_set (TREE_TYPE (t1));
+ alias_set_type s2 = get_deref_alias_set (TREE_TYPE (t2));
+
+ return s1 == s2;
+ }
+
+ return true;
+}
Is returning TRUE really the conservatively correct thing to do in the
absence of aliasing information? Isn't that case really "I don't know"
in which case the proper return value is FALSE?
+
+/* Semantic function constructor that uses STACK as bitmap memory stack. */
+
+sem_function::sem_function (bitmap_obstack *stack): sem_item (FUNC, stack),
+ compared_func (NULL)
+{
+ arg_types.create (0);
Does this need to be released?
+
+/* Gets symbol name of the item. */
+
+const char *
+sem_function::name (void)
+{
+ return node->name ();
+}
+
+/* Gets assembler name of the item. */
+
+const char *
+sem_function::asm_name (void)
+{
+ return node->asm_name ();
+}
Are these trivial enough that they should be inlined in the class
definition? As much as I dislike that style of programming, these seem
like classic candidates unless I'm missing something.
+/* References independent hash function. */
+
+hashval_t
+sem_function::get_hash (void)
+{
+ if(!hash)
+ {
+ hash = 177454; /* Random number for function type. */
+
+ hash = iterative_hash_object (arg_count, hash);
+ hash = iterative_hash_object (bb_count, hash);
+ hash = iterative_hash_object (edge_count, hash);
+ hash = iterative_hash_object (cfg_checksum, hash);
Does CFG_CHECKSUM encompass the bb/edge counts?
+}
+
+/* Fast equality function based on knowledge known in WPA. */
+
+bool
+sem_function::equals_wpa (sem_item *item)
+{
+ if (item->type != FUNC)
+ return false;
Can this ever happen? Either remove to turn into an assert.
+/* Processes function equality comparison. */
+
+bool
+sem_function::equals_private (sem_item *item)
+{
+ if (item->type != FUNC)
+ return false;
+
+ basic_block bb1, bb2;
+ edge e1, e2;
+ edge_iterator ei1, ei2;
+ int *bb_dict = NULL;
+ bool result = true;
+ tree arg1, arg2;
+
+ compared_func = static_cast<sem_function *> (item);
+
+ gcc_assert (decl != item->decl);
+
+ if (arg_count != compared_func->arg_count
+ || bb_count != compared_func->bb_count
+ || edge_count != compared_func->edge_count
+ || cfg_checksum != compared_func->cfg_checksum)
Again, check if testing the bb/edge counts aren't necessary as we check
the cfg checksum.
+ SE_EXIT_FALSE();
+
+ if (!equals_wpa (item))
+ return false;
+
+ /* Checking function arguments. */
+ tree decl1 = DECL_ATTRIBUTES (decl);
+ tree decl2 = DECL_ATTRIBUTES (compared_func->decl);
So are there any attributes we can safely ignore? Probably not.
However, we ought to handle the case where the attributes appear in
different orders.
Also makes me wonder does the comparison of arguments/return value
verify their attributes as well?
+
+/* Returns cgraph_node. */
+
+struct cgraph_node *
+sem_function::get_node (void)
+{
+ return cgraph (node);
+}
+
+/* Initialize semantic item by info reachable during LTO WPA phase. */
+
+void
+sem_function::init_wpa (void)
+{
+ parse_tree_args ();
+}
inline? Worth or not worth the headache?
+
+bool
+sem_function::compare_bb (sem_bb_t *bb1, sem_bb_t *bb2, tree func1, tree func2)
So this routine walks down the gimple statements and compares them for
equality. Would it make sense to have the equality testing in gimple?
That way if someone adds a new gimple code the places they need to
check/update are at least somewhat more localized?
+
+ for (i = 0; i < size1; ++i)
+ {
+ t1 = gimple_phi_arg (phi1, i)->def;
+ t2 = gimple_phi_arg (phi2, i)->def;
+
+ if (!compare_operand (t1, t2, func1, func2))
+ SE_EXIT_FALSE ();
+
+ e1 = gimple_phi_arg_edge (phi1, i);
+ e2 = gimple_phi_arg_edge (phi2, i);
+
+ if (!checker.compare_edge (e1, e2))
+ SE_EXIT_FALSE ();
+ }
I don't think we guarantee any particular order on the PHI args. ISTM
you'd want to sort them or something so as not to reject a possible
duplicate simply because of ordering issues.
Related, I'm not sure bb indexes are even guaranteed to have any stable
ordering. So ISTM you'd want to do something like a DFS walk to set a
index for each block, then sort the PHI arguments based on DFS index to
get a stable, consistent check here.
+/* For given basic blocks BB1 and BB2 (from functions FUNC1 and FUNC),
+ true value is returned if exception handling regions are equivalent
+ in these blocks. */
+
+bool
+sem_function::compare_eh_region (eh_region r1, eh_region r2, tree func1,
+ tree func2)
Similarly, are EH region indexes stable enough to check here?
+
+void
+sem_function::gsi_next_nondebug_stmt (gimple_stmt_iterator &gsi)
+{
+ gimple s;
+
+ s = gsi_stmt (gsi);
+
+ while (gimple_code (s) == GIMPLE_DEBUG)
+ {
+ gsi_next (&gsi);
+ gcc_assert (!gsi_end_p (gsi));
+
+ s = gsi_stmt (gsi);
+ }
+}
Should this be elsewhere? Seems like the concept of asking for the next
non-debug statement happens often.
+
+/* Iterates GSI statement iterator to the next non-virtual statement. */
+
+void
+sem_function::gsi_next_nonvirtual_phi (gimple_stmt_iterator &it)
+{
+ gimple phi;
+
+ if (gsi_end_p (it))
+ return;
+
+ phi = gsi_stmt (it);
+ gcc_assert (phi != NULL);
+
+ while (virtual_operand_p (gimple_phi_result (phi)))
+ {
+ gsi_next (&it);
+
+ if (gsi_end_p (it))
+ return;
+
+ phi = gsi_stmt (it);
+ }
+}
Similarly.
+
+/* Verifies that trees T1 and T2 do correspond. */
Again, don't like the "do correspond" terminology. Seems like we're
testing for equivalence.
A lot of the sem_function::compare_XXX feel like they should be
elsewhere, possibly their own module. I could easily see reusing them
for other purposes.
There's a number of algorithms that want to do the same kind of block
hashing & equality testing that you're doing here in ICF. Maybe pull
them into their own file/api?
+/* Gets symbol name of the item. */
+
+const char *
+sem_variable::name (void)
+{
+ return node->name ();
+}
+
+/* Gets assembler name of the item. */
+
+const char *
+sem_variable::asm_name (void)
+{
+ return node->asm_name ();
+}
Inline the accessors?
I'm starting to gloss over things.... It feels like we've got too much
stuff in this one file. Breaking things out would help (like for
example the hashing/equality bits). I'd like to see things broken out a
bit and reposted for further reviewing.
Jeff