[tree-profiling-branch PATCH] Function cloning + IPCP extension

Jan Hubicka jh@suse.cz
Sun Aug 29 15:58:00 GMT 2004


> Hello,
> 
> This patch implements function cloning.
> It also contains extension to  IPCP (Interprocedural constant propagation) 
> 
> optimization http://gcc.gnu.org/ml/gcc-patches/2004-08/msg00998.html for 
> using function cloning.
> 
> Now in order to enable IPCP for a given application, we should use the 
> command
> gcc -fipa-cp -combine file1.c file2.c file3.c 
> where file1.c, file2.c, file3.c are the files of the application.
> 
> Bootstrapped and regression tests successfully passed on POWER4.
> 
> Comments are welcome.

Hi,
For the moment I am taking the actual optimization as black box and just
few comemnts on the changes in generic code:

*************** cgraph_remove_node (struct cgraph_node *
*** 328,335 ****
        else
  	{
            htab_clear_slot (cgraph_hash, slot);
! 	  if (!dump_enabled_p (TDI_all))
! 	    {
                DECL_SAVED_TREE (node->decl) = NULL;
  	      DECL_STRUCT_FUNCTION (node->decl) = NULL;
  	    }
--- 342,350 ----
        else
  	{
            htab_clear_slot (cgraph_hash, slot);
!           /* Do not delete the node if it has still has new versions.  */
! 	  if (!dump_enabled_p (TDI_all) && node->node_clones == 0)
! 	    { 
                DECL_SAVED_TREE (node->decl) = NULL;
  	      DECL_STRUCT_FUNCTION (node->decl) = NULL;
  	    }
*************** cgraph_remove_node (struct cgraph_node *
*** 347,361 ****
  
    /* Work out whether we still need a function body (either there is inline
       clone or there is out of line function whose body is not written).  */
!   if (check_dead && flag_unit_at_a_time)
      {
        struct cgraph_node *n;
! 
        for (n = *slot; n; n = n->next_clone)
! 	if (n->global.inlined_to
! 	    || (!n->global.inlined_to
! 		&& !TREE_ASM_WRITTEN (n->decl) && !DECL_EXTERNAL (n->decl)))
! 	  break;
        if (!n && !dump_enabled_p (TDI_all))
  	{
  	  DECL_SAVED_TREE (node->decl) = NULL;
--- 362,376 ----
  
    /* Work out whether we still need a function body (either there is inline
       clone or there is out of line function whose body is not written).  */
!   if (check_dead && flag_unit_at_a_time && node->node_clones == 0)
      {
        struct cgraph_node *n;
!       
        for (n = *slot; n; n = n->next_clone)
! 	  if (n->global.inlined_to
! 	      || (!n->global.inlined_to
! 	  	&& !TREE_ASM_WRITTEN (n->decl) && !DECL_EXTERNAL (n->decl)))
! 	    break;

It looks like we do have two different mechanizms to keep function body
alive until all clones are processed.  For inline clones we use the
linked list via next_clone and take advantage of fact that all of them
do have same DECL node and same hash slot, while for new clones we use
reference counting.

I think this should be unified - we probably should use next_clone list
to keep info about all the clones and kill the ref counting.  We
shoule be able to use the orig_clone pointer you added almost everywhere
where we currently look into hashtable to find the first clone on the
list.  There might be dificulties with memory management wanting to
elliminate main clone before the clones are dead, but we either might
want to simply disalllow that.  (alternate choice that comes into mid is
simply to make the list doubly linked and kill the orig_node pointer,
but having the original seems usefull)

I would like this to be done before commiting the patch - technically
the inline clones and IPA clones should be dealt with as similarly as
possible.  If this will be causing some major dificulties for you, I can
look into it sometime next week too.

+ /* Copy collees of node FROM to node TO.  */
+ void
+ cgraph_copy_node_callees (struct cgraph_node *to,
+                           struct cgraph_node *from)
+ {
+   struct cgraph_edge *e;
+   
+   for (e = from->callees;e; e=e->next_callee)
+     cgraph_clone_edge (e, to, e->call_expr);
+ 
+ }
+ 
+ 
+ void
+ cgraph_copy_local_info (struct cgraph_node *new_node, 
+ 			struct cgraph_node *orig_node)
+ {
+   new_node->local.self_insns = orig_node->local.self_insns;
+   new_node->local.local = orig_node->local.local;
+   new_node->local.finalized = orig_node->local.finalized;
+   new_node->local.disregard_inline_limits = 
+     orig_node->local.disregard_inline_limits;
+   new_node->local.inlinable = orig_node->local.inlinable;
+   new_node->local.redefined_extern_inline = 
+     orig_node->local.redefined_extern_inline;
+ }
+ 

If clones are unified, the cgraph_clone_node should do this for you.
However breaking it up into these two functions (made static) sounds
like good idea.

+ /* Get first callee of node.  */
+ #define FIRST_CALLEE(node)               ((node)->callees)
+ /* Get next callee from edge.  */
+ #define NEXT_CALLEE(edge)                ((edge)->next_callee)
+ /* Returns whether edge is last callee.  */
+ #define CALLEE_NOT_LAST(edge)            ((edge) != NULL)
+ 

All the current code use direct tests.  If we want to switch to accesor
macros, we ought to do that consistently at once, but I am not
convienced that they are prettier than the former...
*************** cgraph_build_static_cdtor (char which, t
*** 1857,1859 ****
--- 1862,2053 ----
        fn (XEXP (DECL_RTL (decl), 0), priority);
      }
  }
+ 
+ /* Remove unreachable nodes. */
+ void
+ cgraph_unreachable_nodes_elimination (void)

how this is different from cgraph_remove_unreachable_nodes we have
already?
+ /* Create a new cgraph node which is the new version of 
+    OLD_VERSION node.  REDIRECT_CALLERS holds the callers
+    of OLD_VERSION which should be redirected to point to  
+    NEW_VERSION.  ALL the callees edges of OLD_VERSION 
+    are cloned to the new version node. 
+    This function does not set the reachable flag in the new 
+    version node.  */
+ 
+ struct cgraph_node *
+ cgraph_copy_node_for_versioning (struct cgraph_node *old_version,
+ 				 tree new_decl, varray_type redirect_callers)

this is what I would like to see implemented in terms of (possibly
extended) clone_node call...
+ 
+ int
+ c_cannot_version_tree_fn (tree *fnp ATTRIBUTE_UNUSED)
+ {
+   return 0;
+ }

For C++ you are outlawing the constructors and destructors.  Will this
work on static constructors and destructors we support as an extension?
Perhaps these checks can be moved to generic code anyway.
Common subdirectories: /home/eres/ORIG_25/gcc/gcc/doc and /home/eres/IPA_SPEC/gcc/gcc/doc
diff -cp /home/eres/ORIG_25/gcc/gcc/dwarf2out.c /home/eres/IPA_SPEC/gcc/gcc/dwarf2out.c
*** /home/eres/ORIG_25/gcc/gcc/dwarf2out.c	2004-08-03 02:11:28.000000000 +0300
--- /home/eres/IPA_SPEC/gcc/gcc/dwarf2out.c	2004-08-25 16:42:15.000000000 +0300
*************** gen_subprogram_die (tree decl, dw_die_re
*** 11262,11270 ****
  	     we should detect this case and ignore it.  For now, if we have
  	     already reported an error, any error at all, then assume that
  	     we got here because of an input error, not a dwarf2 bug.  */
! 	  if (errorcount)
! 	    return;
! 	  abort ();
  	}
  
        /* If the definition comes from the same place as the declaration,
--- 11262,11270 ----
  	     we should detect this case and ignore it.  For now, if we have
  	     already reported an error, any error at all, then assume that
  	     we got here because of an input error, not a dwarf2 bug.  */
!           if (errorcount)
!             return;
!           abort ();

accident?
+ 
+ /* Copy NODE (which must be a DECL).  The DECL originally was in the FROM_FN,
+    but now it will be in the TO_FN. As apposed for to copy_decl_for_inlining ()
+    function; we do not give a special treatment to parm_decl and result_decl.  */ 
+ tree
+ copy_decl_for_versioning (tree decl, tree from_fn, tree to_fn)

Looks like some code sharing with the former would help here ;)

Otherwise the patch looks very resonable.  In the case you will be
finished soon enought for mainline, I would like at least the cgraph
interface portions of the patch to be applied there.

Thanks,
Honza



More information about the Gcc-patches mailing list