This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Inline heuristics fixes 1/4


Hi,
I am sending the patch mail with delay. For some reason it never made it into the mailing list.

Honza

Hi,
I looked into PRs related to inlining heuristics and in particular I tried to get -Os
non-joke for C++ code.  This was relatively succesful in the end, tramp3d -Os size is now
641Kb, while -O2 size is 695Kb.

Without the patch -O2 size is 764 and -Os size is completely out of bounds. -Os also got a lot faster.
GCC 4.5 -O2 size is 895.
With -fwhole-program -O2 size drops to 641Kb and -Os size to 469.

I then did testing on Mozilla where I get 2% code size reduction at -Os, that gets us still 2% bigger
than 4.5 compiler.  Also nightly testing on vangelis shows important improvements in CSiBE (halving
the regression we got for introduction of the new heuristics - I will re--analyze the rest of it too),
freefem, cwchessoard, scimark. There seems to be no off-noise regression.

I broke the patch into several parts for easier testing

1) Fixes in cost metric.
2) More sane handling of COMDAT functions (basically reducing probability that it will be shared across
   units, so resonably large COMDATs called once are inlined)
3) More sane prediction for SRA (not expecting that arbitrary large ctors will all go away)
4) Bumping early-inlining-insns from 8 to 10 to compensate for 3) at tramp3d.

So this is first part of the patch.  I noticed that is_inexpensive_builtin was introduced,
buts its use is slopy in estimate_num_insns since builtin call is set to have precisely same
cost as function call.  

Other problem is that return instruction is esitmated as 0 time&size.  Not doing so makes GCC
to get some extra push to eliminate offline copies of functions understanding that offline functions
are bigger than inline equivalents.

Finaly noticing existence of is_inexpensive_builtin, the leaf_node_p function can be made a bit better.

Bootstrapped/regtested x86_64, will commit it today if there are no complains.

Honza

	* ipa-inline.c (leaf_node_p): Implement using is_inexpensive_builtin.
	* tree-inline.c (estimate_num_insns): Inexpensive builtins are like
	normal instructions; be sure bultin is not implemented in this file;
	compute non-zero return cost.
	(init_inline_once): Reduce builtin_call_cost to 1; set return cost.
	* tree-inline.h (eni_weights_d): Add return cost.
Index: ipa-inline.c
===================================================================
*** ipa-inline.c	(revision 166453)
--- ipa-inline.c	(working copy)
*************** cgraph_decide_inlining (void)
*** 1578,1593 ****
    return 0;
  }
  
! /* Return true when N is leaf function.  Accept cheap (pure&const) builtins
     in leaf functions.  */
  static bool
  leaf_node_p (struct cgraph_node *n)
  {
    struct cgraph_edge *e;
    for (e = n->callees; e; e = e->next_callee)
!     if (!DECL_BUILT_IN (e->callee->decl)
! 	|| (!TREE_READONLY (e->callee->decl)
! 	    || DECL_PURE_P (e->callee->decl)))
        return false;
    return true;
  }
--- 1584,1598 ----
    return 0;
  }
  
! /* Return true when N is leaf function.  Accept cheap builtins
     in leaf functions.  */
+ 
  static bool
  leaf_node_p (struct cgraph_node *n)
  {
    struct cgraph_edge *e;
    for (e = n->callees; e; e = e->next_callee)
!     if (!is_inexpensive_builtin (e->callee->decl))
        return false;
    return true;
  }
Index: tree-inline.c
===================================================================
*** tree-inline.c	(revision 166453)
--- tree-inline.c	(working copy)
*************** estimate_num_insns (gimple stmt, eni_wei
*** 3484,3493 ****
  	if (POINTER_TYPE_P (funtype))
  	  funtype = TREE_TYPE (funtype);
  
! 	if (is_simple_builtin (decl))
  	  return 0;
  	else if (is_inexpensive_builtin (decl))
! 	  cost = weights->target_builtin_call_cost;
  	else
  	  cost = weights->call_cost;
  
--- 3484,3499 ----
  	if (POINTER_TYPE_P (funtype))
  	  funtype = TREE_TYPE (funtype);
  
! 	/* Do not special case builtins where we see the body.
! 	   This just confuse inliner.  */
! 	if (!decl || cgraph_node (decl)->analyzed)
! 	  cost = weights->call_cost;
! 	/* For buitins that are likely expanded to nothing or
! 	   inlined do not account operand costs.  */
! 	else if (is_simple_builtin (decl))
  	  return 0;
  	else if (is_inexpensive_builtin (decl))
! 	  return weights->target_builtin_call_cost;
  	else
  	  cost = weights->call_cost;
  
*************** estimate_num_insns (gimple stmt, eni_wei
*** 3536,3546 ****
  	break;
        }
  
      case GIMPLE_GOTO:
      case GIMPLE_LABEL:
      case GIMPLE_NOP:
      case GIMPLE_PHI:
-     case GIMPLE_RETURN:
      case GIMPLE_PREDICT:
      case GIMPLE_DEBUG:
        return 0;
--- 3542,3554 ----
  	break;
        }
  
+     case GIMPLE_RETURN:
+       return weights->return_cost;
+ 
      case GIMPLE_GOTO:
      case GIMPLE_LABEL:
      case GIMPLE_NOP:
      case GIMPLE_PHI:
      case GIMPLE_PREDICT:
      case GIMPLE_DEBUG:
        return 0;
*************** init_inline_once (void)
*** 3640,3655 ****
    eni_size_weights.div_mod_cost = 1;
    eni_size_weights.omp_cost = 40;
    eni_size_weights.time_based = false;
  
    /* Estimating time for call is difficult, since we have no idea what the
       called function does.  In the current uses of eni_time_weights,
       underestimating the cost does less harm than overestimating it, so
       we choose a rather small value here.  */
    eni_time_weights.call_cost = 10;
!   eni_time_weights.target_builtin_call_cost = 10;
    eni_time_weights.div_mod_cost = 10;
    eni_time_weights.omp_cost = 40;
    eni_time_weights.time_based = true;
  }
  
  /* Estimate the number of instructions in a gimple_seq. */
--- 3648,3665 ----
    eni_size_weights.div_mod_cost = 1;
    eni_size_weights.omp_cost = 40;
    eni_size_weights.time_based = false;
+   eni_size_weights.return_cost = 1;
  
    /* Estimating time for call is difficult, since we have no idea what the
       called function does.  In the current uses of eni_time_weights,
       underestimating the cost does less harm than overestimating it, so
       we choose a rather small value here.  */
    eni_time_weights.call_cost = 10;
!   eni_time_weights.target_builtin_call_cost = 1;
    eni_time_weights.div_mod_cost = 10;
    eni_time_weights.omp_cost = 40;
    eni_time_weights.time_based = true;
+   eni_time_weights.return_cost = 2;
  }
  
  /* Estimate the number of instructions in a gimple_seq. */
Index: tree-inline.h
===================================================================
*** tree-inline.h	(revision 166453)
--- tree-inline.h	(working copy)
*************** typedef struct eni_weights_d
*** 144,149 ****
--- 144,152 ----
    /* Cost for omp construct.  */
    unsigned omp_cost;
  
+   /* Cost of return.  */
+   unsigned return_cost;
+ 
    /* True when time of statemnt should be estimated.  Thus i.e
       cost of switch statement is logarithmic rather than linear in number
       of cases.  */


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]