[Bug c++/13776] [tree-ssa] Many C++ compile-time regression in 3.5-tree-ssa 040120

rguenth at tat dot physik dot uni-tuebingen dot de gcc-bugzilla@gcc.gnu.org
Sun Mar 14 18:00:00 GMT 2004


------- Additional Comments From rguenth at tat dot physik dot uni-tuebingen dot de  2004-03-14 18:00 -------
Subject: Re:  [tree-ssa] Many C++ compile-time regression in
 3.5-tree-ssa 040120

dberlin at dberlin dot org wrote:
> ------- Additional Comments From dberlin at dberlin dot org  2004-03-14 15:38 -------
> Subject: Bug 13776
> 
> 
> On Mar 14, 2004, at 8:35 AM, Richard Guenther wrote:
> 
> 
>>Daniel Berlin wrote:
>>
>>>This adds a DOM pass in between split critical edges and PRE, and 
>>>works for me on i686 and powerpc
>>>Tell me if it helps
>>
>>It made things worse in total, even PRE degraded some, but that may be 
>>in the noise.
>>
>>Richard.
> 
> 
> I don't even get close to these numbers.
> I've got your leafify patch installed (the one linked from the bug 
> report)
> Even at -O2, on a checking enabled compiler, with tramp3d-v2 from the 
> bug report, with the following sizes:
> 
> [root@dberlin dberlin]# ls -trl tramp3d-v2.ii
> -rw-r--r--    1 root     root      2962361 Feb  5 10:27 tramp3d-v2.ii
> generated from
> [root@dberlin dberlin]# ls -l tramp3d-v2.cpp
> -rw-r--r--    1 dberlin  dberlin   1952077 Feb  5 10:14 tramp3d-v2.cpp

That's the correct one.

> I get (without any changes to PRE):
> [root@dberlin gcc]# ./cc1plus -O2 ~dberlin/tramp3d-v2.ii
> ...
> Execution times (seconds)
>   garbage collection    :  46.23 (15%) usr   0.27 ( 3%) sys  46.66 (15%) wall
>   callgraph construction:   0.68 ( 0%) usr   0.01 ( 0%) sys   0.72 ( 0%) wall
>   callgraph optimization:   0.80 ( 0%) usr   0.07 ( 1%) sys   0.92 ( 0%) wall
>   cfg construction      :   0.46 ( 0%) usr   0.04 ( 0%) sys   0.50 ( 0%) wall
>   cfg cleanup           :   1.82 ( 1%) usr   0.02 ( 0%) sys   1.84 ( 1%) wall
>   CFG verifier          :   8.07 ( 3%) usr   0.03 ( 0%) sys   8.15 ( 3%) wall
>   trivially dead code   :   1.28 ( 0%) usr   0.00 ( 0%) sys   1.29 ( 0%) wall
>   life analysis         :   2.96 ( 1%) usr   0.01 ( 0%) sys   2.97 ( 1%) wall
>   life info update      :   1.52 ( 0%) usr   0.01 ( 0%) sys   1.56 ( 0%) wall
>   alias analysis        :   2.64 ( 1%) usr   0.01 ( 0%) sys   2.66 ( 1%) wall
>   register scan         :   1.23 ( 0%) usr   0.02 ( 0%) sys   1.25 ( 0%) wall
>   rebuild jump labels   :   0.38 ( 0%) usr   0.00 ( 0%) sys   0.38 ( 0%) wall
>   preprocessing         :   0.29 ( 0%) usr   0.17 ( 2%) sys   0.46 ( 0%) wall
>   parser                :  13.65 ( 4%) usr   1.27 (16%) sys  20.56 ( 6%) wall
>   name lookup           :   4.99 ( 2%) usr   2.00 (25%) sys   7.07 ( 2%) wall
>   integration           :  28.17 ( 9%) usr   0.19 ( 2%) sys  28.57 ( 9%) wall
>   tree gimplify         :   2.08 ( 1%) usr   0.05 ( 1%) sys   2.19 ( 1%) wall
>   tree eh               :   2.86 ( 1%) usr   0.08 ( 1%) sys   2.96 ( 1%) wall
>   tree CFG construction :   1.60 ( 1%) usr   0.09 ( 1%) sys   1.71 ( 1%) wall
>   tree CFG cleanup      :   3.99 ( 1%) usr   0.04 ( 0%) sys   4.04 ( 1%) wall
>   tree PTA              :   0.47 ( 0%) usr   0.01 ( 0%) sys   0.49 ( 0%) wall
>   tree alias analysis   :   0.61 ( 0%) usr   0.00 ( 0%) sys   0.61 ( 0%) wall
>   tree PHI insertion    :   9.15 ( 3%) usr   0.07 ( 1%) sys   9.26 ( 3%) wall
>   tree SSA rewrite      :   3.30 ( 1%) usr   0.01 ( 0%) sys   3.32 ( 1%) wall
>   tree SSA other        :   3.63 ( 1%) usr   0.51 ( 6%) sys   4.20 ( 1%) wall
>   tree operand scan     :   3.62 ( 1%) usr   0.59 ( 7%) sys   4.22 ( 1%) wall
>   dominator optimization:  15.57 ( 5%) usr   0.46 ( 6%) sys  16.09 ( 5%) wall
>   tree SRA              :   0.31 ( 0%) usr   0.01 ( 0%) sys   0.32 ( 0%) wall
>   tree CCP              :   1.56 ( 1%) usr   0.02 ( 0%) sys   1.58 ( 0%) wall
>   tree split crit edges :   0.57 ( 0%) usr   0.03 ( 0%) sys   0.61 ( 0%) wall
>   tree PRE              :  34.92 ( 9%) usr   0.14 ( 2%) sys  35.20 ( 9%) wall
>   tree linearize phis   :   0.03 ( 0%) usr   0.02 ( 0%) sys   0.05 ( 0%) wall
>   tree forward propagate:   1.12 ( 0%) usr   0.02 ( 0%) sys   1.14 ( 0%) wall
>   tree conservative DCE :   3.02 ( 1%) usr   0.03 ( 0%) sys   3.06 ( 1%) wall
>   tree aggressive DCE   :   0.78 ( 0%) usr   0.01 ( 0%) sys   0.79 ( 0%) wall
>   tree DSE              :   2.18 ( 1%) usr   0.01 ( 0%) sys   2.20 ( 1%) wall
>   tree copy headers     :   2.15 ( 1%) usr   0.02 ( 0%) sys   2.19 ( 1%) wall
>   tree SSA to normal    :   2.42 ( 1%) usr   0.13 ( 2%) sys   2.61 ( 1%) wall
>   tree NRV optimization :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall
>   tree rename SSA copies:   0.71 ( 0%) usr   0.04 ( 0%) sys   0.75 ( 0%) wall
>   tree SSA verifier     :  25.23 ( 8%) usr   0.23 ( 3%) sys  25.52 ( 8%) wall
>   tree STMT verifier    :   3.72 ( 1%) usr   0.03 ( 0%) sys   3.76 ( 1%) wall
>   callgraph verifier    :   7.79 ( 3%) usr   0.25 ( 3%) sys   8.09 ( 3%) wall
>   dominance frontiers   :   0.27 ( 0%) usr   0.00 ( 0%) sys   0.27 ( 0%) wall
>   control dependences   :   0.14 ( 0%) usr   0.00 ( 0%) sys   0.14 ( 0%) wall
>   expand                :  16.03 ( 5%) usr   0.19 ( 2%) sys  16.41 ( 5%) wall
>   varconst              :   0.66 ( 0%) usr   0.05 ( 1%) sys   1.06 ( 0%) wall
>   jump                  :   1.17 ( 0%) usr   0.15 ( 2%) sys   1.41 ( 0%) wall
>   CSE                   :   8.76 ( 3%) usr   0.05 ( 1%) sys   8.84 ( 3%) wall
>   global CSE            :   5.01 ( 2%) usr   0.13 ( 2%) sys   5.15 ( 2%) wall
>   loop analysis         :   1.21 ( 0%) usr   0.01 ( 0%) sys   1.24 ( 0%) wall
>   bypass jumps          :   0.94 ( 0%) usr   0.00 ( 0%) sys   0.94 ( 0%) wall
>   CSE 2                 :   3.59 ( 1%) usr   0.02 ( 0%) sys   3.78 ( 1%) wall
>   branch prediction     :   2.25 ( 1%) usr   0.01 ( 0%) sys   2.31 ( 1%) wall
>   flow analysis         :   0.08 ( 0%) usr   0.00 ( 0%) sys   0.10 ( 0%) wall
>   combiner              :   2.58 ( 1%) usr   0.03 ( 0%) sys   2.64 ( 1%) wall
>   if-conversion         :   0.57 ( 0%) usr   0.00 ( 0%) sys   0.57 ( 0%) wall
>   regmove               :   0.85 ( 0%) usr   0.00 ( 0%) sys   0.86 ( 0%) wall
>   local alloc           :   1.80 ( 1%) usr   0.01 ( 0%) sys   1.84 ( 1%) wall
>   global alloc          :   5.34 ( 2%) usr   0.10 ( 1%) sys   5.50 ( 2%) wall
>   reload CSE regs       :   2.24 ( 1%) usr   0.00 ( 0%) sys   2.25 ( 1%) wall
>   flow 2                :   0.33 ( 0%) usr   0.00 ( 0%) sys   0.34 ( 0%) wall
>   if-conversion 2       :   0.35 ( 0%) usr   0.00 ( 0%) sys   0.35 ( 0%) wall
>   peephole 2            :   0.38 ( 0%) usr   0.00 ( 0%) sys   0.39 ( 0%) wall
>   rename registers      :   1.43 ( 0%) usr   0.04 ( 0%) sys   1.52 ( 0%) wall
>   scheduling 2          :   2.28 ( 1%) usr   0.08 ( 1%) sys   2.38 ( 1%) wall
>   reorder blocks        :   0.49 ( 0%) usr   0.01 ( 0%) sys   0.50 ( 0%) wall
>   shorten branches      :   0.70 ( 0%) usr   0.01 ( 0%) sys   0.71 ( 0%) wall
>   reg stack             :   0.05 ( 0%) usr   0.00 ( 0%) sys   0.05 ( 0%) wall
>   final                 :   1.03 ( 0%) usr   0.14 ( 2%) sys   1.38 ( 0%) wall
>   symout                :   0.02 ( 0%) usr   0.03 ( 0%) sys   0.06 ( 0%) wall
>   rest of compilation   :   1.48 ( 0%) usr   0.04 ( 0%) sys   1.54 ( 0%) wall
>   TOTAL                 : 310.60             8.12           327.62
> Extra diagnostic checks enabled; compiler may run slowly.
> Configure with --disable-checking to disable checks.
> 
> 
> With my changes to PRE, i get the same numbers, except PRE is at 28 
> seconds instead of 36.
> 
> I certainly get *nowhere close* to 600 seconds in PRE, or the numbers 
> you get overall.
> I can't fix a problem i can't reproduce, i can only take stabs at it.
> Can someone else please verify his numbers so i know whether it's my 
> test setup or his?

I even have checking disabled.  GC time seems to be identical, and parsing 
is 13.5s vs 18.4s.  The first big difference is integration, which 
suggests that leafifying is not enabled on your side.  Maybe the patch 
applied "wrong"; I have attached a complete diff of my local changes.

Anyway, I'm running on a 1GHz Athlon with 1GB of RAM, and the compiler is 
bootstrapped with checking disabled.

Richard.
Index: gcc/c-common.c
===================================================================
RCS file: /cvs/gcc/gcc/gcc/c-common.c,v
retrieving revision 1.344.2.63
diff -u -u -r1.344.2.63 c-common.c
--- gcc/c-common.c	2 Mar 2004 18:41:21 -0000	1.344.2.63
+++ gcc/c-common.c	14 Mar 2004 17:51:26 -0000
@@ -746,6 +746,7 @@
 static tree handle_noinline_attribute (tree *, tree, tree, int, bool *);
 static tree handle_always_inline_attribute (tree *, tree, tree, int,
 					    bool *);
+static tree handle_leafify_attribute (tree *, tree, tree, int, bool *);
 static tree handle_used_attribute (tree *, tree, tree, int, bool *);
 static tree handle_unused_attribute (tree *, tree, tree, int, bool *);
 static tree handle_const_attribute (tree *, tree, tree, int, bool *);
@@ -807,6 +808,8 @@
 			      handle_noinline_attribute },
   { "always_inline",          0, 0, true,  false, false,
 			      handle_always_inline_attribute },
+  { "leafify",                0, 0, true,  false, false,
+                              handle_leafify_attribute },
   { "used",                   0, 0, true,  false, false,
 			      handle_used_attribute },
   { "unused",                 0, 0, false, false, false,
@@ -4458,6 +4461,29 @@
 
   return NULL_TREE;
 }
+
+/* Handle a "leafify" attribute; arguments as in
+   struct attribute_spec.handler.  */
+
+static tree
+handle_leafify_attribute (tree *node, tree name,
+                          tree args ATTRIBUTE_UNUSED,
+                          int flags ATTRIBUTE_UNUSED, bool *no_add_attrs)
+{
+  if (TREE_CODE (*node) == FUNCTION_DECL)
+    {
+      /* Do nothing else, just set the attribute.  We'll get at
+         it later with lookup_attribute.  */
+    }
+  else
+    {
+      warning ("`%s' attribute ignored", IDENTIFIER_POINTER (name));
+      *no_add_attrs = true;
+    }
+
+  return NULL_TREE;
+}
+
 
 /* Handle a "used" attribute; arguments as in
    struct attribute_spec.handler.  */
Index: gcc/cgraphunit.c
===================================================================
RCS file: /cvs/gcc/gcc/gcc/cgraphunit.c,v
retrieving revision 1.1.4.39
diff -u -u -r1.1.4.39 cgraphunit.c
--- gcc/cgraphunit.c	4 Mar 2004 15:38:34 -0000	1.1.4.39
+++ gcc/cgraphunit.c	14 Mar 2004 17:51:26 -0000
@@ -1045,7 +1045,7 @@
   else
     e->callee->global.inlined_to = e->caller;
 
-  /* Recursivly clone all bodies.  */
+  /* Recursively clone all inlined bodies.  */
   for (e = e->callee->callees; e; e = e->next_callee)
     if (!e->inline_failed)
       cgraph_clone_inlined_nodes (e, duplicate);
@@ -1192,7 +1192,7 @@
     recursive = what->decl == to->global.inlined_to->decl;
   else
     recursive = what->decl == to->decl;
-  /* Marking recursive function inlinine has sane semantic and thus we should
+  /* Marking recursive function inline has sane semantic and thus we should
      not warn on it.  */
   if (recursive && reason)
     *reason = (what->local.disregard_inline_limits
@@ -1440,6 +1440,67 @@
   free (heap_node);
 }
 
+/* Find callgraph nodes closing a circle in the graph.  The
+   resulting hashtab can be used to avoid walking the circles.
+   Uses the cgraph nodes ->aux field which needs to be zero
+   before and will be zero after operation.  */
+
+static void
+cgraph_find_cycles (struct cgraph_node *node, htab_t cycles)
+{
+  struct cgraph_edge *e;
+
+  if (node->aux)
+    {
+      void **slot;
+      slot = htab_find_slot (cycles, node, INSERT);
+      if (!*slot)
+	{
+	  if (cgraph_dump_file)
+	    fprintf (cgraph_dump_file, "Cycle contains %s\n", cgraph_node_name (node));
+	  *slot = node;
+	}
+      return;
+    }
+
+  node->aux = node;
+  for (e = node->callees; e; e = e->next_callee)
+    {
+      cgraph_find_cycles (e->callee, cycles);
+    }
+  node->aux = 0;
+}
+
+/* Leafify the cgraph node.  We have to be careful in recursing
+   as to not run endlessly in circles of the callgraph.
+   We do so by using a hashtab of cycle entering nodes as generated
+   by cgraph_find_cycles.  */
+
+static void
+cgraph_leafify_node (struct cgraph_node *node, htab_t cycles)
+{
+  struct cgraph_edge *e;
+
+  for (e = node->callees; e; e = e->next_callee)
+    {
+      /* Inline call, if possible, and recurse.  Be sure we are not
+	 entering callgraph circles here.  */
+      if (e->inline_failed
+	  && e->callee->local.inlinable
+	  && !cgraph_recursive_inlining_p (node, e->callee,
+				  	   &e->inline_failed)
+	  && !htab_find (cycles, e->callee))
+	{
+	  if (cgraph_dump_file)
+    	    fprintf (cgraph_dump_file, " inlining %s", cgraph_node_name (e->callee));
+          cgraph_mark_inline_edge (e);
+	  cgraph_leafify_node (e->callee, cycles);
+	}
+      else if (cgraph_dump_file)
+	fprintf (cgraph_dump_file, " !inlining %s", cgraph_node_name (e->callee));
+    }
+}
+
 /* Decide on the inlining.  We do so in the topological order to avoid
    expenses on updating datastructures.  */
 
@@ -1477,6 +1538,24 @@
       struct cgraph_edge *e;
 
       node = order[i];
+
+      /* Handle nodes to be leafified, but don't update overall unit size.  */
+      if (lookup_attribute ("leafify", DECL_ATTRIBUTES (node->decl)) != NULL)
+        {
+	  int old_overall_insns = overall_insns;
+	  htab_t cycles;
+  	  if (cgraph_dump_file)
+    	    fprintf (cgraph_dump_file,
+	     	     "Leafifying %s\n", cgraph_node_name (node));
+	  cycles = htab_create (7, htab_hash_pointer, htab_eq_pointer, NULL);
+	  cgraph_find_cycles (node, cycles);
+	  cgraph_leafify_node (node, cycles);
+	  htab_delete (cycles);
+	  overall_insns = old_overall_insns;
+	  /* We don't need to consider always_inline functions inside the leafified
+	     function anymore.  */
+	  continue;
+        }
 
       for (e = node->callees; e; e = e->next_callee)
 	if (e->callee->local.disregard_inline_limits)
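
The on-stack marking that cgraph_find_cycles relies on in the hunk above can be tried outside GCC. What follows is only a minimal standalone sketch, not code from the patch: struct node, find_cycles and demo_find_cycles are hypothetical names, a bool flag stands in for the ->aux field, and a second flag stands in for membership in the cycles hashtab.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

#define MAX_CALLEES 4

/* Hypothetical stand-in for struct cgraph_node.  */
struct node
{
  const char *name;
  struct node *callees[MAX_CALLEES];	/* unused slots are NULL */
  bool on_stack;			/* plays the role of ->aux */
  bool closes_cycle;			/* plays the role of the hashtab */
};

/* Depth-first walk.  Reaching a node that is still on the walk stack
   means the edge just followed closes a cycle, so mark that node.  */
static void
find_cycles (struct node *n)
{
  size_t i;

  if (n->on_stack)
    {
      n->closes_cycle = true;
      return;
    }

  n->on_stack = true;
  for (i = 0; i < MAX_CALLEES && n->callees[i]; i++)
    find_cycles (n->callees[i]);
  n->on_stack = false;
}

/* Build a -> b -> c -> b plus a -> d, run the walk, print the result
   and return how many nodes were marked.  */
static int
demo_find_cycles (void)
{
  struct node a = { "a" }, b = { "b" }, c = { "c" }, d = { "d" };
  struct node *all[] = { &a, &b, &c, &d };
  int marked = 0;
  size_t i;

  a.callees[0] = &b;
  a.callees[1] = &d;
  b.callees[0] = &c;
  c.callees[0] = &b;

  find_cycles (&a);

  for (i = 0; i < 4; i++)
    {
      /* As in the patch comment, the flag is clear again afterwards.  */
      assert (!all[i]->on_stack);
      printf ("%s: %s\n", all[i]->name,
	      all[i]->closes_cycle ? "closes a cycle" : "not marked");
      marked += all[i]->closes_cycle;
    }
  return marked;
}
```

Calling demo_find_cycles () from a main function marks only b, the node at which the walk re-enters the cycle. Because the flag is cleared on the way back up, a callee shared by several callers is walked once per path, which is also how the hunk above behaves.
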
Index: gcc/doc/extend.texi
===================================================================
RCS file: /cvs/gcc/gcc/gcc/doc/extend.texi,v
retrieving revision 1.82.2.36
diff -u -u -r1.82.2.36 extend.texi
--- gcc/doc/extend.texi	2 Mar 2004 18:42:50 -0000	1.82.2.36
+++ gcc/doc/extend.texi	14 Mar 2004 17:51:30 -0000
@@ -1893,7 +1893,7 @@
 attributes when making a declaration.  This keyword is followed by an
 attribute specification inside double parentheses.  The following
 attributes are currently defined for functions on all targets:
-@code{noreturn}, @code{noinline}, @code{always_inline},
+@code{noreturn}, @code{noinline}, @code{always_inline}, @code{leafify},
 @code{pure}, @code{const}, @code{nothrow},
 @code{format}, @code{format_arg}, @code{no_instrument_function},
 @code{section}, @code{constructor}, @code{destructor}, @code{used},
@@ -1969,6 +1969,14 @@
 Generally, functions are not inlined unless optimization is specified.
 For functions declared inline, this attribute inlines the function even
 if no optimization level was specified.
+
+@cindex @code{leafify} function attribute
+@item leafify
+Generally, inlining into a function is limited.  For a function marked with
+this attribute, every call inside this function will be inlined, if possible.
+Whether the function itself is considered for inlining depends on its size and
+the current inlining parameters.  The @code{leafify} attribute only works
+reliably in unit-at-a-time mode.
 
 @cindex @code{pure} function attribute
 @item pure
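
As an illustration of the behaviour documented in the hunk above, here is a small hedged sketch of how such an attribute would be used. It is not part of the patch: the functions square, norm2 and kernel are made up for the example, and since `leafify` is only recognised with this patch applied, the sketch spells the attribute `flatten`, which released GCC versions document in very similar terms. With the patch applied, the same code would use `__attribute__((leafify))` instead.

```c
#include <stdio.h>

/* Small helpers that would normally be subject to the usual
   inlining limits.  */
static double
square (double x)
{
  return x * x;
}

static double
norm2 (double x, double y)
{
  return square (x) + square (y);
}

/* Ask the compiler to inline every call inside kernel, if possible.
   Assumption: `flatten' is used here in place of the patch's
   `leafify' so that the example builds with an unpatched compiler.  */
__attribute__((flatten))
double
kernel (double x, double y)
{
  return norm2 (x, y) + 1.0;
}
```

The attribute only changes how the call tree below kernel is inlined when optimizing, not the result: kernel (3.0, 4.0) evaluates 9 + 16 + 1 either way.
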
Index: libstdc++-v3/include/c_std/std_cmath.h
===================================================================
RCS file: /cvs/gcc/gcc/libstdc++-v3/include/c_std/std_cmath.h,v
retrieving revision 1.5.6.7
diff -u -u -r1.5.6.7 std_cmath.h
--- libstdc++-v3/include/c_std/std_cmath.h	3 Jan 2004 23:05:32 -0000	1.5.6.7
+++ libstdc++-v3/include/c_std/std_cmath.h	14 Mar 2004 17:51:55 -0000
@@ -330,9 +330,31 @@
   { return __builtin_modfl(__x, __iptr); }
 
   template<typename _Tp>
-    inline _Tp
+    inline _Tp __attribute__((always_inline))
     __pow_helper(_Tp __x, int __n)
     {
+      if (__builtin_constant_p(__n))
+        switch (__n) {
+        case -1:
+          return _Tp(1)/__x;
+        case 0:
+          return _Tp(1);
+        case 1:
+          return __x;
+        case 2:
+          return __x*__x;
+#if ! __OPTIMIZE_SIZE__
+        case -2:
+          return _Tp(1)/(__x*__x);
+        case 3:
+          return __x*__x*__x;
+        case 4:
+          {
+             _Tp __y = __x*__x;
+             return __y*__y;
+          }
+#endif
+        }
       return __n < 0
         ? _Tp(1)/__cmath_power(__x, -__n)
         : __cmath_power(__x, __n);
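
The idea behind the __pow_helper change can be tried in isolation. The following is a rough C analogue under stated assumptions, not the libstdc++ code: pow_int is a hypothetical name, the type is fixed to double instead of a template parameter, and a plain square-and-multiply loop stands in for __cmath_power.

```c
/* Integer power with shortcuts for small exponents that are known at
   compile time.  always_inline lets __builtin_constant_p see the
   caller's argument when the compiler can prove it constant.  */
static inline double __attribute__((always_inline))
pow_int (double x, int n)
{
  if (__builtin_constant_p (n))
    switch (n)
      {
      case -1:
	return 1.0 / x;
      case 0:
	return 1.0;
      case 1:
	return x;
      case 2:
	return x * x;
      case 3:
	return x * x * x;
      case 4:
	{
	  double y = x * x;
	  return y * y;
	}
      }

  /* Generic fallback (stand-in for __cmath_power): square and
     multiply on the magnitude of N, then invert for negative N.  */
  unsigned int m = n < 0 ? -(unsigned int) n : (unsigned int) n;
  double r = 1.0;
  double b = x;

  for (; m; m >>= 1, b *= b)
    if (m & 1)
      r *= b;

  return n < 0 ? 1.0 / r : r;
}
```

A call such as pow_int (x, 2) can then collapse to a single multiplication when optimizing, while a call with a run-time exponent, or any build where __builtin_constant_p evaluates to 0, simply takes the generic loop.
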


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=13776


