[Bug c++/13776] [tree-ssa] Many C++ compile-time regression in 3.5-tree-ssa 040120
rguenth at tat dot physik dot uni-tuebingen dot de
gcc-bugzilla@gcc.gnu.org
Sun Mar 14 18:00:00 GMT 2004
------- Additional Comments From rguenth at tat dot physik dot uni-tuebingen dot de 2004-03-14 18:00 -------
Subject: Re: [tree-ssa] Many C++ compile-time regression in
3.5-tree-ssa 040120
dberlin at dberlin dot org wrote:
> ------- Additional Comments From dberlin at dberlin dot org 2004-03-14 15:38 -------
> Subject: Bug 13776
>
>
> On Mar 14, 2004, at 8:35 AM, Richard Guenther wrote:
>
>
>>Daniel Berlin wrote:
>>
>>>This adds a DOM pass in between split critical edges and PRE, and
>>>works for me on i686 and powerpc
>>>Tell me if it helps
>>
>>It made things worse in total, even PRE degraded some, but that may be
>>in the noise.
>>
>>Richard.
>
>
> I don't even get close to these numbers.
> I've got your leafify patch installed (the one linked from the bug
> report)
> Even at -O2, on a checking enabled compiler, with tramp3d-v2 from the
> bug report, with the following sizes:
>
> [root@dberlin dberlin]# ls -trl tramp3d-v2.ii
> -rw-r--r-- 1 root root 2962361 Feb 5 10:27 tramp3d-v2.ii
> generated from
> [root@dberlin dberlin]# ls -l tramp3d-v2.cpp
> -rw-r--r-- 1 dberlin dberlin 1952077 Feb 5 10:14 tramp3d-v2.cpp
That's the correct one.
> I get (without any changes to PRE):
> [root@dberlin gcc]# ./cc1plus -O2 ~dberlin/tramp3d-v2.ii
> ...
> Execution times (seconds)
> garbage collection : 46.23 (15%) usr 0.27 ( 3%) sys 46.66 (15%) wall
> callgraph construction: 0.68 ( 0%) usr 0.01 ( 0%) sys 0.72 ( 0%) wall
> callgraph optimization: 0.80 ( 0%) usr 0.07 ( 1%) sys 0.92 ( 0%) wall
> cfg construction : 0.46 ( 0%) usr 0.04 ( 0%) sys 0.50 ( 0%) wall
> cfg cleanup : 1.82 ( 1%) usr 0.02 ( 0%) sys 1.84 ( 1%) wall
> CFG verifier : 8.07 ( 3%) usr 0.03 ( 0%) sys 8.15 ( 3%) wall
> trivially dead code : 1.28 ( 0%) usr 0.00 ( 0%) sys 1.29 ( 0%) wall
> life analysis : 2.96 ( 1%) usr 0.01 ( 0%) sys 2.97 ( 1%) wall
> life info update : 1.52 ( 0%) usr 0.01 ( 0%) sys 1.56 ( 0%) wall
> alias analysis : 2.64 ( 1%) usr 0.01 ( 0%) sys 2.66 ( 1%) wall
> register scan : 1.23 ( 0%) usr 0.02 ( 0%) sys 1.25 ( 0%) wall
> rebuild jump labels : 0.38 ( 0%) usr 0.00 ( 0%) sys 0.38 ( 0%) wall
> preprocessing : 0.29 ( 0%) usr 0.17 ( 2%) sys 0.46 ( 0%) wall
> parser : 13.65 ( 4%) usr 1.27 (16%) sys 20.56 ( 6%) wall
> name lookup : 4.99 ( 2%) usr 2.00 (25%) sys 7.07 ( 2%) wall
> integration : 28.17 ( 9%) usr 0.19 ( 2%) sys 28.57 ( 9%) wall
> tree gimplify : 2.08 ( 1%) usr 0.05 ( 1%) sys 2.19 ( 1%) wall
> tree eh : 2.86 ( 1%) usr 0.08 ( 1%) sys 2.96 ( 1%) wall
> tree CFG construction : 1.60 ( 1%) usr 0.09 ( 1%) sys 1.71 ( 1%) wall
> tree CFG cleanup : 3.99 ( 1%) usr 0.04 ( 0%) sys 4.04 ( 1%) wall
> tree PTA : 0.47 ( 0%) usr 0.01 ( 0%) sys 0.49 ( 0%) wall
> tree alias analysis : 0.61 ( 0%) usr 0.00 ( 0%) sys 0.61 ( 0%) wall
> tree PHI insertion : 9.15 ( 3%) usr 0.07 ( 1%) sys 9.26 ( 3%) wall
> tree SSA rewrite : 3.30 ( 1%) usr 0.01 ( 0%) sys 3.32 ( 1%) wall
> tree SSA other : 3.63 ( 1%) usr 0.51 ( 6%) sys 4.20 ( 1%) wall
> tree operand scan : 3.62 ( 1%) usr 0.59 ( 7%) sys 4.22 ( 1%) wall
> dominator optimization: 15.57 ( 5%) usr 0.46 ( 6%) sys 16.09 ( 5%) wall
> tree SRA : 0.31 ( 0%) usr 0.01 ( 0%) sys 0.32 ( 0%) wall
> tree CCP : 1.56 ( 1%) usr 0.02 ( 0%) sys 1.58 ( 0%) wall
> tree split crit edges : 0.57 ( 0%) usr 0.03 ( 0%) sys 0.61 ( 0%) wall
> tree PRE : 34.92 ( 9%) usr 0.14 ( 2%) sys 35.20 ( 9%) wall
> tree linearize phis : 0.03 ( 0%) usr 0.02 ( 0%) sys 0.05 ( 0%) wall
> tree forward propagate: 1.12 ( 0%) usr 0.02 ( 0%) sys 1.14 ( 0%) wall
> tree conservative DCE : 3.02 ( 1%) usr 0.03 ( 0%) sys 3.06 ( 1%) wall
> tree aggressive DCE : 0.78 ( 0%) usr 0.01 ( 0%) sys 0.79 ( 0%) wall
> tree DSE : 2.18 ( 1%) usr 0.01 ( 0%) sys 2.20 ( 1%) wall
> tree copy headers : 2.15 ( 1%) usr 0.02 ( 0%) sys 2.19 ( 1%) wall
> tree SSA to normal : 2.42 ( 1%) usr 0.13 ( 2%) sys 2.61 ( 1%) wall
> tree NRV optimization : 0.01 ( 0%) usr 0.00 ( 0%) sys 0.01 ( 0%) wall
> tree rename SSA copies: 0.71 ( 0%) usr 0.04 ( 0%) sys 0.75 ( 0%) wall
> tree SSA verifier : 25.23 ( 8%) usr 0.23 ( 3%) sys 25.52 ( 8%) wall
> tree STMT verifier : 3.72 ( 1%) usr 0.03 ( 0%) sys 3.76 ( 1%) wall
> callgraph verifier : 7.79 ( 3%) usr 0.25 ( 3%) sys 8.09 ( 3%) wall
> dominance frontiers : 0.27 ( 0%) usr 0.00 ( 0%) sys 0.27 ( 0%) wall
> control dependences : 0.14 ( 0%) usr 0.00 ( 0%) sys 0.14 ( 0%) wall
> expand : 16.03 ( 5%) usr 0.19 ( 2%) sys 16.41 ( 5%) wall
> varconst : 0.66 ( 0%) usr 0.05 ( 1%) sys 1.06 ( 0%) wall
> jump : 1.17 ( 0%) usr 0.15 ( 2%) sys 1.41 ( 0%) wall
> CSE : 8.76 ( 3%) usr 0.05 ( 1%) sys 8.84 ( 3%) wall
> global CSE : 5.01 ( 2%) usr 0.13 ( 2%) sys 5.15 ( 2%) wall
> loop analysis : 1.21 ( 0%) usr 0.01 ( 0%) sys 1.24 ( 0%) wall
> bypass jumps : 0.94 ( 0%) usr 0.00 ( 0%) sys 0.94 ( 0%) wall
> CSE 2 : 3.59 ( 1%) usr 0.02 ( 0%) sys 3.78 ( 1%) wall
> branch prediction : 2.25 ( 1%) usr 0.01 ( 0%) sys 2.31 ( 1%) wall
> flow analysis : 0.08 ( 0%) usr 0.00 ( 0%) sys 0.10 ( 0%) wall
> combiner : 2.58 ( 1%) usr 0.03 ( 0%) sys 2.64 ( 1%) wall
> if-conversion : 0.57 ( 0%) usr 0.00 ( 0%) sys 0.57 ( 0%) wall
> regmove : 0.85 ( 0%) usr 0.00 ( 0%) sys 0.86 ( 0%) wall
> local alloc : 1.80 ( 1%) usr 0.01 ( 0%) sys 1.84 ( 1%) wall
> global alloc : 5.34 ( 2%) usr 0.10 ( 1%) sys 5.50 ( 2%) wall
> reload CSE regs : 2.24 ( 1%) usr 0.00 ( 0%) sys 2.25 ( 1%) wall
> flow 2 : 0.33 ( 0%) usr 0.00 ( 0%) sys 0.34 ( 0%) wall
> if-conversion 2 : 0.35 ( 0%) usr 0.00 ( 0%) sys 0.35 ( 0%) wall
> peephole 2 : 0.38 ( 0%) usr 0.00 ( 0%) sys 0.39 ( 0%) wall
> rename registers : 1.43 ( 0%) usr 0.04 ( 0%) sys 1.52 ( 0%) wall
> scheduling 2 : 2.28 ( 1%) usr 0.08 ( 1%) sys 2.38 ( 1%) wall
> reorder blocks : 0.49 ( 0%) usr 0.01 ( 0%) sys 0.50 ( 0%) wall
> shorten branches : 0.70 ( 0%) usr 0.01 ( 0%) sys 0.71 ( 0%) wall
> reg stack : 0.05 ( 0%) usr 0.00 ( 0%) sys 0.05 ( 0%) wall
> final : 1.03 ( 0%) usr 0.14 ( 2%) sys 1.38 ( 0%) wall
> symout : 0.02 ( 0%) usr 0.03 ( 0%) sys 0.06 ( 0%) wall
> rest of compilation : 1.48 ( 0%) usr 0.04 ( 0%) sys 1.54 ( 0%) wall
> TOTAL : 310.60 8.12 327.62
> Extra diagnostic checks enabled; compiler may run slowly.
> Configure with --disable-checking to disable checks.
>
>
> With my changes to PRE, I get the same numbers, except PRE is at 28
> seconds instead of 36.
>
> I certainly get *nowhere close* to 600 seconds in PRE, or the numbers
> you get overall.
> I can't fix a problem I can't reproduce; I can only take stabs at it.
> Can someone else please verify his numbers so I know whether it's my
> test setup or his?
I even have checking disabled. GC time seems to be identical, and parsing
is 13.5s vs. 18.4s. The first big difference is integration, which
suggests that leafifying is not enabled. Maybe the patch applied
"wrong", so I attached a complete diff of my local changes.
Anyway, I'm running on a 1GHz Athlon with 1GB of RAM; the compiler is
bootstrapped with checking disabled.
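
For anyone checking whether leafifying is actually in effect, here is a minimal sketch of how such an attribute is meant to be used. It is an illustration under assumptions, not code from tramp3d: the function names are made up, and the attribute is spelled `flatten`, which released GCC versions document with nearly the same wording as the `leafify` text in the attached diff. With the patched compiler from this thread the spelling would be `__attribute__((leafify))` instead.

```cpp
#include <cassert>

// Hypothetical helpers; small enough that the inliner can absorb them.
static double square(double x) { return x * x; }
static double norm2(double x, double y) { return square(x) + square(y); }

// Every call inside kernel() is requested to be inlined, if possible.
// Assumption: "flatten" stands in for the patch's "leafify" spelling so
// that the sketch builds with a released compiler.
__attribute__((flatten))
double kernel(double x, double y) { return norm2(x, y) + 1.0; }

// The attribute only changes code generation, never the result.
void kernel_self_check() { assert(kernel(3.0, 4.0) == 26.0); }
```

Whether the calls were really inlined is easiest to see in the callgraph dump or in the generated assembly; the result of the function is the same either way.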
Richard.
Index: gcc/c-common.c
===================================================================
RCS file: /cvs/gcc/gcc/gcc/c-common.c,v
retrieving revision 1.344.2.63
diff -u -u -r1.344.2.63 c-common.c
--- gcc/c-common.c 2 Mar 2004 18:41:21 -0000 1.344.2.63
+++ gcc/c-common.c 14 Mar 2004 17:51:26 -0000
@@ -746,6 +746,7 @@
static tree handle_noinline_attribute (tree *, tree, tree, int, bool *);
static tree handle_always_inline_attribute (tree *, tree, tree, int,
bool *);
+static tree handle_leafify_attribute (tree *, tree, tree, int, bool *);
static tree handle_used_attribute (tree *, tree, tree, int, bool *);
static tree handle_unused_attribute (tree *, tree, tree, int, bool *);
static tree handle_const_attribute (tree *, tree, tree, int, bool *);
@@ -807,6 +808,8 @@
handle_noinline_attribute },
{ "always_inline", 0, 0, true, false, false,
handle_always_inline_attribute },
+ { "leafify", 0, 0, true, false, false,
+ handle_leafify_attribute },
{ "used", 0, 0, true, false, false,
handle_used_attribute },
{ "unused", 0, 0, false, false, false,
@@ -4458,6 +4461,29 @@
return NULL_TREE;
}
+
+/* Handle a "leafify" attribute; arguments as in
+ struct attribute_spec.handler. */
+
+static tree
+handle_leafify_attribute (tree *node, tree name,
+ tree args ATTRIBUTE_UNUSED,
+ int flags ATTRIBUTE_UNUSED, bool *no_add_attrs)
+{
+ if (TREE_CODE (*node) == FUNCTION_DECL)
+ {
+ /* Do nothing else, just set the attribute. We'll get at
+ it later with lookup_attribute. */
+ }
+ else
+ {
+ warning ("`%s' attribute ignored", IDENTIFIER_POINTER (name));
+ *no_add_attrs = true;
+ }
+
+ return NULL_TREE;
+}
+
/* Handle a "used" attribute; arguments as in
struct attribute_spec.handler. */
Index: gcc/cgraphunit.c
===================================================================
RCS file: /cvs/gcc/gcc/gcc/cgraphunit.c,v
retrieving revision 1.1.4.39
diff -u -u -r1.1.4.39 cgraphunit.c
--- gcc/cgraphunit.c 4 Mar 2004 15:38:34 -0000 1.1.4.39
+++ gcc/cgraphunit.c 14 Mar 2004 17:51:26 -0000
@@ -1045,7 +1045,7 @@
else
e->callee->global.inlined_to = e->caller;
- /* Recursivly clone all bodies. */
+ /* Recursively clone all inlined bodies. */
for (e = e->callee->callees; e; e = e->next_callee)
if (!e->inline_failed)
cgraph_clone_inlined_nodes (e, duplicate);
@@ -1192,7 +1192,7 @@
recursive = what->decl == to->global.inlined_to->decl;
else
recursive = what->decl == to->decl;
- /* Marking recursive function inlinine has sane semantic and thus we should
+ /* Marking recursive function inline has sane semantic and thus we should
not warn on it. */
if (recursive && reason)
*reason = (what->local.disregard_inline_limits
@@ -1440,6 +1440,67 @@
free (heap_node);
}
+/* Find callgraph nodes closing a circle in the graph. The
+ resulting hashtab can be used to avoid walking the circles.
+ Uses the cgraph nodes ->aux field which needs to be zero
+ before and will be zero after operation. */
+
+static void
+cgraph_find_cycles (struct cgraph_node *node, htab_t cycles)
+{
+ struct cgraph_edge *e;
+
+ if (node->aux)
+ {
+ void **slot;
+ slot = htab_find_slot (cycles, node, INSERT);
+ if (!*slot)
+ {
+ if (cgraph_dump_file)
+ fprintf (cgraph_dump_file, "Cycle contains %s\n", cgraph_node_name (node));
+ *slot = node;
+ }
+ return;
+ }
+
+ node->aux = node;
+ for (e = node->callees; e; e = e->next_callee)
+ {
+ cgraph_find_cycles (e->callee, cycles);
+ }
+ node->aux = 0;
+}
+
+/* Leafify the cgraph node. We have to be careful in recursing
+ as to not run endlessly in circles of the callgraph.
+ We do so by using a hashtab of cycle entering nodes as generated
+ by cgraph_find_cycles. */
+
+static void
+cgraph_leafify_node (struct cgraph_node *node, htab_t cycles)
+{
+ struct cgraph_edge *e;
+
+ for (e = node->callees; e; e = e->next_callee)
+ {
+ /* Inline call, if possible, and recurse. Be sure we are not
+ entering callgraph circles here. */
+ if (e->inline_failed
+ && e->callee->local.inlinable
+ && !cgraph_recursive_inlining_p (node, e->callee,
+ &e->inline_failed)
+ && !htab_find (cycles, e->callee))
+ {
+ if (cgraph_dump_file)
+ fprintf (cgraph_dump_file, " inlining %s", cgraph_node_name (e->callee));
+ cgraph_mark_inline_edge (e);
+ cgraph_leafify_node (e->callee, cycles);
+ }
+ else if (cgraph_dump_file)
+ fprintf (cgraph_dump_file, " !inlining %s", cgraph_node_name (e->callee));
+ }
+}
+
/* Decide on the inlining. We do so in the topological order to avoid
expenses on updating datastructures. */
@@ -1477,6 +1538,24 @@
struct cgraph_edge *e;
node = order[i];
+
+ /* Handle nodes to be leafified, but don't update overall unit size. */
+ if (lookup_attribute ("leafify", DECL_ATTRIBUTES (node->decl)) != NULL)
+ {
+ int old_overall_insns = overall_insns;
+ htab_t cycles;
+ if (cgraph_dump_file)
+ fprintf (cgraph_dump_file,
+ "Leafifying %s\n", cgraph_node_name (node));
+ cycles = htab_create (7, htab_hash_pointer, htab_eq_pointer, NULL);
+ cgraph_find_cycles (node, cycles);
+ cgraph_leafify_node (node, cycles);
+ htab_delete (cycles);
+ overall_insns = old_overall_insns;
+ /* We don't need to consider always_inline functions inside the leafified
+ function anymore. */
+ continue;
+ }
for (e = node->callees; e; e = e->next_callee)
if (e->callee->local.disregard_inline_limits)
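
The cycle handling in the cgraphunit.c hunks above can be sketched outside of GCC roughly as follows. This is a simplified model under assumptions: `Node`, `find_cycles`, `leafify` and `demo_leafify` are hypothetical names, a boolean marker plays the role of the `->aux` field, and the inlinability and recursive-inlining checks of the real code are left out.

```cpp
#include <cassert>
#include <string>
#include <unordered_set>
#include <utility>
#include <vector>

// Hypothetical stand-in for a call-graph node; not GCC's struct cgraph_node.
struct Node {
    std::string name;
    std::vector<Node*> callees;
    bool on_stack = false;  // plays the role of the ->aux marker
    explicit Node(std::string n) : name(std::move(n)) {}
};

using NodeSet = std::unordered_set<const Node*>;

// Depth-first walk: a callee that is already on the current DFS stack
// closes a cycle, so it is recorded and not walked again.
void find_cycles(Node& node, NodeSet& cycles) {
    if (node.on_stack) {
        cycles.insert(&node);
        return;
    }
    node.on_stack = true;
    for (Node* callee : node.callees)
        find_cycles(*callee, cycles);
    node.on_stack = false;  // restore the marker on the way out
}

// Record, in visit order, the callees that would be inlined; nodes that
// close a cycle are skipped so the recursion terminates.
void leafify(const Node& node, const NodeSet& cycles,
             std::vector<std::string>& inlined) {
    for (const Node* callee : node.callees) {
        if (cycles.count(callee) != 0)
            continue;
        inlined.push_back(callee->name);
        leafify(*callee, cycles, inlined);
    }
}

// Example graph: a -> {b, d}, b -> c, c -> b (a cycle), d -> e.
std::vector<std::string> demo_leafify() {
    Node a("a"), b("b"), c("c"), d("d"), e("e");
    a.callees = {&b, &d};
    b.callees = {&c};
    c.callees = {&b};
    d.callees = {&e};

    NodeSet cycles;
    find_cycles(a, cycles);
    assert(cycles.size() == 1 && cycles.count(&b) == 1);

    std::vector<std::string> inlined;
    leafify(a, cycles, inlined);
    return inlined;
}
```

In this model the call to `b` stays a real call because `b` closes the `b -> c -> b` cycle, while `d` and `e` are absorbed. Note that the walk keeps no visited set, so shared callees are visited once per path, as in the patch.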
Index: gcc/doc/extend.texi
===================================================================
RCS file: /cvs/gcc/gcc/gcc/doc/extend.texi,v
retrieving revision 1.82.2.36
diff -u -u -r1.82.2.36 extend.texi
--- gcc/doc/extend.texi 2 Mar 2004 18:42:50 -0000 1.82.2.36
+++ gcc/doc/extend.texi 14 Mar 2004 17:51:30 -0000
@@ -1893,7 +1893,7 @@
attributes when making a declaration. This keyword is followed by an
attribute specification inside double parentheses. The following
attributes are currently defined for functions on all targets:
-@code{noreturn}, @code{noinline}, @code{always_inline},
+@code{noreturn}, @code{noinline}, @code{always_inline}, @code{leafify},
@code{pure}, @code{const}, @code{nothrow},
@code{format}, @code{format_arg}, @code{no_instrument_function},
@code{section}, @code{constructor}, @code{destructor}, @code{used},
@@ -1969,6 +1969,14 @@
Generally, functions are not inlined unless optimization is specified.
For functions declared inline, this attribute inlines the function even
if no optimization level was specified.
+
+@cindex @code{leafify} function attribute
+@item leafify
+Generally, inlining into a function is limited. For a function marked with
+this attribute, every call inside this function will be inlined, if possible.
+Whether the function itself is considered for inlining depends on its size and
+the current inlining parameters. The @code{leafify} attribute only works
+reliably in unit-at-a-time mode.
@cindex @code{pure} function attribute
@item pure
Index: libstdc++-v3/include/c_std/std_cmath.h
===================================================================
RCS file: /cvs/gcc/gcc/libstdc++-v3/include/c_std/std_cmath.h,v
retrieving revision 1.5.6.7
diff -u -u -r1.5.6.7 std_cmath.h
--- libstdc++-v3/include/c_std/std_cmath.h 3 Jan 2004 23:05:32 -0000 1.5.6.7
+++ libstdc++-v3/include/c_std/std_cmath.h 14 Mar 2004 17:51:55 -0000
@@ -330,9 +330,31 @@
{ return __builtin_modfl(__x, __iptr); }
template<typename _Tp>
- inline _Tp
+ inline _Tp __attribute__((always_inline))
__pow_helper(_Tp __x, int __n)
{
+ if (__builtin_constant_p(__n))
+ switch (__n) {
+ case -1:
+ return _Tp(1)/__x;
+ case 0:
+ return _Tp(1);
+ case 1:
+ return __x;
+ case 2:
+ return __x*__x;
+#if ! __OPTIMIZE_SIZE__
+ case -2:
+ return _Tp(1)/(__x*__x);
+ case 3:
+ return __x*__x*__x;
+ case 4:
+ {
+ _Tp __y = __x*__x;
+ return __y*__y;
+ }
+#endif
+ }
return __n < 0
? _Tp(1)/__cmath_power(__x, -__n)
: __cmath_power(__x, __n);
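
The idea behind the std_cmath.h change can be tried in isolation with a sketch like the one below. It assumes a compiler that provides `__builtin_constant_p` (GCC or Clang); `slow_power`, `pow_small` and `cube_plus_square` are hypothetical names, with `slow_power` standing in for `__cmath_power`. When the exponent is a compile-time constant after inlining, the switch can fold to a few multiplications; otherwise the generic path is taken, and both paths return the same value.

```cpp
#include <cassert>

// Hypothetical fallback standing in for libstdc++'s __cmath_power:
// repeated multiplication for a non-negative exponent.
template <typename T>
T slow_power(T x, unsigned n) {
    T result = T(1);
    while (n-- > 0)
        result *= x;
    return result;
}

template <typename T>
inline T pow_small(T x, int n) {
    // Only take the shortcut when the compiler can prove n is constant;
    // without optimization this test is typically false.
    if (__builtin_constant_p(n)) {
        switch (n) {
        case -1: return T(1) / x;
        case 0:  return T(1);
        case 1:  return x;
        case 2:  return x * x;
        case 3:  return x * x * x;
        case 4:  { T y = x * x; return y * y; }
        default: break;
        }
    }
    return n < 0 ? T(1) / slow_power(x, static_cast<unsigned>(-n))
                 : slow_power(x, static_cast<unsigned>(n));
}

// With optimization enabled this is expected to reduce to multiplications.
double cube_plus_square(double x) { return pow_small(x, 3) + pow_small(x, 2); }

void pow_small_self_check() { assert(cube_plus_square(2.0) == 12.0); }
```

The sketch ignores the `n == INT_MIN` corner case, where negating the exponent overflows.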
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=13776