[Bug tree-optimization/35639] [4.3/4.4 Regression] -fprofile-generate = huge SCCs for PRE
bonzini at gnu dot org
gcc-bugzilla@gcc.gnu.org
Thu Feb 5 10:26:00 GMT 2009
------- Comment #12 from bonzini at gnu dot org 2009-02-05 10:26 -------
FRE is not a problem because almost all of the time (93%) is spent computing
ANTIC; of this, half is phi_translate and the other half is bitmap_set
operations.
I get a relatively good (15%) improvement from this:
Index: tree-ssa-sccvn.c
===================================================================
--- tree-ssa-sccvn.c (revision 143938)
+++ tree-ssa-sccvn.c (working copy)
@@ -398,9 +398,14 @@ vn_reference_op_eq (const void *p1, cons
static hashval_t
vn_reference_op_compute_hash (const vn_reference_op_t vro1)
{
- return iterative_hash_expr (vro1->op0, vro1->opcode)
- + iterative_hash_expr (vro1->op1, vro1->opcode)
- + iterative_hash_expr (vro1->op2, vro1->opcode);
+ hashval_t result = 0;
+ if (vro1->op0)
+ result += iterative_hash_expr (vro1->op0, vro1->opcode);
+ if (vro1->op1)
+ result += iterative_hash_expr (vro1->op1, vro1->opcode);
+ if (vro1->op2)
+ result += iterative_hash_expr (vro1->op2, vro1->opcode);
+ return result;
}
/* Return the hashcode for a given reference operation P1. */
and another 8% from this:
Index: tree-ssa-pre.c
===================================================================
--- tree-ssa-pre.c (revision 143938)
+++ tree-ssa-pre.c (working copy)
@@ -216,11 +216,11 @@ pre_expr_hash (const void *p1)
case CONSTANT:
return vn_hash_constant_with_type (PRE_EXPR_CONSTANT (e));
case NAME:
- return iterative_hash_expr (PRE_EXPR_NAME (e), 0);
+ return iterative_hash_hashval_t (SSA_NAME_VERSION (PRE_EXPR_NAME (e)), 0);
case NARY:
- return vn_nary_op_compute_hash (PRE_EXPR_NARY (e));
+ return PRE_EXPR_NARY (e)->hashcode;
case REFERENCE:
- return vn_reference_compute_hash (PRE_EXPR_REFERENCE (e));
+ return PRE_EXPR_REFERENCE (e)->hashcode;
default:
abort ();
}
(Tested with "make check RUNTESTFLAGS=tree-ssa.exp=*[pf]re*".) Together, these
two patches push hashing almost entirely out of the profile and bring PRE down
from 50% to 40% of the compilation time. They also speed up the bitmap_set
operations a bit, since get_or_alloc_expression_id was also doing hashing.
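To illustrate why returning the stored hashcode helps, here is a minimal
standalone sketch in C. It is not GCC code: struct expr, expr_init,
expr_compute_hash, expr_hash, mix and the hash_walks counter are all
hypothetical names made up for this example, and the FNV-style mixing step
merely stands in for iterative_hash_expr. It assumes the hash is stored once
when the node is built, and it skips absent operands in the same spirit as the
NULL checks in the first patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical expression node; a stand-in, not GCC's vn_nary_op_t.  */
struct expr
{
  unsigned opcode;
  const struct expr *op[3];   /* Unused operand slots are NULL.  */
  uint32_t hashcode;          /* Filled in once by expr_init.  */
};

/* Counts full hash computations, only to make the saving visible.  */
unsigned long hash_walks;

/* Illustrative FNV-style mixing step (an assumption, not GCC's hash).  */
uint32_t
mix (uint32_t h, uint32_t v)
{
  h ^= v;
  h *= 16777619u;
  return h;
}

/* Expensive path: recursively walks every operand that is present.  */
uint32_t
expr_compute_hash (const struct expr *e)
{
  uint32_t h = mix (2166136261u, e->opcode);
  int i;

  hash_walks++;
  for (i = 0; i < 3; i++)
    if (e->op[i])             /* Do not hash absent operands.  */
      h = mix (h, expr_compute_hash (e->op[i]));
  return h;
}

/* Build a node and cache its hash at construction time.  */
void
expr_init (struct expr *e, unsigned opcode, const struct expr *a,
           const struct expr *b, const struct expr *c)
{
  e->opcode = opcode;
  e->op[0] = a;
  e->op[1] = b;
  e->op[2] = c;
  e->hashcode = expr_compute_hash (e);
}

/* Cheap path for hash table callbacks: no walk, just a field load.  */
uint32_t
expr_hash (const struct expr *e)
{
  assert (e != NULL);
  return e->hashcode;
}
```

With this layout a hash table callbacks on expr_hash costs the same no matter
how deep the expression is, whereas calling expr_compute_hash on every lookup
repeats the whole walk each time.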
The remaining main offenders are phi_translate_set and phi_translate_1. Apart
from some bitmap_set operations, their profile is quite flat, so I guess there
is no more micro-optimization to be had.
I'll bootstrap/regtest the above.
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=35639