[Bug tree-optimization/35639] [4.3/4.4 Regression] -fprofile-generate = huge SCCs for PRE

Thu Feb 5 10:26:00 GMT 2009

------- Comment #12 from bonzini at gnu dot org  2009-02-05 10:26 -------
FRE is not a problem because almost all the time (93%) is spent computing
ANTIC; of this, half is phi_translate and the other half is bitmap_set
operations.

I get a relatively good (15%) improvement from

Index: tree-ssa-sccvn.c
===================================================================
--- tree-ssa-sccvn.c     (revision 143938)
+++ tree-ssa-sccvn.c     (working copy)
@@ -398,9 +398,14 @@ vn_reference_op_eq (const void *p1, cons
 static hashval_t
 vn_reference_op_compute_hash (const vn_reference_op_t vro1)
 {
-  return iterative_hash_expr (vro1->op0, vro1->opcode)
-    + iterative_hash_expr (vro1->op1, vro1->opcode)
-    + iterative_hash_expr (vro1->op2, vro1->opcode);
+  hashval_t result = 0;
+  if (vro1->op0)
+    result += iterative_hash_expr (vro1->op0, vro1->opcode);
+  if (vro1->op1)
+    result += iterative_hash_expr (vro1->op1, vro1->opcode);
+  if (vro1->op2)
+    result += iterative_hash_expr (vro1->op2, vro1->opcode);
+  return result;
 }

 /* Return the hashcode for a given reference operation P1.  */


and another 8% from this:

Index: tree-ssa-pre.c
===================================================================
--- tree-ssa-pre.c      (revision 143938)
+++ tree-ssa-pre.c      (working copy)
@@ -216,11 +216,11 @@ pre_expr_hash (const void *p1)
     case CONSTANT:
       return vn_hash_constant_with_type (PRE_EXPR_CONSTANT (e));
     case NAME:
-      return iterative_hash_expr (PRE_EXPR_NAME (e), 0);
+      return iterative_hash_hashval_t (SSA_NAME_VERSION (PRE_EXPR_NAME (e)), 0);
     case NARY:
-      return vn_nary_op_compute_hash (PRE_EXPR_NARY (e));
+      return PRE_EXPR_NARY (e)->hashcode;
     case REFERENCE:
-      return vn_reference_compute_hash (PRE_EXPR_REFERENCE (e));
+      return PRE_EXPR_REFERENCE (e)->hashcode;
     default:
       abort ();
     }

(Tested with "make check RUNTESTFLAGS=tree-ssa.exp=*[pf]re*").  At least these
two patches kick hashing almost out of the profile and bring PRE down from 50%
to 40% of the compilation time.  They also speed up the bitmap_set operations
a bit, since get_or_alloc_expression_id was hashing as well.
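
To illustrate the idea behind the second patch, here is a minimal standalone
sketch of caching a hash code in the node when it is built, so that the hash
table callback becomes a field read instead of a walk over the operands.  It
is not GCC code: struct expr, expr_compute_hash, expr_init, expr_hash and the
hash_walks counter are all hypothetical names, and the FNV-1a style mixing is
only a stand-in for whatever the real hash does.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical expression node; a stand-in, not GCC's vn_nary_op_t.  */
struct expr
{
  uint32_t hashcode;		/* Cached once, when the node is built.  */
  unsigned opcode;
  size_t n_ops;
  const unsigned *ops;		/* Operand ids.  */
};

/* Counts full hash computations, purely for illustration.  */
static unsigned long hash_walks;

/* Full hash: visits the opcode and every operand (FNV-1a style mixing
   applied to whole words, an assumption made for this sketch).  */
static uint32_t
expr_compute_hash (unsigned opcode, const unsigned *ops, size_t n_ops)
{
  uint32_t h = 2166136261u;
  size_t i;

  hash_walks++;
  h = (h ^ opcode) * 16777619u;
  for (i = 0; i < n_ops; i++)
    h = (h ^ ops[i]) * 16777619u;
  return h;
}

/* Build a node and pay for the full hash exactly once.  */
static void
expr_init (struct expr *e, unsigned opcode, const unsigned *ops,
	   size_t n_ops)
{
  e->opcode = opcode;
  e->n_ops = n_ops;
  e->ops = ops;
  e->hashcode = expr_compute_hash (opcode, ops, n_ops);
}

/* Hash table callback: returns the cached value without walking the
   operands again.  */
static uint32_t
expr_hash (const void *p)
{
  return ((const struct expr *) p)->hashcode;
}
```

The cached value is only valid as long as the opcode and operands are not
changed after expr_init; a node that is modified in place would need its
hashcode recomputed.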

The remaining main offenders are phi_translate_set and phi_translate_1.  Apart
from some bitmap_set operations, their profile is quite flat, so I guess there
is no more room for micro-optimization.

I'll bootstrap/regtest the above.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=35639