This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
PR tree-optimization/43884, lto/44334: big loop nests cause us to optimize for size in hot areas
- From: Jan Hubicka <hubicka at ucw dot cz>
- To: gcc-patches at gcc dot gnu dot org
- Date: Sat, 22 Jan 2011 22:44:14 +0100
- Subject: PR tree-optimization/43884, lto/44334: big loop nests cause us to optimize for size in hot areas
Hi,
these two PRs demonstrate a problem where very large loop nests cause our profile estimation
code to predict parts of functions as cold that are in fact executed frequently.
This is a common problem for Fortran code, where large nests are common, but it becomes
more of an issue with LTO, where we can bring the loops into bigger functions and spread
the problem further.
This patch solves the problem by simply changing the definition of a hot BB. We no
longer base the estimate on the hottest BB in the function, but on the frequency
of the entry BB. So only code that is guarded by unlikely conditionals is now
considered cold.
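The change in the hot-BB test can be sketched as a small standalone C model. This is
only an illustration, not the code from predict.c: the names hot_bb_old_p, hot_bb_new_p
and hot_bb_examples are made up, the values BB_FREQ_MAX = 10000 and a fraction of 1000
are assumptions, and the NODE_FREQUENCY_EXECUTED_ONCE check is left out.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed values for illustration; the real ones come from GCC's
   headers and from --param hot-bb-frequency-fraction.  */
#define BB_FREQ_MAX 10000
#define HOT_BB_FREQUENCY_FRACTION 1000

/* Old rule: compare against the maximal (hottest) block frequency.  */
bool
hot_bb_old_p (int freq)
{
  return freq >= BB_FREQ_MAX / HOT_BB_FREQUENCY_FRACTION;
}

/* New rule: compare against the frequency of the entry block.  */
bool
hot_bb_new_p (int freq, int entry_freq)
{
  return freq >= entry_freq / HOT_BB_FREQUENCY_FRACTION;
}

/* Hypothetical frequencies, not measured from any testcase.  */
void
hot_bb_examples (void)
{
  /* Deep loop nest: the innermost block takes BB_FREQ_MAX, so the entry
     block and a block in an outer loop end up with tiny frequencies.  */
  int entry_freq = 2, outer_loop_freq = 5;
  assert (!hot_bb_old_p (outer_loop_freq));            /* cold before */
  assert (hot_bb_new_p (outer_loop_freq, entry_freq)); /* hot now */

  /* Block behind an unlikely conditional in a function without loops.  */
  assert (!hot_bb_new_p (1, 2000));                    /* still cold */
}
```

With these made-up numbers the outer-loop block runs more often than the function entry,
yet the old rule marks it cold because it is measured against the hottest block.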
For GCC 4.7 we should probably start differentiating between blocks that are known
to be cold (they lead to abort, or are reachable only by EH, for instance) and blocks
that seem cold based on profile estimation. The Wu & Larus paper claims that profile estimation
is pretty good here, but that doesn't seem to be the case in practice, since it is too easy
to fool and those patterns do happen in internal loops where this matters.
The patch needs 2 updates to the testsuite. The first is the outer-2 testcase, where we now copy
the loop header of the outermost loop; that makes the loop iteration estimate work, and the off-by-one
error causes us not to parallelize the loop. I fixed it by simply increasing the array size.
In ldist-pr45948.c we now produce slightly worse code than before, because we get confused
by loop header copying. I filed PR47033 to track this problem.
Bootstrapped/regtested on x86_64; I will commit it shortly.
Honza
PR tree-optimization/43884
PR lto/44334
* predict.c (maybe_hot_frequency_p): Use entry block frequency as a base.
* doc/invoke.texi (hot-bb-frequency-fraction): Update docs.
* gcc.dg/autopar/outer-2.c: Increase array size.
* gcc.dg/tree-ssa/ldist-pr45948.c: Update test.
Index: predict.c
===================================================================
--- predict.c (revision 169127)
+++ predict.c (working copy)
@@ -126,7 +126,7 @@ maybe_hot_frequency_p (int freq)
if (node->frequency == NODE_FREQUENCY_EXECUTED_ONCE
&& freq <= (ENTRY_BLOCK_PTR->frequency * 2 / 3))
return false;
- if (freq < BB_FREQ_MAX / PARAM_VALUE (HOT_BB_FREQUENCY_FRACTION))
+ if (freq < ENTRY_BLOCK_PTR->frequency / PARAM_VALUE (HOT_BB_FREQUENCY_FRACTION))
return false;
return true;
}
Index: testsuite/gcc.dg/autopar/outer-2.c
===================================================================
--- testsuite/gcc.dg/autopar/outer-2.c (revision 169127)
+++ testsuite/gcc.dg/autopar/outer-2.c (working copy)
@@ -6,7 +6,7 @@ void abort (void);
void parloop (int N)
{
int i, j,ii;
- int x[400][10][400];
+ int x[401][10][401];
for (ii = 0; ii < N; ii++)
for (i = 0; i < N; i++)
Index: testsuite/gcc.dg/tree-ssa/ldist-pr45948.c
===================================================================
--- testsuite/gcc.dg/tree-ssa/ldist-pr45948.c (revision 169127)
+++ testsuite/gcc.dg/tree-ssa/ldist-pr45948.c (working copy)
@@ -18,6 +18,6 @@ foo (int i, int n)
/* We should apply loop distribution and generate 2 memset (0). */
-/* { dg-final { scan-tree-dump "distributed: split to 3" "ldist" } } */
+/* { dg-final { scan-tree-dump "distributed: split to 2" "ldist" } } */
/* { dg-final { scan-tree-dump-times "__builtin_memset" 4 "ldist" } } */
/* { dg-final { cleanup-tree-dump "ldist" } } */
Index: doc/invoke.texi
===================================================================
--- doc/invoke.texi (revision 169127)
+++ doc/invoke.texi (working copy)
@@ -8489,7 +8489,7 @@ Select fraction of the maximal count of
given basic block needs to have to be considered hot.
@item hot-bb-frequency-fraction
-Select fraction of the maximal frequency of executions of basic block in
+Select fraction of the entry block frequency of executions of basic block in
function given basic block needs to have to be considered hot
@item max-predicted-iterations