This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
RFC patch: invariant addresses too cheap
- From: Michael Matz <matz at suse dot de>
- To: gcc-patches at gcc dot gnu dot org
- Cc: bonzini at gnu dot org
- Date: Mon, 19 Oct 2009 18:09:24 +0200 (CEST)
- Subject: RFC patch: invariant addresses too cheap
Hi,
the patch from http://gcc.gnu.org/ml/gcc-patches/2009-05/msg00341.html
(one part of fixing PR33928) introduced a 10% regression in 410.bwaves
(http://gcc.opensuse.org/SPEC/CFP/sb-barbella.suse.de-head-64-2006/index.html
the drop in May)
The reason is that some invariants aren't moved out of a non-inner loop
in mat_times_vec (the hot function). A C testcase is something like this:
---------------
#define I(c,d) (c*n+d)
int f (long x[], long n, long n2, long n3)
{
int sum = 0;
long k, l, m;
long km1, kp1;
for (k = 0; k < n; k++)
{
km1 = (k + n - 2) % n + 1;
kp1 = (k + 1) % n;
for (l = 0; l < n; l++)
for (m = 0; m < n; m++)
{
sum += x[I(k,m)];
sum += x[I(kp1,m)];
sum += x[I(km1,m)];
}
}
return sum;
}
----------------
(compile with -O3 -ffast-math -funroll-loops -fpeel-loops -fno-tree-vectorize)
The invariants are of the form
(set (reg:DI 193 [ ivtmp.96 ])
(plus:DI (reg/v/f:DI 229 [ x ]) (reg:DI 245)))
They aren't moved because they are considered too cheap (they cost on x86
is 3, which is < COSTS_N_INSN(1)). As they use two registers they better
be moved independend of cost. Looking at the definition and uses of
address_cost in the compiler I doubt that the comparison with COSTS_N_INSN
is correct. Certainly currently the comparison with COSTS_N_INSN is just
a magic number. There are more ports defining TARGET_ADDRESS_COSTS in
terms of simple ordinals numbers than there are defining them relative to
COSTS_N_INSN. The default definition does use COSTS_N_INSN, though.
This inconsistency didn't matter until the patch because we always only
compared these costs with each other, not to determine cheapness in an
absolute sense. The port definitions weren't changed with the patch, so
COSTS_N_INSN(1) is just a fancy way of writing the magic number "4". With
that we can as well write the magic number "3", and get both: the testcase
from the PR still doesn't exhibit moving trivial invariants like
"-16(%r12), -24(%r12), ...", and doesn't regress 410.bwaves either.
Now, that is of course a bit unsatisfying, but there aren't that terribly
many ways how to fix this because we have conflicting requirements:
(a) for relative differences the numbers don't need a unit, and ports are
defined with that in mind. E.g. on x86 reg+ofs has cost 2, while
reg+reg has cost 3, and reg alone has cost 1. All three of them are
usable as address operands without needing additional instructions, so
they are very cheap in one sense, but still they are different.
(b) but with that comparison for absolute cheapness doesn't make sense.
Even if the ports would multiply everything with COSTS_N_INSN(1) we
had the same problem, loop-invariant.c (comparing with
COSTS_N_INSN(1)) still wouldn't move anything.
Really the address_cost target hook is better thought of as determining
relative complexity. For loop-invariant.c we could also try to do
something similar and look at the form of the address to determine if it's
really trivial enough to ignore moving (e.g. doesn't refer to two regs).
We could also change the i386 target hook to return something >
COSTS_N_INSN(1) when two regs are mentioned and something smaller
(0,1,2,3) when not. I.e. "half" insn costs. That doesn't sound too
appealing either, OTOH that may be the reason why COSTS_N_INSN is scaled
at all.
Does anyone have any opinions?
Ciao,
Michael.
--
* loop-invariant.c (create_new_invariant): Use different magic number.
Index: loop-invariant.c
===================================================================
--- loop-invariant.c (revision 147274)
+++ loop-invariant.c (working copy)
@@ -686,7 +686,7 @@ create_new_invariant (struct def *def, r
{
inv->cost = rtx_cost (set, SET, speed);
inv->cheap_address = address_cost (SET_SRC (set), word_mode,
- speed) < COSTS_N_INSNS (1);
+ speed) < 3;
}
else
{