This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.
Re: [sh PATCH] PR/27717, sh backend lies to reload about index registers
- From: Joern RENNECKE <joern dot rennecke at st dot com>
- To: Paolo Bonzini <paolo dot bonzini at lu dot unisi dot ch>
- Cc: Ian Lance Taylor <iant at google dot com>, GCC Patches <gcc-patches at gcc dot gnu dot org>, Bernd Schmidt <bernds_cb1 at t-online dot de>
- Date: Mon, 21 Aug 2006 16:10:07 +0100
- Subject: Re: [sh PATCH] PR/27717, sh backend lies to reload about index registers
- References: <44D7087A.2080102@lu.unisi.ch> <m3sljtmr12.fsf@localhost.localdomain> <44E70104.7090109@lu.unisi.ch>
Paolo Bonzini wrote:
> Whether sh really needs that is beyond my understanding. The more I
> read the patch, the more I hope it doesn't. For example, another way
> to achieve the same would be to emit the memory access as an UNSPEC.
> I would hope that we perform enough tree optimizations, that it is not
> possible to optimize further in the RTL path something like
> *(div_table + (divisor >> 58)).
Tree optimizations are mostly irrelevant here. At the tree level,
division is expected to potentially trap. The SHMEDIA backend expands
this into multiple individual machine instructions, which are subject
to the rtl optimizations of cse, loop invariant code motion (licm) and
scheduling. In particular, if the divisor is invariant, the entire
reciprocal computation can be hoisted/commoned by licm/cse.
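As a sketch of why that hoisting pays off (hypothetical plain C for illustration, not the actual sequence the SHMEDIA backend emits): with an invariant divisor, a fixed-point reciprocal can be computed once outside the loop, and each division inside the loop becomes a widening multiply, a shift, and a cheap correction:

```c
#include <stdint.h>

/* Hypothetical illustration only, not the backend's actual code:
   divide by an invariant d via a hoisted 32.32 fixed-point reciprocal. */

/* Computed once, outside the loop: floor(2^32 / d), d != 0.  */
static uint64_t make_recip(uint32_t d)
{
    return ((uint64_t)1 << 32) / d;
}

/* Inside the loop: one widening multiply, a shift, and a fixup.
   The truncated reciprocal underestimates 2^32/d, so the first
   estimate can be slightly low; the loop corrects it upward.  */
static uint32_t div_by_invariant(uint32_t n, uint32_t d, uint64_t r)
{
    uint32_t q = (uint32_t)(((uint64_t)n * r) >> 32);
    while ((uint64_t)(q + 1) * d <= n)
        q++;
    return q;
}
```

When d is loop-invariant, licm can move the make_recip computation out of the loop, which is the rtl-level transformation described above.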
When one of the division strategies inv:minlat, inv:call and inv:fp
is selected, a combiner pattern rearranges divisions that have not been
taken apart by cse/licm for maximum throughput (inv:minlat) or
rematerializes the division operation as a call (inv:call) or floating
point operations (inv:fp).
In the (not very likely) case that there are multiple different divisors
which are still very similar (in particular, have the same five most
significant bits) in a way visible to gcc, it is also possible that some
of the address arithmetic and possibly also table lookups can be shared
for these different divisors.
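To make that sharing concrete, here is a hypothetical sketch (the real div_table layout and indexing in the SH backend may well differ): if the lookup table is indexed by the five bits starting at the divisor's most significant set bit, two divisors agreeing in those bits compute the same index, so cse can share the lookup and the associated address arithmetic:

```c
#include <stdint.h>

/* Hypothetical indexing scheme, for illustration only: extract the
   five bits starting at the divisor's most significant set bit.
   d must be nonzero (__builtin_clz(0) is undefined). */
static unsigned table_index(uint32_t d)
{
    int top = 31 - __builtin_clz(d);   /* position of the highest set bit */
    return top >= 4 ? (unsigned)((d >> (top - 4)) & 0x1f) : d;
}
```

Two divisors such as 0x9000 and 0x97FF agree in their five most significant bits, so they yield the same index and the table load can be commoned.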
That being said, I see no reason why your patch would prevent these
optimizations.
Could you please do a quick sanity check?
The following, compiled at -O2 (add -fverbose-asm to get labels for the branches),
void
f(int i, int *a, int *b, int c)
{
while (i--)
a[i] = b[i] / c;
}
should only have two multiplies, a load, a store and some eight
shifts/additions/subtractions inside the loop.
This should use exactly nine muls instructions, and exactly one ldx.ub
and one ldx.w:
__complex__ int
f (__complex__ int c, int d)
{
return c/d;
}
When you compile this testcase with -mdiv=inv:fp -O2, no table loads
should be left:
int
f (int a, int b)
{
return a/b;
}