[sh PATCH] PR/27717, sh backend lies to reload about index registers

Joern RENNECKE joern.rennecke@st.com
Mon Aug 21 16:03:00 GMT 2006


Paolo Bonzini wrote:

>   Whether sh really needs that is beyond my understanding.  The more I 
> read the patch, the more I hope it doesn't.  For example, another way 
> to achieve the same would be to emit the memory access as an UNSPEC.  
> I would hope that we perform enough tree optimizations, that it is not 
> possible to optimize further in the RTL path something like 
> *(div_table + (divisor >> 58)).

Tree optimizations are mostly irrelevant here.  At the tree level, 
division is expected to potentially trap.  The SHMEDIA backend expands 
it into multiple individual machine instructions, which are subjected 
to the rtl optimizations of cse, loop invariant code motion (licm) and 
scheduling.  In particular, if the divisor is invariant, the entire 
reciprocal computation can be hoisted/commoned by licm/cse.
When one of the division strategies inv:minlat, inv:call and inv:fp 
is selected, a combiner pattern rearranges divisions that have not been 
taken apart by cse/licm for maximum throughput (inv:minlat) or 
rematerializes the division operation as a call (inv:call) or floating 
point operations (inv:fp).
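
To make the hoisting argument concrete, here is a minimal source-level 
sketch of what licm achieves when the divisor is loop invariant: the 
reciprocal is computed once, and each iteration is left with a multiply, 
a shift and a correction.  This is not the SHmedia expansion (which 
works on rtl and uses a lookup table); the helper names recip32, 
div_by_recip and div_all are hypothetical, and the sketch assumes 
unsigned 32-bit operands and a nonzero divisor.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: fixed-point reciprocal floor(2^32 / d).
   This is the loop-invariant part of the division.  */
static uint64_t
recip32 (uint32_t d)
{
  assert (d != 0);
  return (UINT64_C (1) << 32) / d;
}

/* Hypothetical helper: n / d using the precomputed reciprocal.
   The multiply-and-shift estimate is either exact or one too small,
   so a single correction step is enough.  */
static uint32_t
div_by_recip (uint32_t n, uint32_t d, uint64_t inv)
{
  uint32_t q = (uint32_t) ((n * inv) >> 32); /* n is widened to 64 bits */
  if (n - q * d >= d)
    q++;
  return q;
}

/* Same shape as the loop testcase, with the reciprocal hoisted by hand.  */
static void
div_all (size_t count, uint32_t *a, const uint32_t *b, uint32_t c)
{
  uint64_t inv = recip32 (c); /* computed once, outside the loop */
  while (count--)
    a[count] = div_by_recip (b[count], c, inv);
}
```

Calling div_all (n, a, b, c) should give the same results as 
a[i] = b[i] / c for unsigned inputs; signed division as in the 
testcases would additionally need sign handling.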

In the (not very likely) case that there are multiple different divisors 
which are still very similar (in particular, have the same five most 
significant bits) in a way visible to gcc, it is also possible that some 
of the address arithmetic and possibly also table lookups can be shared 
for these different divisors.
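
As an illustration of how two different divisors can end up at the same 
table entry, the sketch below derives an index from the five most 
significant bits of a normalized divisor.  It rests on assumptions: the 
name top5_index is hypothetical, and the actual backend's normalization 
and index width may differ from this.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: shift the divisor left until its leading one
   sits in bit 31, then keep the top five bits as a table index.
   The leading bit is always set, so the result is in 16..31.  */
static unsigned
top5_index (uint32_t d)
{
  assert (d != 0);
  while ((d & UINT32_C (0x80000000)) == 0)
    d <<= 1;
  return (unsigned) (d >> 27);
}
```

With this definition, 1000 and 1010 map to the same index while 600 
maps to a different one, so only the first pair could share a lookup.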

That being said, I see no reason why your patch would prevent these 
optimizations.

Could you please do a quick sanity check?

This, compiled at -O2 (add -fverbose-asm to get labels for the branches):

void
f(int i, int *a, int *b, int c)
{
  while (i--)
    a[i] = b[i] / c;
}

should only have two multiplies, a load, a store and some eight 
shifts/additions/subtractions inside the loop.


This should use exactly nine muls instructions, and exactly one ldx.ub 
and one ldx.w:
 
__complex__ int
f (__complex__ int c, int d)
{
  return c/d;
}


When you compile this testcase with -mdiv=inv:fp -O2, no table loads 
should be left:

int
f (int a, int b)
{
  return a/b;
}


