This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.



Re: [sh PATCH] PR/27717, sh backend lies to reload about index registers


Paolo Bonzini wrote:

Whether sh really needs that is beyond my understanding. The more I read the patch, the more I hope it doesn't. For example, another way to achieve the same would be to emit the memory access as an UNSPEC. I would hope that we perform enough tree optimizations that it is not possible to optimize something like *(div_table + (divisor >> 58)) any further in the RTL passes.

Tree optimizations are mostly irrelevant here. At the tree level, division is expected to potentially trap. The SHMEDIA backend expands it into multiple individual machine instructions, which are subjected to the rtl optimizations of cse, loop invariant code motion (licm) and scheduling. In particular, if the divisor is invariant, the entire reciprocal computation can be hoisted/commoned by licm/cse.
When one of the division strategies inv:minlat, inv:call and inv:fp is selected, a combiner pattern takes divisions that have not been taken apart by cse/licm and either rearranges them for maximum throughput (inv:minlat) or rematerializes the division operation as a call (inv:call) or as floating point operations (inv:fp).
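To illustrate the invariant-divisor case, here is a sketch of my own (the function and its name are made up, it is not from the testsuite): the divisor c below never changes inside the loop, so the expanded reciprocal sequence depends only on loop-invariant values and licm/cse should be able to move it in front of the loop, leaving essentially only the multiply/correction steps and the memory accesses per iteration.

/* Sketch (not from the testsuite): c is loop-invariant, so the table
   lookup and refinement steps that compute the reciprocal of c depend
   only on c and can be hoisted out of the loop by licm/cse; only the
   per-element multiply/correction and the memory accesses remain in
   the loop body.  */
void
scale_down (int *dst, const int *src, int n, int c)
{
  int i;

  for (i = 0; i < n; i++)
    dst[i] = src[i] / c;
}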


In the (not very likely) case that there are multiple different divisors which are still very similar (in particular, share the same five most significant bits) in a way visible to gcc, it is also possible that some of the address arithmetic, and possibly also the table lookups, can be shared between these different divisors.
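As a contrived illustration of my own (the function and its name are made up, and I have not checked whether gcc actually manages to share anything here): two divisors that differ only in bit 0 have identical higher bits, so once the divisors are wide enough the index derived from their most significant bits is the same for both, and the index computation and table load become common subexpressions.

/* Contrived example: d & ~1 and d | 1 differ only in bit 0, so for
   d >= 32 their five most significant bits are identical.  If the
   table index is derived from those high bits, the index computation
   and the table lookup are common subexpressions that cse could in
   principle share between the two divisions.  */
void
two_similar_divisors (int *a, const int *b, int n, int d)
{
  int i;

  for (i = 0; i < n; i++)
    a[i] = b[i] / (d & ~1) + b[i] / (d | 1);
}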

That being said, I see no reason why your patch would prevent these optimizations.

Could you please do a quick sanity check?

This, compiled at -O2 (add -fverbose-asm to get labels for the branches):

void
f (int i, int *a, int *b, int c)
{
  while (i--)
    a[i] = b[i] / c;
}

should only have two multiplies, a load, a store and some eight shifts/additions/subtractions inside the loop.


This should use exactly nine muls instructions, and exactly one ldx.ub and one ldx.w:


__complex__ int
f (__complex__ int c, int d)
{
  return c / d;
}


When you compile this testcase with -mdiv=inv:fp -O2, no table loads should be left:


int
f (int a, int b)
{
  return a / b;
}

