This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.



Re: [sh PATCH] PR/27717, sh backend lies to reload about index registers


Paolo Bonzini wrote:

Whether sh really needs that is beyond my understanding. The more I read the patch, the more I hope it doesn't. For example, another way to achieve the same would be to emit the memory access as an UNSPEC. I would hope that we perform enough tree optimizations that it is not possible to optimize something like *(div_table + (divisor >> 58)) any further in the RTL passes.

Tree optimizations are mostly irrelevant here. At the tree level, division is expected to potentially trap. The SHMEDIA backend expands it into multiple individual machine instructions, which are subjected to the rtl optimizations of cse, loop invariant code motion (licm) and scheduling. In particular, if the divisor is invariant, the entire reciprocal computation can be hoisted/commoned by licm/cse.
When one of the division strategies inv:minlat, inv:call and inv:fp is selected, a combiner pattern takes divisions that have not been taken apart by cse/licm and either rearranges them for maximum throughput (inv:minlat) or rematerializes the division operation as a call (inv:call) or as floating point operations (inv:fp).
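To illustrate the invariant-divisor case, here is a sketch of my own (the function and its name are made up, it is not from the testsuite): the divisor c below never changes inside the loop, so the expanded reciprocal sequence depends only on loop-invariant values and licm/cse should be able to move it in front of the loop, leaving essentially only the multiply/correction steps and the memory accesses per iteration.

/* Sketch (not from the testsuite): c is loop-invariant, so the table
   lookup and refinement steps that compute the reciprocal of c depend
   only on c and can be hoisted out of the loop by licm/cse; only the
   per-element multiply/correction and the memory accesses remain in
   the loop body.  */
void
scale_down (int *dst, const int *src, int n, int c)
{
  int i;

  for (i = 0; i < n; i++)
    dst[i] = src[i] / c;
}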


In the (not very likely) case that there are multiple different divisors which are still very similar (in particular, share the same five most significant bits) in a way visible to gcc, it is also possible that some of the address arithmetic, and possibly also the table lookups, can be shared between these different divisors.
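As a contrived illustration of my own (the function and its name are made up, and I have not checked whether gcc actually manages to share anything here): two divisors that differ only in bit 0 have identical higher bits, so once the divisors are wide enough the index derived from their most significant bits is the same for both, and the index computation and table load become common subexpressions.

/* Contrived example: d & ~1 and d | 1 differ only in bit 0, so for
   d >= 32 their five most significant bits are identical.  If the
   table index is derived from those high bits, the index computation
   and the table lookup are common subexpressions that cse could in
   principle share between the two divisions.  */
void
two_similar_divisors (int *a, const int *b, int n, int d)
{
  int i;

  for (i = 0; i < n; i++)
    a[i] = b[i] / (d & ~1) + b[i] / (d | 1);
}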

That being said, I see no reason why your patch would prevent these optimizations.

Could you please do a quick sanity check?

This, compiled at -O2 (add -fverbose-asm to get labels for the branches):

void
f (int i, int *a, int *b, int c)
{
  while (i--)
    a[i] = b[i] / c;
}

should only have two multiplies, a load, a store and some eight shifts/additions/subtractions inside the loop.


This should use exactly nine muls instructions, and exactly one ldx.ub and one ldx.w:


__complex__ int
f (__complex__ int c, int d)
{
  return c / d;
}


When you compile this testcase with -mdiv=inv:fp -O2, no table loads should be left:


int
f (int a, int b)
{
  return a / b;
}

