This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.
Re: Another look at the ARM division routine
- From: Richard Earnshaw <rearnsha at arm dot com>
- To: mark at codesourcery dot com
- Cc: Nicolas Pitre <nico at cam dot org>, Ian Lance Taylor <ian at wasabisystems dot com>, gcc-patches at gcc dot gnu dot org, Richard dot Earnshaw at arm dot com
- Date: Wed, 12 Nov 2003 19:55:32 +0000
- Subject: Re: Another look at the ARM division routine
- Organization: ARM Ltd.
- Reply-to: Richard dot Earnshaw at arm dot com
> On Tue, 2003-11-11 at 13:09, Nicolas Pitre wrote:
> > On 11 Nov 2003, Ian Lance Taylor wrote:
> >
> > > Nicolas's code tests every four bits for a zero dividend, and then
> > > loops. The test adds one instruction, and the loop adds three
> > > instructions. Is it better to add four instructions for each four
> > > bits, with the chance of leaving the loop, or is it better to simply
> > > unroll the loop completely as Steve's code does?
> >
> > Actually I just reused the same loop that was there before. I mainly
> > optimized the code surrounding that loop, which is now pretty optimal, but the
> > loop itself isn't that impressive.
> >
> > > Another way to ask
> > > the question is: how frequently does the divisor end with four or more
> > > zero bits?
> >
> > Right. And that might not be as frequent as I thought.
>
> I suspect that the cases where the divisor ends with four zero bits are
> largely constant power-of-two cases, which should be implemented as
> shifts anyhow.
>
> Given Ian's measurements, I'd say we should go with Ian's patch, and you
> seem to concur.
>
> Ian, this patch is not appropriate for stage 3, but would you please
> apply it to the csl-arm-branch? (CodeSourcery will merge that branch
> into GCC 3.5.)
>
My only concern with this patch is that it is substantially larger than
what we had before. That's not too bad if you are calling the function
often, but can make it slower if you are only doing the occasional
division, since there's more code to pull into the cache. It makes me
wonder whether we should have a size-based version as well...
R.
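
To make the size/speed trade-off discussed in this thread concrete, here is a
minimal C sketch of the two loop shapes. It is not the code from either patch
and not ARM assembly; the function names udiv_compact and udiv_unrolled4 are
hypothetical, and returning 0 for a zero divisor is an arbitrary choice made
only so the sketch is well defined. Both functions align the divisor with the
dividend and then do shift-and-subtract. The first produces one quotient bit
per iteration (small code). The second handles four bits per iteration and
leaves early once the remaining dividend is zero (more code, fewer branches).

```c
#include <stdint.h>

/* Compact form: one quotient bit per loop iteration. */
uint32_t udiv_compact(uint32_t n, uint32_t d)
{
    uint32_t bit = 1, q = 0;

    if (d == 0)
        return 0;               /* assumption: caller never passes 0 */

    /* Shift the divisor left until it reaches the dividend
       or its top bit is set. */
    while (d < n && !(d & 0x80000000u)) {
        d <<= 1;
        bit <<= 1;
    }

    while (bit != 0) {
        if (n >= d) {
            n -= d;
            q |= bit;
        }
        d >>= 1;
        bit >>= 1;
    }
    return q;
}

/* Unrolled by four, with an early exit when nothing is left to divide. */
uint32_t udiv_unrolled4(uint32_t n, uint32_t d)
{
    uint32_t bit = 1, q = 0;

    if (d == 0)
        return 0;               /* assumption: caller never passes 0 */

    while (d < n && !(d & 0x80000000u)) {
        d <<= 1;
        bit <<= 1;
    }

    for (;;) {
        /* Four subtract steps per pass. When bit >> k is 0 the step
           may still change n, but it adds nothing to the quotient. */
        if (n >= d)        { n -= d;      q |= bit;      }
        if (n >= (d >> 1)) { n -= d >> 1; q |= bit >> 1; }
        if (n >= (d >> 2)) { n -= d >> 2; q |= bit >> 2; }
        if (n >= (d >> 3)) { n -= d >> 3; q |= bit >> 3; }

        if (n == 0)             /* early exit: remaining dividend is zero */
            break;
        bit >>= 4;
        if (bit == 0)           /* all quotient bits produced */
            break;
        d >>= 4;
    }
    return q;
}
```

A fully unrolled variant would repeat the subtract step for all 32 bit
positions with no loop at all, which removes the per-pass test and branch at
the cost of still more code to pull into the cache.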