This is the mail archive of the
mailing list for the GCC project.
Re: Another look at the ARM division routine
- From: Richard Earnshaw <rearnsha at arm dot com>
- To: mark at codesourcery dot com
- Cc: Richard dot Earnshaw at arm dot com, Nicolas Pitre <nico at cam dot org>, Ian Lance Taylor <ian at wasabisystems dot com>, gcc-patches at gcc dot gnu dot org
- Date: Wed, 12 Nov 2003 20:34:12 +0000
- Subject: Re: Another look at the ARM division routine
- Organization: ARM Ltd.
- Reply-to: Richard dot Earnshaw at arm dot com
> > My only concern with this patch is that it is substantially larger than
> > what we had before. That's not too bad if you are calling the function
> > often, but can make it slower if you are only doing the occasional
> > division, since there's more code to pull into the cache. It makes me
> > wonder whether we should have a size-based version as well...
> I'm not sure how to get a handle on that.
> The only data we have seems to favor Ian's version.
> If we're only doing division rarely, then the fact that it's a little
> slower shouldn't matter much, right?
There is a trick you can play, but I'm not sure if GCC's build system will
support it. It relies on constructing your archive files quite carefully.
Basically, you build both versions of __divsi3, one small and one fast.
In the small one you make the definition of __divsi3 weak. In the fast
one you make it strong, and also add a symbol definition, say
__fast_divsi3.
Now, you arrange the archive file so that normally it will find the small
__divsi3 definition (put it first in the archive). However, if you
compile a file with a time-based optimization you add a NULL relocation
(i.e. a reference) to __fast_divsi3 which forces the linker to pull in the
faster version as well. The linker will now choose the fast implementation
in preference to the small one since the definition in that file is strong
and the small one was weak.
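
To make that concrete, here is a rough sketch of what the sources might look
like. It is only an illustration under assumptions: my_div and
my_fast_div_marker are hypothetical stand-ins for __divsi3 and __fast_divsi3,
the BUILD_FAST and BUILD_CALLER macros and the file names are made up, the
function bodies are placeholders rather than real ARM division code, and it
relies on GCC-style attributes and on the linker behaviour described above.

```c
/* demo.c: one source file compiled three ways (all names hypothetical).
 *
 * Assumed build steps:
 *   gcc -c demo.c -o small.o                  small member, weak my_div
 *   gcc -DBUILD_FAST -c demo.c -o fast.o      fast member, strong my_div
 *   ar rc libdemo.a small.o fast.o            small member goes in first
 *   gcc -DBUILD_CALLER -c demo.c -o caller.o  caller that asks for speed
 *   gcc caller.o libdemo.a -o demo
 */

#if defined(BUILD_FAST)

/* Fast member: strong definition, plus the marker symbol whose only job
 * is to give callers something to reference. */
int my_div(int a, int b)
{
    return a / b;   /* placeholder for a larger, unrolled routine */
}

const char my_fast_div_marker = 0;

#elif defined(BUILD_CALLER)

#include <stdio.h>

int my_div(int a, int b);
extern const char my_fast_div_marker;

/* The otherwise unused reference that is meant to drag fast.o out of the
 * archive; a compiler could emit the equivalent when optimizing for time. */
static const void *const want_fast __attribute__((used)) = &my_fast_div_marker;

int main(void)
{
    printf("%d\n", my_div(100, 7));
    return 0;
}

#else

/* Small member: weak definition, so a strong my_div from another object
 * can replace it at link time. The divisor b must be nonzero. */
__attribute__((weak)) int my_div(int a, int b)
{
    unsigned ua = a < 0 ? 0u - (unsigned)a : (unsigned)a;
    unsigned ub = b < 0 ? 0u - (unsigned)b : (unsigned)b;
    unsigned q = 0;

    while (ua >= ub) {      /* tiny but slow: repeated subtraction */
        ua -= ub;
        q++;
    }
    return ((a < 0) != (b < 0)) ? (int)(0u - q) : (int)q;
}

#endif
```

A caller built without the marker reference should only pull small.o out of
the archive, so the size cost of the fast routine is paid only by programs
that ask for it.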