Another look at the ARM division routine
Richard Earnshaw
rearnsha@arm.com
Wed Nov 12 20:37:00 GMT 2003
> > My only concern with this patch is that it is substantially larger than
> > what we had before. That's not too bad if you are calling the function
> > often, but can make it slower if you are only doing the occasional
> > division, since there's more code to pull into the cache. It makes me
> > wonder whether we should have a size-based version as well...
>
> I'm not sure how to get a handle on that.
>
> The only data we have seems to favor Ian's version.
>
> If we're only doing division rarely, then the fact that it's a little
> slower shouldn't matter much, right?
>
There is a trick you can play, but I'm not sure if GCC's build system will
support it. It relies on constructing your archive files quite carefully.

Basically, you build both versions of __divsi3, one small and one fast.
In the small one you make the definition of __divsi3 weak. In the fast
one you make it strong, and also add a symbol definition, say
__fast_divsi3.

Now, you arrange the archive file so that normally it will find the small
__divsi3 definition (put it first in the archive). However, if you
compile a file with a time-based optimization you add a NULL relocation
(i.e. a reference) to __fast_divsi3, which forces the linker to pull in the
faster version as well. The linker will now choose the fast implementation
in preference to the small one, since the definition in that file is strong
and the small one was weak.
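
To make that concrete, here is a rough C sketch of the three pieces. It is
only an illustration, not the libgcc sources: my_divsi3 and
my_fast_divsi3_anchor are made-up stand-ins for __divsi3 and __fast_divsi3,
both variants share one simple bit-at-a-time divider (a real fast version
would be a larger, unrolled routine), and the "NULL relocation" is
approximated by an ordinary address reference, which is the closest thing
plain C can express. It assumes GCC-style attributes.

```c
/* Sketch only: my_divsi3 and my_fast_divsi3_anchor are hypothetical names.
   One source file, built three ways:
     (no macro)      -> "small" archive member, weak definition
     -DBUILD_FAST    -> "fast" archive member, strong definition + anchor
     -DBUILD_CALLER  -> a speed-optimized user that references the anchor */
#include <assert.h>
#include <limits.h>

#if !defined(BUILD_CALLER)

/* Unsigned restoring division, one quotient bit per iteration.
   Assumes d != 0 and d <= 2^(bits-1), which holds for magnitudes of int. */
static unsigned udiv_bitwise(unsigned n, unsigned d)
{
    unsigned q = 0, r = 0;
    for (int i = (int)(sizeof(unsigned) * CHAR_BIT) - 1; i >= 0; i--) {
        r = (r << 1) | ((n >> i) & 1u);   /* bring down the next bit */
        if (r >= d) {
            r -= d;
            q |= 1u << i;
        }
    }
    return q;
}

/* Signed wrapper, truncating toward zero. Division by zero and
   INT_MIN / -1 are deliberately not handled in this sketch. */
static int sdiv(int n, int d)
{
    assert(d != 0);
    unsigned un = n < 0 ? 0u - (unsigned)n : (unsigned)n;
    unsigned ud = d < 0 ? 0u - (unsigned)d : (unsigned)d;
    unsigned q = udiv_bitwise(un, ud);
    return ((n < 0) != (d < 0)) ? -(int)q : (int)q;
}

#if defined(BUILD_FAST)
/* Fast member: strong definition, plus the extra symbol that callers
   reference in order to drag this member out of the archive. */
int my_fast_divsi3_anchor;
int my_divsi3(int n, int d) { return sdiv(n, d); }
#else
/* Small member: weak definition, so a strong one overrides it at link time. */
__attribute__((weak)) int my_divsi3(int n, int d) { return sdiv(n, d); }
#endif

#else /* BUILD_CALLER */

extern int my_divsi3(int n, int d);
extern int my_fast_divsi3_anchor;

/* Stand-in for the NULL relocation: an otherwise unused reference that
   leaves my_fast_divsi3_anchor undefined in this object file. */
static int *const pull_fast __attribute__((used)) = &my_fast_divsi3_anchor;

int halve(int x) { return my_divsi3(x, 2); }

#endif
```

You would compile it once with no macro and once with -DBUILD_FAST, and put
the two objects in the archive with the small one first. Objects built
without the anchor reference then link against the weak definition only,
while anything built like the -DBUILD_CALLER variant pulls in the fast
member too. Exactly which members get extracted depends on how your linker
scans archives, so treat this as something to check rather than rely on.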
R.