Another look at the ARM division routine

Richard Earnshaw rearnsha@arm.com
Wed Nov 12 20:37:00 GMT 2003


> > My only concern with this patch is that it is substantially larger than 
> > what we had before.   That's not too bad if you are calling the function 
> > often, but can make it slower if you are only doing the occasional 
> > division, since there's more code to pull into the cache.  It makes me 
> > wonder whether we should have a size-based version as well...
> 
> I'm not sure how to get a handle on that.  
> 
> The only data we have seems to favor Ian's version.
> 
> If we're only doing division rarely, then the fact that it's a little
> slower shouldn't matter much, right?
> 

There is a trick you can play, but I'm not sure if GCC's build system will 
support it.  It relies on constructing your archive files quite carefully.

Basically, you build both versions of __divsi3, one small and one fast.  
In the small one you make the definition of __divsi3 weak.  In the fast 
one you make it strong, and also add a symbol definition, say 
__fast_divsi3.

Now, you arrange the archive file so that the linker will normally find the
small __divsi3 definition (put it first in the archive).  However, if you
compile a file while optimizing for speed, you add a NULL relocation
(i.e. a reference) to __fast_divsi3, which forces the linker to pull in the
faster version as well.  The linker will now choose the fast implementation
in preference to the small one, since the definition in that file is strong
and the small one is weak.
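
As a rough illustration of the arrangement described above, here is a minimal
sketch in C.  Everything named in it is an assumption made for the example:
my_divsi3 and my_fast_divsi3 stand in for __divsi3 and __fast_divsi3, and the
BUILD_FAST, BUILD_CALLER and OPTIMIZE_FOR_TIME macros are invented to let one
source file play all three roles.  Both library variants share a placeholder
division body, since only the symbol binding matters here.  The reference is
approximated with an address-of, which costs one pointer of data where a real
NULL relocation would cost nothing.

```c
/* Sketch only.  my_divsi3 and my_fast_divsi3 are hypothetical stand-ins for
 * __divsi3 and __fast_divsi3; BUILD_FAST, BUILD_CALLER and OPTIMIZE_FOR_TIME
 * are made-up macros that select which object file this source becomes.
 * Uses the GCC weak/used attributes and assumes a 32-bit int. */

#if defined(BUILD_CALLER)

/* caller.o: code that uses the division routine. */
int my_divsi3(int n, int d);

#ifdef OPTIMIZE_FOR_TIME
/* Reference the marker symbol so the linker has to pull in the archive
 * member that defines it (the fast one). */
extern const char my_fast_divsi3;
static const void *const want_fast __attribute__((used)) = &my_fast_divsi3;
#endif

int main(void)
{
    return my_divsi3(42, 6) == 7 ? 0 : 1;
}

#else /* library side: small.o by default, fast.o with -DBUILD_FAST */

/* Restoring shift-and-subtract division on magnitudes; d must be nonzero. */
static unsigned udiv_bits(unsigned n, unsigned d)
{
    unsigned q = 0, r = 0;
    int i;

    for (i = 31; i >= 0; i--) {
        r = (r << 1) | ((n >> i) & 1u);
        if (r >= d) {
            r -= d;
            q |= 1u << i;
        }
    }
    return q;
}

/* Signed wrapper shared by both variants.  Returning 0 for d == 0 is an
 * assumption of this sketch, not the behaviour of the real routine. */
static int sdiv(int n, int d)
{
    unsigned un, ud, q;

    if (d == 0)
        return 0;
    un = n < 0 ? 0u - (unsigned)n : (unsigned)n;
    ud = d < 0 ? 0u - (unsigned)d : (unsigned)d;
    q = udiv_bits(un, ud);
    return ((n < 0) != (d < 0)) ? (int)(0u - q) : (int)q;
}

#ifdef BUILD_FAST
/* fast.o: strong definition plus the marker symbol. */
const char my_fast_divsi3 = 1;

int my_divsi3(int n, int d)
{
    return sdiv(n, d);  /* placeholder for the larger, faster body */
}
#else
/* small.o: weak definition, placed first in the archive. */
__attribute__((weak)) int my_divsi3(int n, int d)
{
    return sdiv(n, d);
}
#endif

#endif
```

Under those assumptions the three objects could be built with gcc -c, using no
macro for small.o, -DBUILD_FAST for fast.o, and -DBUILD_CALLER (optionally
with -DOPTIMIZE_FOR_TIME) for caller.o, with small.o placed ahead of fast.o
when the archive is created with ar.  Whether a given linker resolves the
duplicate definitions this way is worth checking before relying on it.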

R.



More information about the Gcc-patches mailing list