Another look at the ARM division routine
Ian Lance Taylor
ian@wasabisystems.com
Wed Nov 12 20:11:00 GMT 2003
Richard Earnshaw <rearnsha@arm.com> writes:
> My only concern with this patch is that it is substantially larger than
> what we had before. That's not too bad if you are calling the function
> often, but can make it slower if you are only doing the occasional
> division, since there's more code to pull into the cache. It makes me
> wonder whether we should have a size-based version as well...
A size-based version is easy to add; in fact, Steve has one, though I
dropped it. The question is how to make it available to users. If
we multilib based on -Os, it would be easy to select the right
version. But as far as I know, we don't do that, and we don't want
to.
Note that although the code is larger than what was there before, it
branches to the right iteration. So while it is true that cases like
0x7fffffff / 3 will pull more instructions into the cache, there are
other cases which will actually pull fewer instructions into the
cache, because they will branch right to the end of the function.
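To put rough numbers on that, here is a small C sketch (not part of the patch). It assumes the unrolled body is 32 steps of three instructions each (cmp, adc, subcs), and that the entry point is chosen from the difference between the leading-zero counts of the divisor and the dividend. That second point is an assumption on my part, and the names clz32 and unrolled_instructions_executed are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: count leading zero bits of a nonzero 32-bit value. */
static unsigned clz32(uint32_t x)
{
    unsigned n = 0;
    assert(x != 0);
    while ((x & 0x80000000u) == 0) {
        x <<= 1;
        n++;
    }
    return n;
}

/*
 * Number of instructions executed inside the 96-instruction unrolled body.
 * Assumes 0 < divisor <= dividend; smaller dividends are expected to be
 * handled before the body is reached.
 */
unsigned unrolled_instructions_executed(uint32_t dividend, uint32_t divisor)
{
    assert(divisor != 0 && dividend >= divisor);
    /* Highest shift worth trying: aligns the top bit of the divisor with
       the top bit of the dividend. */
    unsigned shift = clz32(divisor) - clz32(dividend);
    /* Steps for shift, shift - 1, ..., 0, three instructions per step. */
    return 3u * (shift + 1u);
}
```

Under these assumptions 0x7fffffff / 3 runs 90 of the 96 instructions, while 5 / 3 runs only the last 6.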
FYI, Steve's original code looks like this:
#ifndef __OPTIMIZE_SIZE__
        rsbs    r3, r3, #31
        addne   r3, r3, r3, lsl #1
        mov     r0, #0
        addne   pc, pc, r3, lsl #2
        nop
        .set    shift, 32
        .rept   32
        .set    shift, shift - 1
        cmp     r2, divisor, lsl #shift
        adc     r0, r0, r0
        subcs   r2, r2, divisor, lsl #shift
        .endr
#else
        mov     r0, #0
Loop:
        cmp     r2, divisor, lsl r3
        adc     r0, r0, r0
        subcs   r2, r2, divisor, lsl r3
        subs    r3, r3, #1
        bpl     Loop
#endif
I dropped this, because there is no reason to ever expect
__OPTIMIZE_SIZE__ to be defined when lib1funcs.asm is assembled. But
I can easily add it back if you think it would be appropriate.
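For anyone who wants to check the arithmetic, the size-optimized loop can be modelled in C roughly as follows. This is only a sketch: it assumes r2 holds the dividend, r0 the quotient, and r3 the starting shift (taken here to be the difference in leading-zero counts, which is set up outside the fragment shown). The names udiv_model and clz32 are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: count leading zero bits of a nonzero 32-bit value. */
static unsigned clz32(uint32_t x)
{
    unsigned n = 0;
    assert(x != 0);
    while ((x & 0x80000000u) == 0) {
        x <<= 1;
        n++;
    }
    return n;
}

/* Shift-and-subtract unsigned division, one quotient bit per iteration. */
uint32_t udiv_model(uint32_t dividend, uint32_t divisor, uint32_t *remainder)
{
    uint32_t quotient = 0;                          /* mov r0, #0 */

    assert(divisor != 0);
    if (dividend >= divisor) {
        /* Assumed initial value of r3. */
        int shift = (int)clz32(divisor) - (int)clz32(dividend);

        for (; shift >= 0; shift--) {               /* subs r3, r3, #1; bpl Loop */
            /* cmp sets the carry flag when no borrow occurs. */
            uint32_t carry = dividend >= (divisor << shift);
            quotient = quotient + quotient + carry; /* adc r0, r0, r0 */
            if (carry)                              /* subcs */
                dividend -= divisor << shift;
        }
    }
    *remainder = dividend;                          /* what is left in r2 */
    return quotient;
}
```

The unrolled version performs the same steps; it just enters the sequence at the step for the starting shift instead of counting down in a loop.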
Ian