Another look at the ARM division routine

Wed Nov 12 20:11:00 GMT 2003

Richard Earnshaw <rearnsha@arm.com> writes:

> My only concern with this patch is that it is substantially larger than 
> what we had before.   That's not too bad if you are calling the function 
> often, but can make it slower if you are only doing the occasional 
> division, since there's more code to pull into the cache.  It makes me 
> wonder whether we should have a size-based version as well...

A size-based version is easy to add--in fact, Steve has one, though I
dropped it.  The question is how to make it available for users.  If
we multilib based on -Os, it would be easy to select the right
version.  But as far as I know, we don't do that, and we don't want
to.

Note that although the code is larger than what was there before, it
branches straight to the first iteration that is actually needed and
skips the ones before it.  So while it is true that cases like
0x7fffffff / 3 will pull more instructions into the cache, there are
other cases, such as operands of similar magnitude, which will
actually pull fewer instructions into the cache, because they will
branch right to the end of the function.
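
To make that concrete, here is a rough C model of the idea.  It is
only a sketch and not the lib1funcs.asm code: the names udiv_model,
udiv_result and top_bit are made up for illustration, and the steps
counter stands in for how many of the unrolled iterations get
executed (and so, roughly, how much code is touched).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: index of the highest set bit; x must be nonzero. */
static int top_bit(uint32_t x)
{
    int n = 0;
    while (x >>= 1)
        n++;
    return n;
}

/* Hypothetical result type: quotient, remainder, iterations executed. */
typedef struct {
    uint32_t quot;
    uint32_t rem;
    int steps;
} udiv_result;

/* Shift-and-subtract unsigned division that starts at the first useful
   shift instead of always walking all 32 bit positions. */
static udiv_result udiv_model(uint32_t num, uint32_t den)
{
    udiv_result r = { 0, num, 0 };

    assert(den != 0);           /* division by zero is not modelled */
    if (num < den)
        return r;               /* quotient 0, nothing to iterate */

    /* Largest shift for which (den << shift) can still fit under num. */
    int shift = top_bit(num) - top_bit(den);

    for (; shift >= 0; shift--) {
        uint32_t d = den << shift;
        r.quot <<= 1;           /* make room for the next quotient bit */
        if (r.rem >= d) {       /* the cmp/subcs pair in the assembly */
            r.rem -= d;
            r.quot |= 1;
        }
        r.steps++;
    }
    return r;
}
```

In this model 0x7fffffff / 3 runs 30 steps, 100 / 7 runs 5, and
0x7fffffff / 0x7ffffffe runs just 1, which is the "branch right to
the end of the function" case.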

FYI, Steve's original code looks like this:

#ifndef	__OPTIMIZE_SIZE__
	rsbs	r3, r3, #31
	addne	r3, r3, r3, lsl #1
	mov	r0, #0
	addne	pc, pc, r3, lsl #2
	nop
	.set	shift, 32
	.rept	32
	.set	shift, shift - 1
	cmp	r2, divisor, lsl #shift
	adc	r0, r0, r0
	subcs	r2, r2, divisor, lsl #shift
	.endr
#else
	mov	r0, #0
Loop:
	cmp	r2, divisor, lsl r3
	adc	r0, r0, r0
	subcs	r2, r2, divisor, lsl r3
	subs	r3, r3, #1
	bpl	Loop
#endif

I dropped this, because there is no reason to ever expect
__OPTIMIZE_SIZE__ to be defined when lib1funcs.asm is assembled.  But
I can easily add it back if you think it would be appropriate.

Ian