This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
ARM modulo patch
- To: gcc-patches at gcc dot gnu dot org
- Subject: ARM modulo patch
- From: Nick Clifton <nickc at redhat dot com>
- Date: Mon, 21 Aug 2000 12:58:59 -0700
Hi Guys,
I am withdrawing my earlier patch to fix the ARM port's calculation
of the modulus of big dividends by small divisors. Further testing
revealed that the patch broke modulo operations on small numbers
(e.g. it computed "3 % 2" as 0).
I believe that the patch below is the correct solution. It
certainly works for all of the test cases that I have tried so far
(including big and small dividends). It also does not add an extra
instruction into the body of the main loop, so I hope that it will
not affect the performance of the code.
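
For anyone wanting to repeat this kind of check, here is a minimal sketch of a test harness in C. It is not the test set used for the patch. The names ref_umod, ref_smod, count_umod_mismatches and count_smod_mismatches and the sample values are made up for illustration. It compares the "%" operator (which ends up in __umodsi3 or __modsi3 on targets without a hardware divide) against a bit-at-a-time reference that uses neither "%" nor "/", and it assumes C99 semantics for the sign of a signed remainder.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Reference unsigned modulo by bit-at-a-time restoring division.
   It avoids the % and / operators on purpose, so that it does not
   depend on the routines being tested. */
static uint32_t ref_umod(uint32_t n, uint32_t d)
{
    uint64_t r = 0;
    assert(d != 0);
    for (int bit = 31; bit >= 0; bit--) {
        r = (r << 1) | ((n >> bit) & 1u);   /* bring down the next bit */
        if (r >= d)
            r -= d;
    }
    return (uint32_t)r;
}

/* Reference signed modulo: the result takes the sign of the dividend
   (C99 truncating division). */
static int32_t ref_smod(int32_t n, int32_t d)
{
    uint32_t un = n < 0 ? 0u - (uint32_t)n : (uint32_t)n;
    uint32_t ud = d < 0 ? 0u - (uint32_t)d : (uint32_t)d;
    uint32_t r = ref_umod(un, ud);
    return n < 0 ? -(int32_t)r : (int32_t)r;
}

/* Illustrative samples only: a mix of small and big values. */
static const uint32_t usamples[] = {
    1u, 2u, 3u, 7u, 10u, 0xffu, 0x10000u, 0x60000000u,
    0x7fffffffu, 0x80000000u, 0x80000001u, 0xffffffffu
};
static const int32_t ssamples[] = {
    1, -1, 2, -2, 3, -3, 7, -7, 0x60000000, -0x60000000,
    INT32_MAX, INT32_MIN
};

/* Count the sample pairs for which "%" disagrees with the reference.
   The volatile temporaries stop the compiler folding the operation
   at compile time. */
static unsigned count_umod_mismatches(void)
{
    const size_t n = sizeof usamples / sizeof usamples[0];
    unsigned bad = 0;
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < n; j++) {
            volatile uint32_t a = usamples[i], b = usamples[j];
            if (a % b != ref_umod(a, b))
                bad++;
        }
    return bad;
}

static unsigned count_smod_mismatches(void)
{
    const size_t n = sizeof ssamples / sizeof ssamples[0];
    unsigned bad = 0;
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < n; j++) {
            volatile int32_t a = ssamples[i], b = ssamples[j];
            if (a == INT32_MIN && b == -1)
                continue;                   /* overflows: undefined in C */
            if (a % b != ref_smod(a, b))
                bad++;
        }
    return bad;
}
```

Calling count_umod_mismatches() and count_smod_mismatches() from a small main() built for the ARM or Thumb target should give zero for both if the library routines are behaving.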
Note - the same bug exists in both the Thumb and ARM versions of the
modulo routines, so the patch fixes both sets of functions.
Any objections to my applying this patch?
Cheers
Nick
2000-08-21 Nick Clifton <nickc@redhat.com>
* config/arm/lib1funcs.asm (__umodsi3): Before performing any
restorative additions, test for bottom bits of IP being set,
rather than relying upon the RORs not matching.
(__modsi3): Ditto.
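
The effect of the extra test can be sketched with a small C model. This is a guess at the shape of the routine, not a transcription of lib1funcs.asm: the patch only shows the fixup code, so the alignment loops and the four-way unrolled subtraction loop below are assumptions based on the comments about "overdone" and "ip". The names model_umod, ror32 and the guard parameter are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Rotate right, for n in 1..31. */
static uint32_t ror32(uint32_t x, unsigned n)
{
    return (x >> n) | (x << (32u - n));
}

/* Assumed model of the unsigned modulo routine.
   guard == 0: fixups rely only on the rotated-bit tests (old code).
   guard != 0: fixups are skipped unless ip has one of its bottom
               three bits set (the test added by the patch). */
static uint32_t model_umod(uint32_t dividend, uint32_t divisor, int guard)
{
    uint32_t curbit = 1, overdone, ip;

    assert(divisor != 0);
    if (dividend < divisor)
        return dividend;

    /* Line the divisor up with the dividend, a nibble at a time and
       then a bit at a time (assumed loop structure). */
    while (divisor < 0x10000000u && divisor < dividend) {
        divisor <<= 4;
        curbit <<= 4;
    }
    while (divisor < 0x80000000u && divisor < dividend) {
        divisor <<= 1;
        curbit <<= 1;
    }

    for (;;) {
        /* Four trial subtractions per pass.  The last three record
           their quotient bit, which wraps into the top three bits of
           overdone when curbit is in the bottom nibble. */
        overdone = 0;
        if (dividend >= divisor)
            dividend -= divisor;
        if (dividend >= (divisor >> 1)) {
            dividend -= divisor >> 1;
            overdone |= ror32(curbit, 1);
        }
        if (dividend >= (divisor >> 2)) {
            dividend -= divisor >> 2;
            overdone |= ror32(curbit, 2);
        }
        if (dividend >= (divisor >> 3)) {
            dividend -= divisor >> 3;
            overdone |= ror32(curbit, 3);
        }
        ip = curbit;
        if (dividend == 0)
            break;                      /* early termination */
        curbit >>= 4;
        if (curbit == 0)
            break;                      /* no more bits to do */
        divisor >>= 4;
    }

    overdone &= 0xe0000000u;
    if (overdone == 0)
        return dividend;                /* no fixups needed */
    if (guard && (ip & 0x7u) == 0)
        return dividend;                /* ip is not 1, 2 or 4 */

    /* Restorative additions for subtractions that went too far. */
    if (overdone & ror32(ip, 3))
        dividend += divisor >> 3;
    if (overdone & ror32(ip, 2))
        dividend += divisor >> 2;
    if (overdone & ror32(ip, 1))
        dividend += divisor >> 1;
    return dividend;
}
```

In this model, model_umod(0x60000000, 1, 0) terminates early with ip == 0x80000000 and legitimate quotient bits in the top of overdone, so the unguarded fixups match and a non-zero remainder comes back. With guard set the result is 0, and model_umod(3, 2, guard) is 1 either way, since there ip == 1 and the fixups are genuinely needed.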
Index: gcc/config/arm/lib1funcs.asm
===================================================================
RCS file: /cvs/gcc/egcs/gcc/config/arm/lib1funcs.asm,v
retrieving revision 1.12
diff -w -p -r1.12 lib1funcs.asm
*** lib1funcs.asm 2000/08/18 10:18:14 1.12
--- lib1funcs.asm 2000/08/21 19:57:03
*************** Over6:
*** 400,415 ****
@ Any subtractions that we should not have done will be recorded in
@ the top three bits of "overdone". Exactly which were not needed
@ are governed by the position of the bit, stored in ip.
- @ If we terminated early, because dividend became zero,
- @ then none of the below will match, since the bit in ip will not be
- @ in the bottom nibble.
-
mov work, #0xe
lsl work, #28
and overdone, work
bne Over7
pop { work }
RET @ No fixups needed
Over7:
mov curbit, ip
mov work, #3
--- 400,423 ----
@ Any subtractions that we should not have done will be recorded in
@ the top three bits of "overdone". Exactly which were not needed
@ are governed by the position of the bit, stored in ip.
mov work, #0xe
lsl work, #28
and overdone, work
bne Over7
pop { work }
RET @ No fixups needed
+
+ @ If we terminated early, because dividend became zero, then the
+ @ bit in ip will not be in the bottom nibble, and we should not
+ @ perform the additions below. We must test for this though
+ @ (rather than relying upon the TSTs to prevent the additions) since
+ @ the bit in ip could be in the top two bits which might then match
+ @ with one of the smaller RORs.
+ mov curbit, ip
+ mov work, #0x7
+ tst curbit, work
+ beq Over10
+
Over7:
mov curbit, ip
mov work, #3
*************** Loop3:
*** 490,499 ****
@ Any subtractions that we should not have done will be recorded in
@ the top three bits of "overdone". Exactly which were not needed
@ are governed by the position of the bit, stored in ip.
- @ If we terminated early, because dividend became zero,
- @ then none of the below will match, since the bit in ip will not be
- @ in the bottom nibble.
ands overdone, overdone, #0xe0000000
RETc(eq) @ No fixups needed
tst overdone, ip, ror #3
addne dividend, dividend, divisor, lsr #3
--- 498,511 ----
@ Any subtractions that we should not have done will be recorded in
@ the top three bits of "overdone". Exactly which were not needed
@ are governed by the position of the bit, stored in ip.
ands overdone, overdone, #0xe0000000
+ @ If we terminated early, because dividend became zero, then the
+ @ bit in ip will not be in the bottom nibble, and we should not
+ @ perform the additions below. We must test for this though
+ @ (rather than relying upon the TSTs to prevent the additions) since
+ @ the bit in ip could be in the top two bits which might then match
+ @ with one of the smaller RORs.
+ tstNE ip, #0x7
RETc(eq) @ No fixups needed
tst overdone, ip, ror #3
addne dividend, dividend, divisor, lsr #3
*************** Over7:
*** 797,810 ****
@ Any subtractions that we should not have done will be recorded in
@ the top three bits of "overdone". Exactly which were not needed
@ are governed by the position of the bit, stored in ip.
- @ If we terminated early, because dividend became zero,
- @ then none of the below will match, since the bit in ip will not be
- @ in the bottom nibble.
mov work, #0xe
lsl work, #28
and overdone, work
beq Lgot_result
mov curbit, ip
mov work, #3
ror curbit, work
--- 809,830 ----
@ Any subtractions that we should not have done will be recorded in
@ the top three bits of "overdone". Exactly which were not needed
@ are governed by the position of the bit, stored in ip.
mov work, #0xe
lsl work, #28
and overdone, work
beq Lgot_result
+ @ If we terminated early, because dividend became zero, then the
+ @ bit in ip will not be in the bottom nibble, and we should not
+ @ perform the additions below. We must test for this though
+ @ (rather than relying upon the TSTs to prevent the additions) since
+ @ the bit in ip could be in the top two bits which might then match
+ @ with one of the smaller RORs.
+ mov curbit, ip
+ mov work, #0x7
+ tst curbit, work
+ beq Lgot_result
+
mov curbit, ip
mov work, #3
ror curbit, work
*************** Loop3:
*** 896,905 ****
@ Any subtractions that we should not have done will be recorded in
@ the top three bits of "overdone". Exactly which were not needed
@ are governed by the position of the bit, stored in ip.
- @ If we terminated early, because dividend became zero,
- @ then none of the below will match, since the bit in ip will not be
- @ in the bottom nibble.
ands overdone, overdone, #0xe0000000
beq Lgot_result
tst overdone, ip, ror #3
addne dividend, dividend, divisor, lsr #3
--- 916,929 ----
@ Any subtractions that we should not have done will be recorded in
@ the top three bits of "overdone". Exactly which were not needed
@ are governed by the position of the bit, stored in ip.
ands overdone, overdone, #0xe0000000
+ @ If we terminated early, because dividend became zero, then the
+ @ bit in ip will not be in the bottom nibble, and we should not
+ @ perform the additions below. We must test for this though
+ @ (rather than relying upon the TSTs to prevent the additions) since
+ @ the bit in ip could be in the top two bits which might then match
+ @ with one of the smaller RORs.
+ tstNE ip, #0x7
beq Lgot_result
tst overdone, ip, ror #3
addne dividend, dividend, divisor, lsr #3