This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.



ARM contribution: Long long division up to 10x faster!


Hi,

Today Fredrik Hederstierna of Purple Scout AB announces the release of a hand-coded assembly implementation of 64-bit division and modulo for ARM.

The code is on average 2x-3x faster than the current libgcc version, and at best it can be more than 10x faster (tested on ARMv4).
It is also smaller: each of the four functions is about 150-250 bytes shorter, which in total could shrink the code size by about 1k.

A new numeric algorithm, Quick QRNND (QQRNND), is implemented. It often exits earlier than the shift-and-subtract algorithms commonly used for division.
QQRNND approximates the quotient by repeatedly multiplying the denominator and the high word of the numerator.
This approach may also be of interest for architectures other than ARM.
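
For reference, the shift-and-subtract baseline mentioned above can be sketched in C roughly as follows. This is only a minimal illustration of the classic bit-at-a-time method, not the libgcc code and not QQRNND itself; the function name udivmod64_shift_sub is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Classic restoring (shift-and-subtract) unsigned 64-bit division.
   Produces one quotient bit per iteration and always runs 64 iterations,
   regardless of the operand values.
   NOTE: udivmod64_shift_sub is a hypothetical name for illustration,
   not a libgcc symbol. */
static uint64_t udivmod64_shift_sub(uint64_t num, uint64_t den, uint64_t *rem)
{
    uint64_t q = 0;
    uint64_t r = 0;

    assert(den != 0); /* division by zero is not handled in this sketch */

    for (int i = 63; i >= 0; i--) {
        /* Remember the bit that falls off the top of the partial remainder;
           this only happens when den is larger than 2^63. */
        uint64_t carry = r >> 63;

        /* Shift the next numerator bit into the partial remainder. */
        r = (r << 1) | ((num >> i) & 1);

        /* If the (65-bit) partial remainder is at least den, subtract it
           and set the corresponding quotient bit. */
        if (carry || r >= den) {
            r -= den;
            q |= (uint64_t)1 << i;
        }
    }

    if (rem)
        *rem = r;
    return q;
}
```

For example, calling udivmod64_shift_sub(100, 7, &r) yields quotient 14 and remainder 2. The fixed 64-iteration loop is the cost that an algorithm with an early exit tries to avoid.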

The code is located in the file "longlong.S". Some other 64-bit arithmetic routines from "lib1funcs.asm", such as the 64-bit shift operations, could also be moved there.
To integrate it with GCC, add '#include "longlong.S"' at the end of the include list in your "lib1funcs.asm" and enter the function names in the "t-arm-elf" file. (A simple patch is submitted with this mail.) The file also contains some extra toolbox macros for 64-bit arithmetic that could be useful for anyone who would like to develop this code further.

I originally started working on this piece of code more than 2 years ago, but never had the time to finish it.
I have spent a couple of late nights on it recently to get it ready for the GCC repository, and I'm very pleased that this little project is hopefully finished at last!

The code supports ARMv3M and higher. I have mainly tested it on ARMv4 (ARM7TDMI), so please report any problems with ARMv5 or later versions. I think the gain will increase on e.g. ARM9E, where the 64-bit multiply instructions take only 3 cycles.
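
For anyone who wants to check the routines on another core, a small randomized self-test along the following lines might be a starting point. It is a sketch under assumptions: it relies on the compiler routing the 64-bit '/' and '%' operators to the library division routines on the target, and the names next_rand and check_udiv64 are hypothetical. It only covers the unsigned case; a signed variant would need similar checks.

```c
#include <assert.h>
#include <stdint.h>

/* xorshift64: a small deterministic pseudo-random generator.
   The state must be nonzero. (Hypothetical helper for this sketch.) */
static uint64_t next_rand(uint64_t *state)
{
    uint64_t x = *state;
    x ^= x << 13;
    x ^= x >> 7;
    x ^= x << 17;
    *state = x;
    return x;
}

/* Runs `iterations` random unsigned 64-bit divisions and returns the
   number of results that violate n == q * d + r with r < d.
   (Hypothetical helper for this sketch.) */
static unsigned check_udiv64(uint64_t seed, unsigned iterations)
{
    unsigned failures = 0;

    assert(seed != 0); /* xorshift64 would get stuck at zero */

    for (unsigned i = 0; i < iterations; i++) {
        uint64_t n = next_rand(&seed);
        /* Shift by a random amount so divisors of all magnitudes occur. */
        uint64_t d = next_rand(&seed) >> (next_rand(&seed) & 63);
        if (d == 0)
            d = 1;

        uint64_t q = n / d; /* expected to use the library division */
        uint64_t r = n % d; /* expected to use the library modulo */

        /* q * d cannot overflow for a correct q, since q * d <= n. */
        if (r >= d || q * d + r != n)
            failures++;
    }
    return failures;
}
```

A call such as check_udiv64(0x9E3779B97F4A7C15ULL, 100000) should return 0 when the division and modulo routines are correct.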

Hope this piece of code can be useful for you!

Kind Regards,

Fredrik Hederstierna
Purple Scout AB
- Embedded Quality -
- Open Source - Eclipse -
http://www.purplescout.se


