This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.



Re: ARM contribution: Longlong division 10x times faster!


On Mon, 2005-11-21 at 05:18, Daniel Jacobowitz wrote:
> On Sun, Nov 20, 2005 at 10:33:24PM +0100, Fredrik Hederstierna wrote:
> > Hi,
> > 
> > Today Fredrik Hederstierna of Purple Scout AB announces the release
> > of a hand-coded assembly implementation of 64bit division and modulo
> > for ARM.
> > 
> > The code is in average 2x-3x faster than the current libgcc-version,
> > and at best it could be more than 10x faster (tested for ARMv4). It's
> > also smaller in size, each function (4 in total) is about 150-250
> > bytes shorter. In total this could shrink the code size by 1k.
> 
> First, thanks for doing this.
> 
> In order to merge a substantial contribution into GCC, we need a
> copyright assignment to the FSF.  It doesn't look like you have one; is
> that right?  If so, let me know off-list and I'll send you the form.

Let me echo Daniel's comments and thanks.

I'll let Daniel guide you through the paperwork issues, but meanwhile
some quick comments on the code:

> #if defined(ARM_USE_CLZ_TABLE)
> 	.globl  __arm_clz_tab
> __arm_clz_tab:
> 	.byte 0,1,2,2,3,3,3,3,4,4,4,4,4,4,4,4,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5
> 	.byte 6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6

This table already exists in libgcc.  There's no need for a second copy,
especially given its size.
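
For readers unfamiliar with the technique, here is a rough C sketch of how a 256-entry table of this shape is typically used to count leading zeros. It is an illustration only, not the libgcc code or the submitted assembly: the names sketch_clz_tab, sketch_init_tab and sketch_clz32 are made up, and the table is filled at run time here purely to keep the example short.

```c
#include <stdint.h>

/* Hypothetical stand-in for the shared table: entry i holds the bit
   length of i (0 for 0, 1 for 1, 2 for 2..3, ..., 8 for 128..255),
   which matches the pattern of the .byte rows quoted above. */
static uint8_t sketch_clz_tab[256];

/* Fill the table at run time (assumption: a real library would ship
   it as constant data instead). */
static void sketch_init_tab(void)
{
    for (int i = 0; i < 256; i++) {
        uint8_t bits = 0;
        for (int v = i; v != 0; v >>= 1)
            bits++;
        sketch_clz_tab[i] = bits;
    }
}

/* Count leading zeros of a 32-bit value: locate the highest non-zero
   byte with two compares, then let the table finish the job. */
static int sketch_clz32(uint32_t x)
{
    int shift;
    if (x >= (1u << 16))
        shift = (x >= (1u << 24)) ? 24 : 16;
    else
        shift = (x >= (1u << 8)) ? 8 : 0;
    return 32 - (sketch_clz_tab[x >> shift] + shift);
}
```

Since the lookup only ever needs one such table, every routine in the library can share a single copy.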

> #if defined(ARM_USE_CLZ_TABLE)
> 	ldr	\tmp, =__arm_clz_tab

If you use ldr =, you should have an explicit .pool directive to force
out the constant pool literals.  GAS will put one in automatically, but
it's not always at the optimal location.  An explicit pool directive is
therefore good practice and even if it ends up in the same place as the
default, it shows you haven't forgotten :-)

> /* This file is only compilable to a processor with long multiply instructions.
>    The ARM_ARCH define equals 4 even for ARMv3M which also has long multiply.*/
> #if (__ARM_ARCH__ > 3)

Hmm, this is the big problem.  Most configurations of gcc for ARM permit
a '--with-cpu=' that can have any permitted cpu name.  If we put this
code in, then we can't use it on any configuration where the user can
legitimately use a cpu name that predates ARMv3M, even if the default
configuration would normally be for an acceptable core.  This rules out
its use on arm-linux, arm-elf, and a number of other significant
configurations (there's no way that I know of to make the makefile
configury automatically select this file rather than the C version when
the cpu option changes).

How hard would it be to macroize the long mul operations so that we can
drop in compatibility code (I wouldn't worry excessively about
efficiency on those targets)?
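
To make the suggestion concrete, here is a minimal C sketch of the idea, under stated assumptions: SKETCH_HAVE_LONG_MUL, SKETCH_UMULL and sketch_umull_compat are hypothetical names invented for this illustration (they are not GCC or libgcc macros), and the real patch would do the equivalent with assembler macros. The fallback builds the 32x32->64 product from four 16x16 partial products, which is what a core without a long multiply instruction has to do.

```c
#include <stdint.h>

/* Compatibility path: 32x32 -> 64 unsigned multiply using only
   32-bit multiplies of 16-bit halves. */
static void sketch_umull_compat(uint32_t *lo, uint32_t *hi,
                                uint32_t a, uint32_t b)
{
    uint32_t al = a & 0xFFFFu, ah = a >> 16;
    uint32_t bl = b & 0xFFFFu, bh = b >> 16;

    uint32_t ll = al * bl;   /* contributes to bits  0..31 */
    uint32_t lh = al * bh;   /* contributes to bits 16..47 */
    uint32_t hl = ah * bl;   /* contributes to bits 16..47 */
    uint32_t hh = ah * bh;   /* contributes to bits 32..63 */

    /* Sum of the three terms landing in bits 16..31; at most
       3 * 0xFFFF, so it cannot overflow 32 bits. */
    uint32_t mid = (ll >> 16) + (lh & 0xFFFFu) + (hl & 0xFFFFu);

    *lo = (ll & 0xFFFFu) | (mid << 16);
    *hi = hh + (lh >> 16) + (hl >> 16) + (mid >> 16);
}

/* Hypothetical switch: pick the native widening multiply when it is
   available, otherwise drop in the compatibility code. */
#ifdef SKETCH_HAVE_LONG_MUL
#define SKETCH_UMULL(lo, hi, a, b)                        \
    do {                                                  \
        uint64_t p_ = (uint64_t)(a) * (uint64_t)(b);      \
        (lo) = (uint32_t)p_;                              \
        (hi) = (uint32_t)(p_ >> 32);                      \
    } while (0)
#else
#define SKETCH_UMULL(lo, hi, a, b) \
    sketch_umull_compat(&(lo), &(hi), (a), (b))
#endif
```

The division code would then be written only in terms of the macro, so the same source builds for every cpu option and only the slow targets pay for the emulation.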

R.

