[PATCH][ARM] optimizing _muldi3 for Thumb

Tue Jul 29 08:43:00 GMT 2008

I looked at the ARM v5 ISA manual.  There is only one multiplication
instruction for THUMB, which produces the lower 32 bits of the result
of a 32-bit by 32-bit multiplication.  For 64-bit by 64-bit
multiplication, there seems to be no efficient way with the THUMB ISA.
We can use the mul instruction and a bunch of shifts and adds to
compute the upper 32 bits of the result of a 32-bit by 32-bit
multiplication, but that is quite painful.
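
To give a rough idea of what the shifts-and-adds approach costs, here is
a minimal C sketch.  It assumes the only available multiply returns the
low 32 bits, so each operand is split into 16-bit halves and the four
partial products are recombined by hand.  The function name
umul32x32_wide is hypothetical and is not an existing libgcc entry
point.

```c
#include <stdint.h>

/* Hypothetical helper: full 64-bit product of two 32-bit values, using
   only multiplies whose exact result fits in 32 bits (16 x 16 bits). */
static uint64_t umul32x32_wide(uint32_t a, uint32_t b)
{
    uint32_t a_lo = a & 0xFFFFu, a_hi = a >> 16;
    uint32_t b_lo = b & 0xFFFFu, b_hi = b >> 16;

    /* Four partial products; none of them can overflow 32 bits. */
    uint32_t ll = a_lo * b_lo;
    uint32_t lh = a_lo * b_hi;
    uint32_t hl = a_hi * b_lo;
    uint32_t hh = a_hi * b_hi;

    /* Sum of everything that lands in bits 16..31, plus its carries.
       At most 3 * 0xFFFF, so it fits easily in 32 bits. */
    uint32_t mid = (ll >> 16) + (lh & 0xFFFFu) + (hl & 0xFFFFu);

    uint32_t lo = (ll & 0xFFFFu) | (mid << 16);
    uint32_t hi = hh + (lh >> 16) + (hl >> 16) + (mid >> 16);

    return ((uint64_t)hi << 32) | lo;
}
```

That is four multiplies plus roughly a dozen shifts, masks and adds for
a single 32-bit by 32-bit widening multiply, which is why it is not
attractive to emit in-line.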

The Thumb-2 ISA is a different story. gcc already generates an
instruction sequence for cortex-m3.  For cortex-m1, however, gcc still
generates a call to __muldi3.

So the current situation is:

- THUMB and THUMB-2 on cortex-m1 cannot do 64-bit multiplication easily.
- 64-bit multiplication can be done very efficiently on ARM but
cortex-m1 does not support ARM mode.
- On ARM and THUMB-2 (except cortex-m1) gcc already generates 64-bit
multiplication code in-line.

So I think we do not need to change the code generator, as it is as
good as it can be for 64-bit multiplication.  libgcc can still be
optimized, though we have to be careful about the target.  How about
making the optimized version selectable using the appropriate
__ARM_ARCH_xxx__ macro? In other words, we will use a slow 64-bit
multiplication for an unknown THUMB target and the fast ARM version if
we know the target also supports ARM code.  The code will be enabled
with the appropriate --with-arch= when gcc is configured.
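
A sketch of what the selection could look like, in C rather than asm.
The assumptions are mine: I key on the gcc predefined macros __thumb__
and __thumb2__ as a stand-in for the exact __ARM_ARCH_xxx__ list, which
would still need to be worked out, and the names MUL64_SLOW_PATH,
widening_mul32 and mul64 are made up for illustration.  The 64-bit
product only needs one widening 32 x 32 multiply plus two ordinary
32-bit multiplies for the cross terms.

```c
#include <stdint.h>

/* Assumption: a Thumb-only build without Thumb-2 has no widening
   multiply, so it takes the slow path.  Everything else (including a
   non-ARM host) takes the fast path. */
#if defined(__thumb__) && !defined(__thumb2__)
#define MUL64_SLOW_PATH 1
#else
#define MUL64_SLOW_PATH 0
#endif

/* Hypothetical primitive: 32 x 32 -> 64 unsigned multiply. */
static uint64_t widening_mul32(uint32_t a, uint32_t b)
{
#if MUL64_SLOW_PATH
    /* Build the result from 16-bit halves. */
    uint32_t al = a & 0xFFFFu, ah = a >> 16;
    uint32_t bl = b & 0xFFFFu, bh = b >> 16;
    uint32_t ll = al * bl, lh = al * bh, hl = ah * bl, hh = ah * bh;
    uint32_t mid = (ll >> 16) + (lh & 0xFFFFu) + (hl & 0xFFFFu);
    uint32_t lo = (ll & 0xFFFFu) | (mid << 16);
    uint32_t hi = hh + (lh >> 16) + (hl >> 16) + (mid >> 16);
    return ((uint64_t)hi << 32) | lo;
#else
    /* The compiler can map this onto a single widening multiply. */
    return (uint64_t)a * b;
#endif
}

/* Hypothetical 64 x 64 -> 64 multiply (result is modulo 2^64). */
static uint64_t mul64(uint64_t x, uint64_t y)
{
    uint32_t xl = (uint32_t)x, xh = (uint32_t)(x >> 32);
    uint32_t yl = (uint32_t)y, yh = (uint32_t)(y >> 32);

    uint64_t low = widening_mul32(xl, yl);
    /* Only the low 32 bits of the cross terms reach the result;
       xh * yh is shifted out entirely. */
    uint32_t cross = xl * yh + xh * yl;

    return low + ((uint64_t)cross << 32);
}
```

Both paths give the same answer, so the choice only affects speed and
code size.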

We can improve 32-bit multiplication on THUMB, though; there is no
need to call libgcc.

-Doug

2008/7/27 Mark Mitchell <mark@codesourcery.com>:
> Doug Kwan (關振德) wrote:
>
>> Okay. I will try generating instruction sequence directly.  However,
>> we may still want to optimize libgcc because the current generated
>> code is horrible.
>
> Agreed.  For -Os, it will probably be better to make the call, even for
> a 6-instruction sequence, and there's no reason to have the call go to
> big, slow code.  You might look at defining __umulsidi3 for ARM, as it
> looks like lots of other architectures have asm implementations that
> might be a lot more efficient than the default.
>
> --
> Mark Mitchell
> CodeSourcery
> mark@codesourcery.com
> (650) 331-3385 x713
>