[PATCH][ARM] optimizing _muldi3 for Thumb

Paul Brook paul@codesourcery.com
Tue Aug 5 01:33:00 GMT 2008


> Question: If someone configures gcc with --with-arch=armv6, the libgcc
> will be built with __ARM_ARCH_6__.  If he/she then uses that gcc
> to compile something with -mthumb, is the resulting binary expected to
> work on a Cortex-M1?

No. ARMv6 is not a subset of ARMv6-M. 

> > The Thumb-2 code is just dumb (Yes I know it's what the compiler
> > generates, but gcc is notoriously bad at doubleword arithmetic).  mla is
> > not Thumb-2 specific, and using it certainly doesn't require additional
> > register pushes. AFAICS there's no reason to have different code for ARM
> > and Thumb-2.
>
> From a scheduling point of view, a MUL->MLA chain is bad.
> That's the reason why I use two independent MUL in the ARM version.

Do you have any proof of this? Most cores have a bypass for the accumulate 
operand, and will issue dependent mul/mla instructions back to back.

An mla-based implementation should be 4 instructions (plus the return), and
doesn't require any scratch registers.
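
As a rough illustration of the 4-instruction claim, here is a C model of a
64-bit multiply split into 32-bit pieces. The function name muldi3_model, the
instruction named at each step, and the ordering of the steps are assumptions
made for this sketch; it is not the sequence from the patch.

```c
#include <stdint.h>

/* Hypothetical model of a 64-bit multiply built from 32-bit operations.
 * Each statement is annotated with the ARM instruction it is assumed to
 * correspond to; the mapping is illustrative only. */
static uint64_t muldi3_model(uint64_t a, uint64_t b)
{
    uint32_t al = (uint32_t)a, ah = (uint32_t)(a >> 32);
    uint32_t bl = (uint32_t)b, bh = (uint32_t)(b >> 32);

    /* mul: low 32 bits of al * bh.  The result could overwrite the
     * register that held bh, since bh is not needed afterwards. */
    uint32_t cross = al * bh;

    /* mla: low 32 bits of ah * bl, accumulated onto the previous product. */
    cross = ah * bl + cross;

    /* umull: full 64-bit product of the two low words. */
    uint64_t low = (uint64_t)al * bl;

    /* add: fold the cross terms into the high word (wraps modulo 2^32). */
    uint32_t hi = (uint32_t)(low >> 32) + cross;

    return ((uint64_t)hi << 32) | (uint32_t)low;
}
```

For any inputs, muldi3_model(a, b) should equal a * b computed directly in
uint64_t. The only temporary, cross, can live in an incoming argument register
that is dead by then, which is one way to read "no scratch registers".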

> I cannot use the ARM version for Thumb-2 as well since it accesses ip in the
> MUL instruction.

Nonsense. 

> I don't care too much about Thumb-2 since gcc currently generates
> in-line 64-bit multiplication anyway so I just take what gcc generates
> for Thumb-2.

Not a good excuse IMHO.  As Mark mentioned, the out of line version can be 
useful for size optimization.

Paul


