[PATCH][ARM] optimizing _muldi3 for Thumb
Paul Brook
paul@codesourcery.com
Tue Aug 5 01:33:00 GMT 2008
> Question: If someone configures gcc with --with-arch=armv6, the libgcc
> will be built with __ARM_ARCH_6__. If he/she then uses the said gcc
> to compile something with -mthumb. Is the resulting binary expected to
> work on a Cortex-M1?
No. ARMv6 is not a subset of ARMv6-M.
> > The Thumb-2 code is just dumb (Yes I know it's what the compiler
> > generates, but gcc is notoriously bad at doubleword arithmetic). mla is
> > not Thumb-2 specific, and using it certainly doesn't require additional
> > register pushes. AFAICS there's no reason to have different code for ARM
> > and Thumb-2.
>
> From a scheduling point of view, a MUL->MLA chain is bad.
> That's the reason why I use two independent MULs in the ARM version.
Do you have any proof of this? Most cores have a bypass for the accumulate
operand, and will issue dependent mul/mla instructions back to back.
An mla-based implementation should be 4 instructions (plus the return), and
doesn't require any scratch registers.
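For readers following along, here is a minimal C sketch of the arithmetic behind a 4-instruction mul/mla version. It assumes the usual split of each 64-bit operand into 32-bit low and high words. Only the low 64 bits of the product are kept, so the same code serves signed and unsigned operands. The function name muldi3_sketch is hypothetical, and the instruction mapping in the comments is an illustration, not the code from the patch.

```c
#include <stdint.h>

/* Hypothetical illustration: low 64 bits of a * b using only
   32-bit operations, in the shape of a mul/mla/umull/add sequence. */
uint64_t muldi3_sketch(uint64_t a, uint64_t b)
{
    uint32_t a_lo = (uint32_t)a, a_hi = (uint32_t)(a >> 32);
    uint32_t b_lo = (uint32_t)b, b_hi = (uint32_t)(b >> 32);

    /* mul: first cross term, truncated to 32 bits */
    uint32_t cross = a_lo * b_hi;
    /* mla: accumulate the second cross term (depends on the mul above) */
    cross += a_hi * b_lo;

    /* umull: full 32x32->64 product of the low words */
    uint64_t full = (uint64_t)a_lo * b_lo;

    /* add: fold the cross terms into the high word; a_hi * b_hi only
       affects bits 64 and up, so it is never computed */
    uint32_t hi = (uint32_t)(full >> 32) + cross;

    return ((uint64_t)hi << 32) | (uint32_t)full;
}
```

The mla depends on the mul's result, which is the back-to-back dependency under discussion. Whether that stalls depends on the core's accumulate bypass.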
> I cannot use the ARM version for Thumb-2 as well since it accesses ip in the
> MUL instruction.
Nonsense.
> I don't care too much about Thumb-2 since gcc currently generates
> in-line 64-bit multiplication anyway, so I just take what gcc generates
> for Thumb-2.
Not a good excuse IMHO. As Mark mentioned, the out of line version can be
useful for size optimization.
Paul