This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
Re: [PATCH][ARM] optimizing _muldi3 for Thumb
- From: "Doug Kwan (關振德)" <dougkwan at google dot com>
- To: "Paul Brook" <paul at codesourcery dot com>
- Cc: gcc-patches at gcc dot gnu dot org, "Mark Mitchell" <mark at codesourcery dot com>
- Date: Mon, 4 Aug 2008 15:37:58 -0700
- Subject: Re: [PATCH][ARM] optimizing _muldi3 for Thumb
- References: <498552560807011502w6dd3dd62q3bd7b1cf08102387@mail.gmail.com> <488E8D3C.5020000@codesourcery.com> <498552560808012307j236465f6k4aac9764f1955d66@mail.gmail.com> <200808042254.48200.paul@codesourcery.com>
Hi
2008/8/4 Paul Brook <paul@codesourcery.com>:
> On Saturday 02 August 2008, Doug Kwan (關振德) wrote:
>> Here is an updated patch.
>
> This is bad in several ways.
Bad feedback is better than no feedback. :)
> The condition for pure Thumb-1 code is completely wrong, you want
> __ARM_ARCH_6M__, exactly the same as all the other Thumb-1 only code.
> For most purposes ARMv6-M is Thumb-1. ARM marketing sometimes call
> it "Thumb-2" to deliberately confuse people. Please don't do this.
Question: If someone configures gcc with --with-arch=armv6, libgcc
will be built with __ARM_ARCH_6__ defined. If he/she then uses that gcc
to compile something with -mthumb, is the resulting binary expected to
work on a Cortex-M1? That is the reason why I test for both
__ARM_ARCH_6__ and __ARM_ARCH_6M__.
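
To make the question concrete, here is a rough sketch of the kind of
guard being discussed. It is not the condition from the patch: the
MULDI3_VARIANT macro and the muldi3_variant() helper are hypothetical
names used only for illustration, and the exact combination of tests is
an assumption. The predefined macros (__ARM_ARCH_6M__, __ARM_ARCH_6__,
__thumb__, __thumb2__, __arm__) are the ones GCC provides on ARM targets.

```c
/* Hypothetical guard, NOT the condition used in the patch.
   Picks a label for the multiply variant that would be assembled. */
#if defined(__ARM_ARCH_6M__) \
    || (defined(__ARM_ARCH_6__) && defined(__thumb__) && !defined(__thumb2__))
/* Thumb-1 only: ARMv6-M, or ARMv6 code being built with -mthumb. */
#define MULDI3_VARIANT "thumb1"
#elif defined(__arm__)
/* Any other ARM target: ARM or Thumb-2 encodings are available. */
#define MULDI3_VARIANT "arm"
#else
/* Not an ARM target at all (e.g. when compiling this sketch on a host). */
#define MULDI3_VARIANT "generic"
#endif

/* Hypothetical helper that reports which branch was selected. */
const char *muldi3_variant(void)
{
    return MULDI3_VARIANT;
}
```

With only the __ARM_ARCH_6M__ test, an armv6 libgcc built with -mthumb
would fall into the second branch; whether that matters is exactly the
question above.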
>> +/* We cannot use the faster ARM version for THUMB libgcc on V6 and V6M
>> since Cortex-M1 does not run ARM code. */
>
> Shows you've completely misunderstood which architecture variants have which
> features.
>
> umull is only available on v3M and later cores.
Will add another ARM version with umull then.
> The Thumb-2 code is just dumb (Yes I know it's what the compiler generates,
> but gcc is notoriously bad at doubleword arithmetic). mla is not Thumb-2
> specific, and using it certainly doesn't require additional register pushes.
> AFAICS there's no reason to have different code for ARM and Thumb-2.
From a scheduling point of view, a MUL->MLA chain is bad because the
MLA has to wait for the result of the MUL. That's the reason why I use
two independent MULs in the ARM version. I cannot use the ARM version
for Thumb-2 as well, since it accesses ip in the MUL instruction.
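
For reference, the decomposition behind the two independent multiplies
can be written in C roughly as follows. This is only a sketch of the
arithmetic, not the assembly in the patch; muldi3_sketch is a
hypothetical name, and how a compiler schedules the resulting
instructions is not guaranteed.

```c
#include <stdint.h>

/* Hypothetical helper, NOT the patch's code: 64x64 -> 64-bit multiply
   built from 32-bit pieces. */
uint64_t muldi3_sketch(uint64_t a, uint64_t b)
{
    uint32_t a_lo = (uint32_t)a, a_hi = (uint32_t)(a >> 32);
    uint32_t b_lo = (uint32_t)b, b_hi = (uint32_t)(b >> 32);

    /* Widening 32x32 -> 64 multiply of the low words (what umull computes). */
    uint64_t low = (uint64_t)a_lo * b_lo;

    /* The two cross products only contribute to the high word, so a
       truncating 32-bit multiply is enough.  Neither depends on the
       other's result, unlike a MUL followed by an MLA. */
    uint32_t cross1 = a_lo * b_hi;
    uint32_t cross2 = a_hi * b_lo;

    /* a_hi * b_hi would only affect bits 64 and up, so it is dropped. */
    uint32_t high = (uint32_t)(low >> 32) + cross1 + cross2;

    return ((uint64_t)high << 32) | (uint32_t)low;
}
```

The two cross products are then combined with plain additions, which is
where the dependency on the multiplier results is deferred to.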
I don't care too much about Thumb-2, since gcc currently generates
in-line 64-bit multiplication anyway, so I just took what gcc generates
for Thumb-2. If someone really wants to call __aeabi_lmul instead,
my dumb code is still better than the horrible thing based on the C
version. I will remove the push and pop, though.
-Doug