This is the mail archive of the
mailing list for the GCC project.
Re: [PATCH][expmed] Properly account for the cost and latency of shift+add ops when synthesizing mults
- From: Jeff Law <law at redhat dot com>
- To: Kyrill Tkachov <kyrylo dot tkachov at arm dot com>, GCC Patches <gcc-patches at gcc dot gnu dot org>
- Cc: "rth at redhat dot com" <rth at redhat dot com>
- Date: Mon, 20 Apr 2015 09:06:37 -0600
- Subject: Re: [PATCH][expmed] Properly account for the cost and latency of shift+add ops when synthesizing mults
- Authentication-results: sourceware.org; auth=none
- References: <552CCBCF dot 2060502 at arm dot com> <5531611F dot 7060505 at redhat dot com> <5534DE71 dot 6060904 at arm dot com>
On 04/20/2015 05:09 AM, Kyrill Tkachov wrote:
Yea, when I looked at the differences, it wasn't immediately clear if
there was a real improvement or not. The new sequences use the single
register, so one might claim they're marginally better due to that.
On 17/04/15 20:38, Jeff Law wrote:
On 04/14/2015 02:11 AM, Kyrill Tkachov wrote:
Of course the effect on codegen of this patch depends a lot on the rtx
costs in the backend.
On aarch64 with -mcpu=cortex-a57 tuning I see the cost limit being
exceeded in more cases and the
expansion code choosing instead to do a move-immediate and a mul
No regressions on SPEC2006 on a Cortex-A57.
For example, for code:
long f0 (int x, int y)
return (long)x * 6L;
int f1(int x)
return x * 10;
int f2(int x)
return x * 100;
int f3(int x)
return x * 20;
int f4(int x)
return x * 25;
int f5(int x)
return x * 11;
Please turn this into a test for the testsuite. It's fine if this the
test is specific to AArch64. You may need to break it into 6 individual
tests since what you want to check for in each one may be significantly
different. For example, f0, f4 and f5 you'd probably check for the
constant load & multiply instructions. Not sure how to best test for
what you want in f1-f3.
f1/f3 still end up synthesising the mult, but prefer a different
algorithm. I don't think the algorithm chosen in f1/f3 is worse or
better than what it was producing before, so I don't think there's
much point in testing for it.
If you think it's really better to
Your call on f1/f3. IIRC f2 didn't change at all, so I'm not sure if
you need a test for that (perhaps you should make sure it continues to
use a mul rather than a synthesized sequence).
test for something, I propose testing that only two instructions are
generated, and neither of them are a 'mul'. I'll repost a patch with
my proposed testcases for f0,f2,f4,f5.
Pre-approved with whatever you decide on the testing side.