This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [PATCH][expmed] Properly account for the cost and latency of shift+add ops when synthesizing mults


On 04/20/2015 05:09 AM, Kyrill Tkachov wrote:
Hi Jeff,

On 17/04/15 20:38, Jeff Law wrote:
On 04/14/2015 02:11 AM, Kyrill Tkachov wrote:
Of course the effect on codegen of this patch depends a lot on the rtx
costs in the backend.
On aarch64 with -mcpu=cortex-a57 tuning I see the cost limit being
exceeded in more cases and the
expansion code choosing instead to do a move-immediate and a mul
instruction.
No regressions on SPEC2006 on a Cortex-A57.

For example, for code:
long f0 (int x, int y)
{
    return (long)x * 6L;
}


int f1(int x)
{
    return x * 10;
}

int f2(int x)
{
      return x * 100;
}

int f3(int x)
{
      return x * 20;
}

int f4(int x)
{
      return x * 25;
}

int f5(int x)
{
        return x * 11;
}
Please turn this into a test for the testsuite.  It's fine if this the
test is specific to AArch64.  You may need to break it into 6 individual
tests since what you want to check for in each one may be significantly
different.  For example, f0, f4 and f5 you'd probably check for the
constant load & multiply instructions.  Not sure how to best test for
what you want in f1-f3.

f1/f3 still end up synthesising the mult, but prefer a different
algorithm. I don't think the algorithm chosen in f1/f3 is worse or
better than what it was producing before, so I don't think there's
much point in testing for it.
Yea, when I looked at the differences, it wasn't immediately clear if there was a real improvement or not. The new sequences use the single register, so one might claim they're marginally better due to that.

 If you think it's really better to
test for something, I propose testing that only two instructions are
generated, and neither of them are a 'mul'. I'll repost a patch with
my proposed testcases for f0,f2,f4,f5.
Your call on f1/f3. IIRC f2 didn't change at all, so I'm not sure if you need a test for that (perhaps you should make sure it continues to use a mul rather than a synthesized sequence).

Pre-approved with whatever you decide on the testing side.

jeff


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]