This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
Re: [Patch/ccmp] Cost instruction sequences to choose better expand order
- From: pinskia at gmail dot com
- To: Bernd Schmidt <bschmidt at redhat dot com>
- Cc: Jiong Wang <jiong dot wang at arm dot com>, GCC Patches <gcc-patches at gcc dot gnu dot org>, Richard Henderson <rth at redhat dot com>
- Date: Mon, 21 Sep 2015 07:01:08 -0700
- Subject: Re: [Patch/ccmp] Cost instruction sequences to choose better expand order
- Authentication-results: sourceware.org; auth=none
- References: <n99vbb7d8uc dot fsf at arm dot com> <55FFEC6A dot 3030809 at redhat dot com>
> On Sep 21, 2015, at 4:39 AM, Bernd Schmidt <bschmidt@redhat.com> wrote:
>
>> On 09/18/2015 05:21 PM, Jiong Wang wrote:
>>
>> Current conditional compare (CCMP) support in GCC aim to optimize
>> short circuit for cascade comparision, given a simple conditional
>> compare candidate:
>>
>> if (a == 17 || a == 32)
> [...]
>> The problem is current implementation always expand t0 first, then
>> t1. While the expand order need to consider the rtx costs, because "cmp"
>> and "ccmp" may have different restrictions that the expand order will
>> result in performance differences.
>>
>> For example on AArch64, "ccmp" only accept immediate within -31 ~ 31
>> while "cmp" accept wider range, so if we expand "a == 32" in the second
>> step, then it will use "ccmp", and thus an extra "mov reg, 32"
>> instruction is generated because "32" is out of the range. While if we
>> expand "a == 32" first, then it's fine as "cmp" can encoding it.
>
> I've played with this patch a bit with an aarch64 cross compiler. First of all - it doesn't seem to work, I get identical costs and the swapping doesn't happen. Did you forget to include a rtx_cost patch?
>
> I was a little worried about whether this would be expensive for longer sequences of conditions, but it seems like it looks only at leafs where we have two comparisons, so that cost should be minimal. However, it's possible there's room for improvement in code generation. I was curious and looked at a slightly more complex testcase
>
> int
> foo (int a)
> {
> if (a == 17 || a == 32 || a == 47 || a == 53 || a == 66 || a == 72)
> return 1;
> else
> return 2;
> }
>
> and this doesn't generate a sequence of ccmps as might have been expected; we only get pairs of comparisons merged with a bit_ior:
>
> D.2699 = a == 17;
> D.2700 = a == 32;
> D.2701 = D.2699 | D.2700;
> if (D.2701 != 0) goto <D.2697>; else goto <D.2702>;
> <D.2702>:
> D.2703 = a == 47;
> D.2704 = a == 53;
> D.2705 = D.2703 | D.2704;
> if (D.2705 != 0) goto <D.2697>; else goto <D.2706>;
>
> and the ccmp expander doesn't see the entire thing. I found that a little surprising TBH.
This is a known issue with fold-const folding too early. Replace || with | and add some parentheses and you should get a string of ccmp's.
Thanks,
Andrew
>
>
> Bernd