This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
Re: [Patch/ccmp] Cost instruction sequences to choose better expand order
- From: Jiong Wang <jiong dot wang at arm dot com>
- To: Bernd Schmidt <bschmidt at redhat dot com>
- Cc: GCC Patches <gcc-patches at gcc dot gnu dot org>, Richard Henderson <rth at redhat dot com>
- Date: Mon, 21 Sep 2015 12:43:01 +0100
- Subject: Re: [Patch/ccmp] Cost instruction sequences to choose better expand order
- Authentication-results: sourceware.org; auth=none
- References: <n99vbb7d8uc dot fsf at arm dot com> <55FFEC6A dot 3030809 at redhat dot com>
Bernd Schmidt writes:
> On 09/18/2015 05:21 PM, Jiong Wang wrote:
>>
>> Current conditional compare (CCMP) support in GCC aim to optimize
>> short circuit for cascade comparision, given a simple conditional
>> compare candidate:
>>
>> if (a == 17 || a == 32)
> [...]
>> The problem is current implementation always expand t0 first, then
>> t1. While the expand order need to consider the rtx costs, because "cmp"
>> and "ccmp" may have different restrictions that the expand order will
>> result in performance differences.
>>
>> For example on AArch64, "ccmp" only accept immediate within -31 ~ 31
>> while "cmp" accept wider range, so if we expand "a == 32" in the second
>> step, then it will use "ccmp", and thus an extra "mov reg, 32"
>> instruction is generated because "32" is out of the range. While if we
>> expand "a == 32" first, then it's fine as "cmp" can encoding it.
>
> I've played with this patch a bit with an aarch64 cross compiler. First
> of all - it doesn't seem to work, I get identical costs and the swapping
> doesn't happen. Did you forget to include a rtx_cost patch?
No. Please see NOTE part of the description. AArch64 doesn't cost ccmp
currently. It will be fixed by a seperate patch later. The testcase is
thus marked as XFAIL.
>
> I was a little worried about whether this would be expensive for longer
> sequences of conditions, but it seems like it looks only at leafs where
> we have two comparisons, so that cost should be minimal. However, it's
> possible there's room for improvement in code generation. I was curious
> and looked at a slightly more complex testcase
>
> int
> foo (int a)
> {
> if (a == 17 || a == 32 || a == 47 || a == 53 || a == 66 || a == 72)
> return 1;
> else
> return 2;
> }
Yes, this patch only cost the inner most call of the recursive call of
expand_ccmp_expr_1 which is simple and common situation.
>
> and this doesn't generate a sequence of ccmps as might have been
> expected; we only get pairs of comparisons merged with a bit_ior:
>
> D.2699 = a == 17;
> D.2700 = a == 32;
> D.2701 = D.2699 | D.2700;
> if (D.2701 != 0) goto <D.2697>; else goto <D.2702>;
> <D.2702>:
> D.2703 = a == 47;
> D.2704 = a == 53;
> D.2705 = D.2703 | D.2704;
> if (D.2705 != 0) goto <D.2697>; else goto <D.2706>;
>
> and the ccmp expander doesn't see the entire thing. I found that a
> little surprising TBH.
It's because fold_truth_andor has re-written the tree to avoid the use
short circuit given LOGICAL_OP_NON_SHORT_CIRCUIT returns true.
--
Regards,
Jiong