This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [Patch/ccmp] Cost instruction sequences to choose better expand order


Bernd Schmidt writes:

> On 09/18/2015 05:21 PM, Jiong Wang wrote:
>>
>> Current conditional compare (CCMP) support in GCC aim to optimize
>> short circuit for cascade comparision, given a simple conditional
>> compare candidate:
>>
>>    if (a == 17 || a == 32)
> [...]
>> The problem is current implementation always expand t0 first, then
>> t1. While the expand order need to consider the rtx costs, because "cmp"
>> and "ccmp" may have different restrictions that the expand order will
>> result in performance differences.
>>
>> For example on AArch64, "ccmp" only accept immediate within -31 ~ 31
>> while "cmp" accept wider range, so if we expand "a == 32" in the second
>> step, then it will use "ccmp", and thus an extra "mov reg, 32"
>> instruction is generated because "32" is out of the range. While if we
>> expand "a == 32" first, then it's fine as "cmp" can encoding it.
>
> I've played with this patch a bit with an aarch64 cross compiler. First 
> of all - it doesn't seem to work, I get identical costs and the swapping 
> doesn't happen. Did you forget to include a rtx_cost patch?

No. Please see NOTE part of the description. AArch64 doesn't cost ccmp
currently. It will be fixed by a seperate patch later. The testcase is
thus marked as XFAIL.

>
> I was a little worried about whether this would be expensive for longer 
> sequences of conditions, but it seems like it looks only at leafs where 
> we have two comparisons, so that cost should be minimal. However, it's 
> possible there's room for improvement in code generation. I was curious 
> and looked at a slightly more complex testcase
>
> int
> foo (int a)
> {
>    if (a == 17 || a == 32 || a == 47 || a == 53 || a == 66 || a == 72)
>      return 1;
>    else
>      return 2;
> }

Yes, this patch only cost the inner most call of the recursive call of
expand_ccmp_expr_1 which is simple and common situation.
>
> and this doesn't generate a sequence of ccmps as might have been 
> expected; we only get pairs of comparisons merged with a bit_ior:
>
>    D.2699 = a == 17;
>    D.2700 = a == 32;
>    D.2701 = D.2699 | D.2700;
>    if (D.2701 != 0) goto <D.2697>; else goto <D.2702>;
>    <D.2702>:
>    D.2703 = a == 47;
>    D.2704 = a == 53;
>    D.2705 = D.2703 | D.2704;
>    if (D.2705 != 0) goto <D.2697>; else goto <D.2706>;
>
> and the ccmp expander doesn't see the entire thing. I found that a 
> little surprising TBH.

It's because fold_truth_andor has re-written the tree to avoid the use
short circuit given LOGICAL_OP_NON_SHORT_CIRCUIT returns true.

-- 
Regards,
Jiong


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]