This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [Patch/ccmp] Cost instruction sequences to choose better expand order



> On Sep 21, 2015, at 4:39 AM, Bernd Schmidt <bschmidt@redhat.com> wrote:
> 
>> On 09/18/2015 05:21 PM, Jiong Wang wrote:
>> 
>> Current conditional compare (CCMP) support in GCC aim to optimize
>> short circuit for cascade comparision, given a simple conditional
>> compare candidate:
>> 
>>   if (a == 17 || a == 32)
> [...]
>> The problem is current implementation always expand t0 first, then
>> t1. While the expand order need to consider the rtx costs, because "cmp"
>> and "ccmp" may have different restrictions that the expand order will
>> result in performance differences.
>> 
>> For example on AArch64, "ccmp" only accept immediate within -31 ~ 31
>> while "cmp" accept wider range, so if we expand "a == 32" in the second
>> step, then it will use "ccmp", and thus an extra "mov reg, 32"
>> instruction is generated because "32" is out of the range. While if we
>> expand "a == 32" first, then it's fine as "cmp" can encoding it.
> 
> I've played with this patch a bit with an aarch64 cross compiler. First of all - it doesn't seem to work, I get identical costs and the swapping doesn't happen. Did you forget to include a rtx_cost patch?
> 
> I was a little worried about whether this would be expensive for longer sequences of conditions, but it seems like it looks only at leafs where we have two comparisons, so that cost should be minimal. However, it's possible there's room for improvement in code generation. I was curious and looked at a slightly more complex testcase
> 
> int
> foo (int a)
> {
>  if (a == 17 || a == 32 || a == 47 || a == 53 || a == 66 || a == 72)
>    return 1;
>  else
>    return 2;
> }
> 
> and this doesn't generate a sequence of ccmps as might have been expected; we only get pairs of comparisons merged with a bit_ior:
> 
>  D.2699 = a == 17;
>  D.2700 = a == 32;
>  D.2701 = D.2699 | D.2700;
>  if (D.2701 != 0) goto <D.2697>; else goto <D.2702>;
>  <D.2702>:
>  D.2703 = a == 47;
>  D.2704 = a == 53;
>  D.2705 = D.2703 | D.2704;
>  if (D.2705 != 0) goto <D.2697>; else goto <D.2706>;
> 
> and the ccmp expander doesn't see the entire thing. I found that a little surprising TBH.

This is a known issue with fold-const folding too early. Replace || with | and add some parentheses and you should get a string of ccmp's. 

Thanks,
Andrew


> 
> 
> Bernd


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]