[PATCH][RTL-ifcvt] Improve conditional select ops on immediates

Kyrill Tkachov kyrylo.tkachov@arm.com
Mon Aug 3 14:12:00 GMT 2015


On 03/08/15 14:43, Kyrill Tkachov wrote:
> On 03/08/15 14:37, Uros Bizjak wrote:
>> On Mon, Aug 3, 2015 at 3:02 PM, Kyrill Tkachov <kyrylo.tkachov@arm.com> wrote:
>>> On 03/08/15 13:33, Uros Bizjak wrote:
>>>> Hello!
>>>>
>>>>> 2015-07-30  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>
>>>>>
>>>>>        * ifcvt.c (noce_try_store_flag_constants): Make logic of the case
>>>>>        when diff == STORE_FLAG_VALUE or diff == -STORE_FLAG_VALUE more
>>>>>        explicit.  Prefer to add the flag whenever possible.
>>>>>        (noce_process_if_block): Try noce_try_store_flag_constants before
>>>>>        noce_try_cmove.
>>>>>
>>>>> 2015-07-30  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>
>>>>>
>>>>>        * gcc.target/aarch64/csel_bfx_1.c: New test.
>>>>>        * gcc.target/aarch64/csel_imms_inc_1.c: Likewise.
>>>> This patch regressed following tests on x86_64:
>>>>
>>>> FAIL: gcc.target/i386/cmov2.c scan-assembler sbb
>>>> FAIL: gcc.target/i386/cmov3.c scan-assembler cmov[^3]
>>> Sorry for that :(
>>> I could have sworn they passed for me during my run...
>>>
>>>> While cmov3 case is questionable by itself in light of PR56309 [1],
>>>> the cnov2 case regressed from:
>>>>
>>>>           cmpl    %edi, %esi
>>>>           sbbl    %eax, %eax
>>>>           andl    $-10, %eax
>>>>           addl    $5, %eax
>>>>           ret
>>>>
>>>> to:
>>>>
>>>>           xorl    %eax, %eax
>>>>           cmpl    %esi, %edi
>>>>           setbe   %al
>>>>           negl    %eax
>>>>           andl    $10, %eax
>>>>           subl    $5, %eax
>>>>           ret
>>>>
>>>> Please note that sbb (aka {*x86_movsicc_0_m1} ) is not generated
>>>> anymore. Non-QImode setcc instructions are problematic on x86, since
>>>> they need to be zero-extended to their full length.
>> IMO, we can check if the target is able to generate insn that
>> implements conditional move between 0 and -1 for certain comparison
>> mode, like:
>>
>> #(insn:TI 27 26 28 2 (parallel [
>> #            (set (reg:SI 0 ax [orig:87 D.1845 ] [87])
>> #                (if_then_else:SI (ltu:SI (reg:CC 17 flags)
>> #                        (const_int 0 [0]))
>> #                    (const_int -1 [0xffffffffffffffff])
>> #                    (const_int 0 [0])))
>> #            (clobber (reg:CC 17 flags))
>> #        ]) cmov2.c:9 947 {*x86_movsicc_0_m1}
>> #     (expr_list:REG_DEAD (reg:CC 17 flags)
>> #        (expr_list:REG_UNUSED (reg:CC 17 flags)
>> #            (nil))))
>>          sbbl    %eax, %eax      # 27    *x86_movsicc_0_m1       [length = 2]
>>
>> this is the key insn, and as shown above, further asm sequence is
>> similar between patched and unpatched compiler.
> Hmm yes.
> We have a HAVE_conditional_move check, but that's not fine grained enough.
> How about trying an emit_conditional_move and checking that the result is a
> single insn?

Hmm, looking deeper into it, I see that:

        /* if (test) x = a; else x = b;
       =>   x = (-(test != 0) & (b - a)) + a;  */

is exactly the transformation that:
        cmpl    %edi, %esi
        sbbl    %eax, %eax
        andl    $-10, %eax
        addl    $5, %eax

is supposed to represent, right? It seems that noce_try_store_flag_constants is just
not emitting the optimal rtx forms for x86
So, if before the patch the code in noce_try_store_flag_constants was never being called
what performed this transformation? Turns out it's the conditional move expander in
emit_conditional_move that ends up calling the ix86_expand_int_movcc that performs lots of
these complex transformations, including this one in an x86-specific way.

I suppose that's why noce_try_store_flag_constants was tried after noce_try_cmove.
Because, it is expected that noce_try_store_flag_constants will call the conditional move
expander and the target-specific conditional move expander is expected to do all the complex
transformations beneficial to the target. I suppose the comment "Try only simple constants and registers here"
above noce_try_cmove is somewhat misleading then.

I see a couple of avenues here. We try both noce_try_cmove and noce_try_store_flag_constants, compare their
seq_cost and pick the cheapest, which sounds somewhat ugly to me, or in noce_try_store_flag_constants we
try to be more fine-grained and try to restrict some of the transformations, but on what criteria?

Kyrill




>
> Kyrill
>
>
>> Uros.
>>



More information about the Gcc-patches mailing list