This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [PATCH][RTL-ifcvt] Improve conditional select ops on immediates



On 03/08/15 14:43, Kyrill Tkachov wrote:
On 03/08/15 14:37, Uros Bizjak wrote:
On Mon, Aug 3, 2015 at 3:02 PM, Kyrill Tkachov <kyrylo.tkachov@arm.com> wrote:
On 03/08/15 13:33, Uros Bizjak wrote:
Hello!

2015-07-30  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>

       * ifcvt.c (noce_try_store_flag_constants): Make logic of the case
       when diff == STORE_FLAG_VALUE or diff == -STORE_FLAG_VALUE more
       explicit.  Prefer to add the flag whenever possible.
       (noce_process_if_block): Try noce_try_store_flag_constants before
       noce_try_cmove.

2015-07-30  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>

       * gcc.target/aarch64/csel_bfx_1.c: New test.
       * gcc.target/aarch64/csel_imms_inc_1.c: Likewise.
This patch regressed following tests on x86_64:

FAIL: gcc.target/i386/cmov2.c scan-assembler sbb
FAIL: gcc.target/i386/cmov3.c scan-assembler cmov[^3]
Sorry for that :(
I could have sworn they passed for me during my run...

While cmov3 case is questionable by itself in light of PR56309 [1],
the cnov2 case regressed from:

          cmpl    %edi, %esi
          sbbl    %eax, %eax
          andl    $-10, %eax
          addl    $5, %eax
          ret

to:

          xorl    %eax, %eax
          cmpl    %esi, %edi
          setbe   %al
          negl    %eax
          andl    $10, %eax
          subl    $5, %eax
          ret

Please note that sbb (aka {*x86_movsicc_0_m1} ) is not generated
anymore. Non-QImode setcc instructions are problematic on x86, since
they need to be zero-extended to their full length.
IMO, we can check if the target is able to generate insn that
implements conditional move between 0 and -1 for certain comparison
mode, like:

#(insn:TI 27 26 28 2 (parallel [
#            (set (reg:SI 0 ax [orig:87 D.1845 ] [87])
#                (if_then_else:SI (ltu:SI (reg:CC 17 flags)
#                        (const_int 0 [0]))
#                    (const_int -1 [0xffffffffffffffff])
#                    (const_int 0 [0])))
#            (clobber (reg:CC 17 flags))
#        ]) cmov2.c:9 947 {*x86_movsicc_0_m1}
#     (expr_list:REG_DEAD (reg:CC 17 flags)
#        (expr_list:REG_UNUSED (reg:CC 17 flags)
#            (nil))))
         sbbl    %eax, %eax      # 27    *x86_movsicc_0_m1       [length = 2]

this is the key insn, and as shown above, further asm sequence is
similar between patched and unpatched compiler.
Hmm yes.
We have a HAVE_conditional_move check, but that's not fine grained enough.
How about trying an emit_conditional_move and checking that the result is a
single insn?

Hmm, looking deeper into it, I see that:

       /* if (test) x = a; else x = b;
      =>   x = (-(test != 0) & (b - a)) + a;  */

is exactly the transformation that:
       cmpl    %edi, %esi
       sbbl    %eax, %eax
       andl    $-10, %eax
       addl    $5, %eax

is supposed to represent, right? It seems that noce_try_store_flag_constants is just
not emitting the optimal rtx forms for x86
So, if before the patch the code in noce_try_store_flag_constants was never being called
what performed this transformation? Turns out it's the conditional move expander in
emit_conditional_move that ends up calling the ix86_expand_int_movcc that performs lots of
these complex transformations, including this one in an x86-specific way.

I suppose that's why noce_try_store_flag_constants was tried after noce_try_cmove.
Because, it is expected that noce_try_store_flag_constants will call the conditional move
expander and the target-specific conditional move expander is expected to do all the complex
transformations beneficial to the target. I suppose the comment "Try only simple constants and registers here"
above noce_try_cmove is somewhat misleading then.

I see a couple of avenues here. We try both noce_try_cmove and noce_try_store_flag_constants, compare their
seq_cost and pick the cheapest, which sounds somewhat ugly to me, or in noce_try_store_flag_constants we
try to be more fine-grained and try to restrict some of the transformations, but on what criteria?

Kyrill





Kyrill


Uros.



Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]