This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
Re: [PATCH][RTL-ifcvt] Improve conditional select ops on immediates
- From: Kyrill Tkachov <kyrylo dot tkachov at arm dot com>
- To: Uros Bizjak <ubizjak at gmail dot com>
- Cc: "gcc-patches at gcc dot gnu dot org" <gcc-patches at gcc dot gnu dot org>, Jeff Law <law at redhat dot com>, "H.J. Lu" <hjl dot tools at gmail dot com>
- Date: Mon, 03 Aug 2015 15:11:57 +0100
- Subject: Re: [PATCH][RTL-ifcvt] Improve conditional select ops on immediates
- Authentication-results: sourceware.org; auth=none
- References: <CAFULd4aZ21S9=hRi9A0LT+AmaMMQG5PQw_nLyi57DLgKR5_2RA at mail dot gmail dot com> <55BF666A dot 5000008 at arm dot com> <CAFULd4ZtOT=YHM5vyy09ELcP_8MZZ9fTV6Mu_mEKqVJinL4sMA at mail dot gmail dot com> <55BF6FE5 dot 8020902 at arm dot com>
On 03/08/15 14:43, Kyrill Tkachov wrote:
On 03/08/15 14:37, Uros Bizjak wrote:
On Mon, Aug 3, 2015 at 3:02 PM, Kyrill Tkachov <kyrylo.tkachov@arm.com> wrote:
On 03/08/15 13:33, Uros Bizjak wrote:
Hello!
2015-07-30 Kyrylo Tkachov <kyrylo.tkachov@arm.com>
* ifcvt.c (noce_try_store_flag_constants): Make logic of the case
when diff == STORE_FLAG_VALUE or diff == -STORE_FLAG_VALUE more
explicit. Prefer to add the flag whenever possible.
(noce_process_if_block): Try noce_try_store_flag_constants before
noce_try_cmove.
2015-07-30 Kyrylo Tkachov <kyrylo.tkachov@arm.com>
* gcc.target/aarch64/csel_bfx_1.c: New test.
* gcc.target/aarch64/csel_imms_inc_1.c: Likewise.
This patch regressed following tests on x86_64:
FAIL: gcc.target/i386/cmov2.c scan-assembler sbb
FAIL: gcc.target/i386/cmov3.c scan-assembler cmov[^3]
Sorry for that :(
I could have sworn they passed for me during my run...
While cmov3 case is questionable by itself in light of PR56309 [1],
the cnov2 case regressed from:
cmpl %edi, %esi
sbbl %eax, %eax
andl $-10, %eax
addl $5, %eax
ret
to:
xorl %eax, %eax
cmpl %esi, %edi
setbe %al
negl %eax
andl $10, %eax
subl $5, %eax
ret
Please note that sbb (aka {*x86_movsicc_0_m1} ) is not generated
anymore. Non-QImode setcc instructions are problematic on x86, since
they need to be zero-extended to their full length.
IMO, we can check if the target is able to generate insn that
implements conditional move between 0 and -1 for certain comparison
mode, like:
#(insn:TI 27 26 28 2 (parallel [
# (set (reg:SI 0 ax [orig:87 D.1845 ] [87])
# (if_then_else:SI (ltu:SI (reg:CC 17 flags)
# (const_int 0 [0]))
# (const_int -1 [0xffffffffffffffff])
# (const_int 0 [0])))
# (clobber (reg:CC 17 flags))
# ]) cmov2.c:9 947 {*x86_movsicc_0_m1}
# (expr_list:REG_DEAD (reg:CC 17 flags)
# (expr_list:REG_UNUSED (reg:CC 17 flags)
# (nil))))
sbbl %eax, %eax # 27 *x86_movsicc_0_m1 [length = 2]
this is the key insn, and as shown above, further asm sequence is
similar between patched and unpatched compiler.
Hmm yes.
We have a HAVE_conditional_move check, but that's not fine grained enough.
How about trying an emit_conditional_move and checking that the result is a
single insn?
Hmm, looking deeper into it, I see that:
/* if (test) x = a; else x = b;
=> x = (-(test != 0) & (b - a)) + a; */
is exactly the transformation that:
cmpl %edi, %esi
sbbl %eax, %eax
andl $-10, %eax
addl $5, %eax
is supposed to represent, right? It seems that noce_try_store_flag_constants is just
not emitting the optimal rtx forms for x86
So, if before the patch the code in noce_try_store_flag_constants was never being called
what performed this transformation? Turns out it's the conditional move expander in
emit_conditional_move that ends up calling the ix86_expand_int_movcc that performs lots of
these complex transformations, including this one in an x86-specific way.
I suppose that's why noce_try_store_flag_constants was tried after noce_try_cmove.
Because, it is expected that noce_try_store_flag_constants will call the conditional move
expander and the target-specific conditional move expander is expected to do all the complex
transformations beneficial to the target. I suppose the comment "Try only simple constants and registers here"
above noce_try_cmove is somewhat misleading then.
I see a couple of avenues here. We try both noce_try_cmove and noce_try_store_flag_constants, compare their
seq_cost and pick the cheapest, which sounds somewhat ugly to me, or in noce_try_store_flag_constants we
try to be more fine-grained and try to restrict some of the transformations, but on what criteria?
Kyrill
Kyrill
Uros.