This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
Re: [PATCH][3/3][RTL ifcvt] PR middle-end/37780: Conditional expression with __builtin_clz() should be optimized out
- From: Christophe Lyon <christophe dot lyon at linaro dot org>
- To: Kyrill Tkachov <kyrylo dot tkachov at foss dot arm dot com>
- Cc: GCC Patches <gcc-patches at gcc dot gnu dot org>
- Date: Tue, 7 Jun 2016 21:34:55 +0200
- Subject: Re: [PATCH][3/3][RTL ifcvt] PR middle-end/37780: Conditional expression with __builtin_clz() should be optimized out
- Authentication-results: sourceware.org; auth=none
- References: <5746C78B dot 3030006 at foss dot arm dot com>
On 26 May 2016 at 11:53, Kyrill Tkachov <kyrylo.tkachov@foss.arm.com> wrote:
> Hi all,
>
> In this PR we want to optimise:
> int foo (int i)
> {
> return (i == 0) ? N : __builtin_clz (i);
> }
>
> on targets where CLZ is defined at zero to the constant 'N'.
> This is determined at the RTL level through the CLZ_DEFINED_VALUE_AT_ZERO
> macro.
> The obvious place to implement this would be in combine through simplify-rtx
> where we'd
> recognise an IF_THEN_ELSE of the form:
> (set (reg:SI r1)
> (if_then_else:SI (ne (reg:SI r2)
> (const_int 0 [0]))
> (clz:SI (reg:SI r2))
> (const_int 32)))
>
> and if CLZ_DEFINED_VALUE_AT_ZERO is defined to 32 for SImode we'd simplify
> it into
> just (clz:SI (reg:SI r2)).
> However, I found this doesn't quite happen for a couple of reasons:
> 1) This depends on ifcvt or some other pass to have created a conditional
> move of the
> two branches that provide the IF_THEN_ELSE to propagate the const_int and
> clz operation into.
>
> 2) Combine will refuse to propagate r2 from the above example into both the
> condition and the
> CLZ at the same time, so the most we see is:
> (set (reg:SI r1)
> (if_then_else:SI (ne (reg:CC cc)
> (const_int 0))
> (clz:SI (reg:SI r2))
> (const_int 32)))
>
> which is not enough information to perform the simplification.
>
> This patch implements the optimisation in ce1 using the noce ifcvt
> framework.
> During ifcvt noce_process_if_block can see that we're trying to optimise
> something
> of the form (x == 0 ? const_int : CLZ (x)) and so it has visibility of all
> the information
> needed to perform the transformation.
>
> The transformation is performed by adding a new noce_try* function that
> tries to put the
> condition and the 'then' and 'else' arms into an IF_THEN_ELSE rtx and try to
> simplify that
> using the simplify-rtx machinery. That way, we can implement the
> simplification logic in
> simplify-rtx.c where it belongs.
>
> A similar transformation for CTZ is implemented as well.
> So for code:
> int foo (int i)
> {
> return (i == 0) ? 32 : __builtin_clz (i);
> }
>
> On aarch64 we now emit:
> foo:
> clz w0, w0
> ret
>
> instead of:
> foo:
> mov w1, 32
> clz w2, w0
> cmp w0, 0
> csel w0, w2, w1, ne
> ret
>
> and for arm similarly we generate:
> foo:
> clz r0, r0
> bx lr
>
> instead of:
> foo:
> cmp r0, #0
> clzne r0, r0
> moveq r0, #32
> bx lr
>
>
> and for x86_64 with -O2 -mlzcnt we generate:
> foo:
> xorl %eax, %eax
> lzcntl %edi, %eax
> ret
>
> instead of:
> foo:
> xorl %eax, %eax
> movl $32, %edx
> lzcntl %edi, %eax
> testl %edi, %edi
> cmove %edx, %eax
> ret
>
>
> I tried getting this to work on other targets as well, but encountered
> difficulties.
> For example on powerpc the two arms of the condition seen during ifcvt are:
>
> (insn 4 22 11 4 (set (reg:DI 156 [ <retval> ])
> (const_int 32 [0x20])) clz.c:3 434 {*movdi_internal64}
> (nil))
> and
> (insn 10 9 23 3 (set (subreg/s/u:SI (reg:DI 156 [ <retval> ]) 0)
> (clz:SI (subreg/u:SI (reg/v:DI 157 [ i ]) 0))) clz.c:3 132 {clzsi2}
> (expr_list:REG_DEAD (reg/v:DI 157 [ i ])
> (nil)))
>
> So the setup code in noce_process_if_block sees that the set destination is
> not the same
> ((reg:DI 156 [ <retval> ]) and (subreg/s/u:SI (reg:DI 156 [ <retval> ]) 0))
> so it bails out on the rtx_interchangeable_p (x, SET_DEST (set_b)) check.
> I suppose that's a consequence of how SImode operations are represented in
> early RTL
> on powerpc, I don't know what to do there. Perhaps that part of ivcvt can be
> taught to handle
> destinations that are subregs of one another, but that would be a separate
> patch.
>
> Anyway, is this patch ok for trunk?
>
> Bootstrapped and tested on arm-none-linux-gnueabihf, aarch64-none-linux-gnu,
> x86_64-pc-linux-gnu.
>
> Thanks,
> Kyrill
>
> 2016-05-26 Kyrylo Tkachov <kyrylo.tkachov@arm.com>
>
> PR middle-end/37780
> * ifcvt.c (noce_try_ifelse_collapse): New function.
> Declare prototype.
> (noce_process_if_block): Call noce_try_ifelse_collapse.
> * simplify-rtx.c (simplify_cond_clz_ctz): New function.
> (simplify_ternary_operation): Use the above to simplify
> conditional CLZ/CTZ expressions.
>
> 2016-05-26 Kyrylo Tkachov <kyrylo.tkachov@arm.com>
>
> PR middle-end/37780
> * gcc.c-torture/execute/pr37780.c: New test.
> * gcc.target/aarch64/pr37780_1.c: Likewise.
> * gcc.target/arm/pr37780_1.c: Likewise.
Hi Kyrylo,
I've noticed that gcc.target/arm/pr37780_1.c fails on
arm if arch < v6.
I first tried to fix the effective-target guard (IIRC, the doc
says clz is available starting with v5t), but that isn't sufficient.
When compiling for armv5-t, the scan-assembler directives
fail. It seems to work with v6t2, so I am wondering whether
it's just a matter of increasing the effective-target arch version,
or if you really intended to make the test pass on these old
architectures?
Thanks,
Christophe.