This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [PATCH][3/3][RTL ifcvt] PR middle-end/37780: Conditional expression with __builtin_clz() should be optimized out


On 26 May 2016 at 11:53, Kyrill Tkachov <kyrylo.tkachov@foss.arm.com> wrote:
> Hi all,
>
> In this PR we want to optimise:
> int foo (int i)
> {
>   return (i == 0) ? N : __builtin_clz (i);
> }
>
> on targets where CLZ is defined at zero to the constant 'N'.
> This is determined at the RTL level through the CLZ_DEFINED_VALUE_AT_ZERO
> macro.
> The obvious place to implement this would be in combine through simplify-rtx
> where we'd
> recognise an IF_THEN_ELSE of the form:
> (set (reg:SI r1)
>      (if_then_else:SI (ne (reg:SI r2)
>                           (const_int 0 [0]))
>        (clz:SI (reg:SI r2))
>        (const_int 32)))
>
> and if CLZ_DEFINED_VALUE_AT_ZERO is defined to 32 for SImode we'd simplify
> it into
> just (clz:SI (reg:SI r2)).
> However, I found this doesn't quite happen for a couple of reasons:
> 1) This depends on ifcvt or some other pass to have created a conditional
> move of the
> two branches that provide the IF_THEN_ELSE to propagate the const_int and
> clz operation into.
>
> 2) Combine will refuse to propagate r2 from the above example into both the
> condition and the
> CLZ at the same time, so the most we see is:
> (set (reg:SI r1)
>      (if_then_else:SI (ne (reg:CC cc)
>             (const_int 0))
>        (clz:SI (reg:SI r2))
>        (const_int 32)))
>
> which is not enough information to perform the simplification.
>
> This patch implements the optimisation in ce1 using the noce ifcvt
> framework.
> During ifcvt noce_process_if_block can see that we're trying to optimise
> something
> of the form (x == 0 ? const_int : CLZ (x)) and so it has visibility of all
> the information
> needed to perform the transformation.
>
> The transformation is performed by adding a new noce_try* function that
> tries to put the
> condition and the 'then' and 'else' arms into an IF_THEN_ELSE rtx and try to
> simplify that
> using the simplify-rtx machinery. That way, we can implement the
> simplification logic in
> simplify-rtx.c where it belongs.
>
> A similar transformation for CTZ is implemented as well.
> So for code:
> int foo (int i)
> {
>   return (i == 0) ? 32 : __builtin_clz (i);
> }
>
> On aarch64 we now emit:
> foo:
>         clz     w0, w0
>         ret
>
> instead of:
> foo:
>         mov     w1, 32
>         clz     w2, w0
>         cmp     w0, 0
>         csel    w0, w2, w1, ne
>         ret
>
> and for arm similarly we generate:
> foo:
>         clz     r0, r0
>         bx      lr
>
> instead of:
> foo:
>         cmp     r0, #0
>         clzne   r0, r0
>         moveq   r0, #32
>         bx      lr
>
>
> and for x86_64 with -O2 -mlzcnt we generate:
> foo:
>         xorl    %eax, %eax
>         lzcntl  %edi, %eax
>         ret
>
> instead of:
> foo:
>         xorl    %eax, %eax
>         movl    $32, %edx
>         lzcntl  %edi, %eax
>         testl   %edi, %edi
>         cmove   %edx, %eax
>         ret
>
>
> I tried getting this to work on other targets as well, but encountered
> difficulties.
> For example on powerpc the two arms of the condition seen during ifcvt are:
>
> (insn 4 22 11 4 (set (reg:DI 156 [ <retval> ])
>         (const_int 32 [0x20])) clz.c:3 434 {*movdi_internal64}
>      (nil))
> and
> (insn 10 9 23 3 (set (subreg/s/u:SI (reg:DI 156 [ <retval> ]) 0)
>         (clz:SI (subreg/u:SI (reg/v:DI 157 [ i ]) 0))) clz.c:3 132 {clzsi2}
>      (expr_list:REG_DEAD (reg/v:DI 157 [ i ])
>         (nil)))
>
> So the setup code in noce_process_if_block sees that the set destination is
> not the same
> ((reg:DI 156 [ <retval> ]) and (subreg/s/u:SI (reg:DI 156 [ <retval> ]) 0))
> so it bails out on the rtx_interchangeable_p (x, SET_DEST (set_b)) check.
> I suppose that's a consequence of how SImode operations are represented in
> early RTL
> on powerpc, I don't know what to do there. Perhaps that part of ivcvt can be
> taught to handle
> destinations that are subregs of one another, but that would be a separate
> patch.
>
> Anyway, is this patch ok for trunk?
>
> Bootstrapped and tested on arm-none-linux-gnueabihf, aarch64-none-linux-gnu,
> x86_64-pc-linux-gnu.
>
> Thanks,
> Kyrill
>
> 2016-05-26  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>
>
>     PR middle-end/37780
>     * ifcvt.c (noce_try_ifelse_collapse): New function.
>     Declare prototype.
>     (noce_process_if_block): Call noce_try_ifelse_collapse.
>     * simplify-rtx.c (simplify_cond_clz_ctz): New function.
>     (simplify_ternary_operation): Use the above to simplify
>     conditional CLZ/CTZ expressions.
>
> 2016-05-26  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>
>
>     PR middle-end/37780
>     * gcc.c-torture/execute/pr37780.c: New test.
>     * gcc.target/aarch64/pr37780_1.c: Likewise.
>     * gcc.target/arm/pr37780_1.c: Likewise.

Hi Kyrylo,

I've noticed that gcc.target/arm/pr37780_1.c fails on
arm if arch < v6.
I first tried to fix the effective-target guard (IIRC, the doc
says clz is available starting with v5t), but that isn't sufficient.

When compiling for armv5-t, the scan-assembler directives
fail. It seems to work with v6t2, so I am wondering whether
it's just a matter of increasing the effective-target arch version,
or if you really intended to make the test pass on these old
architectures?

Thanks,

Christophe.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]