[PR97903][ARM] Missed optimization in lowering to vtst

Kyrylo Tkachov Kyrylo.Tkachov@arm.com
Fri Feb 5 10:12:39 GMT 2021


Hi Prathamesh,

> -----Original Message-----
> From: Prathamesh Kulkarni <prathamesh.kulkarni@linaro.org>
> Sent: 05 February 2021 09:53
> To: gcc Patches <gcc-patches@gcc.gnu.org>; Kyrylo Tkachov
> <Kyrylo.Tkachov@arm.com>
> Subject: [PR97903][ARM] Missed optimization in lowering to vtst
> 
> Hi,
> For the following test-case:
> 
> #include <arm_neon.h>
> 
> uint8x8_t f1(int8x8_t a, int8x8_t b) {
>   return (uint8x8_t) ((a & b) != 0);
> }
> 
> GCC fails to lower the test operation to vtst and instead emits:
> f1:
>         vand    d0, d0, d1
>         vceq.i8 d0, d0, #0
>         vmvn    d0, d0
>         bx      lr
> 
> The attached patch tries to fix this by adding a pattern that matches the following combine attempt:
> Trying 7, 8 -> 9:
>     7: r120:V8QI=r123:V8QI&r124:V8QI
>       REG_DEAD r124:V8QI
>       REG_DEAD r123:V8QI
>     8: r122:V8QI=-r120:V8QI==const_vector
>       REG_DEAD r120:V8QI
>     9: r121:V8QI=~r122:V8QI
>       REG_DEAD r122:V8QI
> Failed to match this instruction:
> (set (reg:V8QI 121)
>     (plus:V8QI (eq:V8QI (and:V8QI (reg:V8QI 123)
>                 (reg:V8QI 124))
>             (const_vector:V8QI [
>                     (const_int 0 [0]) repeated x8
>                 ]))
>         (const_vector:V8QI [
>                 (const_int -1 [0xffffffffffffffff]) repeated x8
>             ])))
> 
> Essentially it converts:
> r120 = (and r123 r124)
> r122 = (neg (eq r120 0))
> r121 = (not r122)
> -->
> r121 = vtst r123, r124
> 
> (I guess combine simplifies (not (neg X)) to (plus X -1) above, since ~(-X) == X - 1 in two's complement.)
> 
> Code-gen after patch:
> f1:
>         vtst.8  d0, d0, d1
>         bx      lr
> 
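To see why the rewrite is safe, here is a minimal scalar sketch (not part of the patch) that models each 8-lane sequence in plain C and compares them over every pair of byte values. The helper names old_sequence, vtst_model and count_mismatches are hypothetical; they only mimic the lane-wise behaviour of vand / vceq.i8 #0 / vmvn and of vtst.8, and they do not use arm_neon.h.

```c
#include <stdint.h>

#define LANES 8

/* Hypothetical lane model of the pre-patch sequence:
   vand d0, d0, d1; vceq.i8 d0, d0, #0; vmvn d0, d0 */
static void old_sequence(const int8_t a[LANES], const int8_t b[LANES],
                         uint8_t out[LANES]) {
    for (int i = 0; i < LANES; i++) {
        uint8_t t = (uint8_t)(a[i] & b[i]);   /* vand */
        uint8_t eq = (t == 0) ? 0xFF : 0x00;  /* vceq.i8 #0: all-ones if the lane is zero */
        out[i] = (uint8_t)~eq;                /* vmvn: bitwise NOT */
    }
}

/* Hypothetical lane model of vtst.8 d0, d0, d1:
   all-ones if (a & b) has any bit set, otherwise zero. */
static void vtst_model(const int8_t a[LANES], const int8_t b[LANES],
                       uint8_t out[LANES]) {
    for (int i = 0; i < LANES; i++)
        out[i] = ((a[i] & b[i]) != 0) ? 0xFF : 0x00;
}

/* Runs both models over all 256 x 256 byte pairs, varying the value in
   each lane, and returns the number of lanes where they disagree. */
static long count_mismatches(void) {
    long mismatches = 0;
    for (int x = 0; x < 256; x++) {
        for (int y = 0; y < 256; y++) {
            int8_t a[LANES], b[LANES];
            uint8_t want[LANES], got[LANES];
            for (int i = 0; i < LANES; i++) {
                a[i] = (int8_t)(uint8_t)(x + i);
                b[i] = (int8_t)(uint8_t)(y ^ i);
            }
            old_sequence(a, b, want);
            vtst_model(a, b, got);
            for (int i = 0; i < LANES; i++)
                if (want[i] != got[i])
                    mismatches++;
        }
    }
    return mismatches;
}
```

count_mismatches() is expected to return 0: the three-instruction sequence and a single vtst.8 agree lane by lane for every input, so the fold does not change the result of f1.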

+(define_insn "neon_vtst_combine<mode>"
+  [(set (match_operand:VDQIW 0 "s_register_operand" "=w")
+        (plus:VDQIW
+	  (eq:VDQIW
+	    (and:VDQIW (match_operand:VDQIW 1 "s_register_operand" "w")
+		       (match_operand:VDQIW 2 "s_register_operand" "w"))
+	    (match_operand:VDQIW 3 "zero_operand" "i"))
+	  (match_operand:VDQIW 4 "minus_one_operand" "i")))]
+  "TARGET_NEON"
+  "vtst.<V_sz_elem>\t%<V_reg>0, %<V_reg>1, %<V_reg>2"
+)

This will need a type attribute for scheduling.
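The RTL-level identity behind the new pattern can be checked the same way. In this sketch (hypothetical helpers, not GCC code), each eq lane is assumed to evaluate to 1 for true and 0 for false, as the (neg (eq ...)) form in the combine dump suggests. Under that assumption, the original insns 7-9, the (plus (eq (and a b) 0) -1) form matched by neon_vtst_combine<mode>, and the vtst.8 mask all agree on every 8-bit input.

```c
#include <stdint.h>

/* Assumption: an RTL eq lane is 1 when true and 0 when false, which is why
   the dump wraps it in neg to obtain an all-ones mask. */
static uint8_t rtl_eq_zero(uint8_t x) { return x == 0 ? 1 : 0; }

/* Before combine: r121 = (not (neg (eq (and a b) 0))) */
static uint8_t rtl_before(uint8_t a, uint8_t b) {
    uint8_t r120 = (uint8_t)(a & b);             /* insn 7: and */
    uint8_t r122 = (uint8_t)-rtl_eq_zero(r120);  /* insn 8: neg (eq ...) */
    return (uint8_t)~r122;                       /* insn 9: not */
}

/* Form matched by the new pattern: (plus (eq (and a b) 0) -1) */
static uint8_t rtl_pattern(uint8_t a, uint8_t b) {
    return (uint8_t)(rtl_eq_zero((uint8_t)(a & b)) - 1); /* wraps modulo 256 */
}

/* Reference semantics of one vtst.8 lane. */
static uint8_t vtst_lane(uint8_t a, uint8_t b) {
    return (a & b) != 0 ? 0xFF : 0x00;
}

/* Counts every x where ~(-x) != x - 1 in 8-bit arithmetic, plus every
   (a, b) pair where the RTL forms disagree with vtst. */
static long count_rtl_mismatches(void) {
    long mismatches = 0;
    for (int x = 0; x < 256; x++)
        if ((uint8_t)~(uint8_t)-x != (uint8_t)(x - 1))
            mismatches++;
    for (int a = 0; a < 256; a++)
        for (int b = 0; b < 256; b++) {
            uint8_t want = vtst_lane((uint8_t)a, (uint8_t)b);
            if (rtl_before((uint8_t)a, (uint8_t)b) != want ||
                rtl_pattern((uint8_t)a, (uint8_t)b) != want)
                mismatches++;
        }
    return mismatches;
}
```

If eq lanes were instead modelled as -1 for true, (plus X -1) would yield -2 rather than 0, so the pattern depends on the 0/1 convention implied by the neg in insn 8.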

> Bootstrapped + tested on arm-linux-gnueabihf, and
> cross tested on arm*-*-*.
> Does it look OK for the next stage 1?

It looks sensible to me for stage 1.
Thanks,
Kyrill

> 
> Thanks,
> Prathamesh

