cat test.c

typedef int v4si __attribute__((vector_size (16)));
typedef char v16qi __attribute__((vector_size (16)));

v4si
foo (v16qi a, v4si b, v4si c, v4si d)
{
  return ((v4si)~a) < 0 ? c : d;
}

gcc -Ofast -mavx2

foo(char __vector(16), int __vector(4), int __vector(4), int __vector(4)):
        vpcmpeqd        %xmm1, %xmm1, %xmm1
        vpxor   %xmm1, %xmm0, %xmm0
        vblendvps       %xmm0, %xmm2, %xmm3, %xmm0
        ret

It can be better with

        vblendvps       xmm0, xmm3, xmm2, xmm0

GIMPLE fails to simplify ((v4si)~a) < 0 ? c : d to ((v4si)a) >= 0 ? c : d.

With https://gcc.gnu.org/pipermail/gcc-patches/2021-May/571056.html, I observe that RTL also won't simplify things like

(vec_merge op1 op2 (lt (subreg (not op3) 0) const0_rtx))

to

(vec_merge op2 op1 (lt (subreg op3 0) const0_rtx))
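The equivalence the report relies on can be checked with a small sketch like the one below. It is only an illustration: the helper names (select_v4si, blend_not_lt, blend_ge, blend_forms_agree, check_blend_samples) are hypothetical and not from the testcase, and the select is spelled with bitwise masks so the sketch does not depend on the vector ?: operator, which is assumed here to be unavailable when compiling as C.

```c
#include <assert.h>
#include <string.h>

typedef int v4si __attribute__((vector_size (16)));
typedef char v16qi __attribute__((vector_size (16)));

/* Lane-wise select; every mask lane is expected to be all-ones or all-zeros,
   which is what GNU vector comparisons produce.  */
static v4si
select_v4si (v4si mask, v4si c, v4si d)
{
  return (mask & c) | (~mask & d);
}

/* Form from the report: test the sign bit of ~a.  */
v4si
blend_not_lt (v16qi a, v4si c, v4si d)
{
  const v4si zero = { 0, 0, 0, 0 };
  return select_v4si (((v4si) ~a) < zero, c, d);
}

/* Hoped-for form: test a directly with the flipped comparison.  */
v4si
blend_ge (v16qi a, v4si c, v4si d)
{
  const v4si zero = { 0, 0, 0, 0 };
  return select_v4si (((v4si) a) >= zero, c, d);
}

/* Returns 1 when both forms produce identical vectors.  */
int
blend_forms_agree (v16qi a, v4si c, v4si d)
{
  v4si x = blend_not_lt (a, c, d);
  v4si y = blend_ge (a, c, d);
  return memcmp (&x, &y, sizeof x) == 0;
}

/* Hand-picked sample mixing lanes with the sign bit set and clear.  */
void
check_blend_samples (void)
{
  const v16qi a = { 0, 0, 0, 0,  0, 0, 0, -128,
                    -1, -1, -1, -1,  1, 2, 3, 127 };
  const v4si c = { 1, 2, 3, 4 };
  const v4si d = { -1, -2, -3, -4 };
  assert (blend_forms_agree (a, c, d));
}
```

Calling check_blend_samples () from a test driver exercises both forms on the same input. For integer lanes, ~x < 0 holds exactly when x >= 0, so the two selects agree lane by lane.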
One thing is VCE<~A> should be converted to ~VCE<A>, which might allow ~B < 0 to be converted to B >= 0.

On RTL, it might be useful to still simplify:

(subreg (not op3) 0)

To:

(not (subreg op3 0))
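Why moving the not across the view-convert is safe for integer vectors can be sketched as follows. This is a source-level illustration only, not the GIMPLE or RTL transform itself, and the function names (not_then_convert, convert_then_not, not_commutes_with_convert, check_not_commutes) are hypothetical.

```c
#include <assert.h>
#include <string.h>

typedef int v4si __attribute__((vector_size (16)));
typedef char v16qi __attribute__((vector_size (16)));

/* VCE<~A>: invert the byte vector, then reinterpret it as four ints.  */
v4si
not_then_convert (v16qi a)
{
  return (v4si) ~a;
}

/* ~VCE<A>: reinterpret first, then invert the int vector.  */
v4si
convert_then_not (v16qi a)
{
  return ~(v4si) a;
}

/* Returns 1 when both orders give the same 128 bits.  */
int
not_commutes_with_convert (v16qi a)
{
  v4si x = not_then_convert (a);
  v4si y = convert_then_not (a);
  return memcmp (&x, &y, sizeof x) == 0;
}

/* Spot-check on one arbitrary bit pattern.  */
void
check_not_commutes (void)
{
  const v16qi a = { 0, 1, 2, 3, -1, -2, -3, -4,
                    127, -128, 85, -86, 15, -16, 51, -52 };
  assert (not_commutes_with_convert (a));
}
```

Bitwise not acts on each bit independently, so it does not matter how the 128 bits are grouped into lanes when it is applied.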
Confirmed. Watch out for (v4sf)~a though. Note there's ~(v4si)(a ^ b) to be considered - outer not and inner bitops which could be combined (likewise inner not and outer bitops). So any canonicalization will miss something, which means consumers should rather be prepared to handle both 'a' and '(v4si)a'.
(In reply to Richard Biener from comment #2)
> Confirmed. Watch out for (v4sf)~a though. Note there's

Not sure about (v4sf)~a: if we honor NaNs, (v4sf)~a < 0 could be different from (v4sf)a >= 0.
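A single-lane sketch of that concern, assuming IEEE 754 binary32 floats and default (non -ffast-math) comparison semantics: the all-zero bit pattern is +0.0, and its complement is a NaN, so the float comparisons disagree where the integer ones agree. The helper names are hypothetical, and scalars with memcpy stand in for the per-lane (v4sf) cast.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Reinterpret 32 bits as a float, like a per-lane (v4sf) cast.  */
static float
bits_to_float (uint32_t bits)
{
  float f;
  memcpy (&f, &bits, sizeof f);
  return f;
}

/* One lane of (v4sf)~a < 0.  */
int
float_not_lt_zero (uint32_t a)
{
  return bits_to_float (~a) < 0.0f;
}

/* One lane of (v4sf)a >= 0.  */
int
float_ge_zero (uint32_t a)
{
  return bits_to_float (a) >= 0.0f;
}

/* One lane of (v4si)~a < 0.  */
int
int_not_lt_zero (uint32_t a)
{
  return (int32_t) ~a < 0;
}

/* One lane of (v4si)a >= 0.  */
int
int_ge_zero (uint32_t a)
{
  return (int32_t) a >= 0;
}

/* Bit pattern 0 is +0.0f; ~0 is a NaN, which compares false against 0.  */
void
check_zero_lane (void)
{
  assert (int_not_lt_zero (0u) == int_ge_zero (0u));
  assert (float_not_lt_zero (0u) != float_ge_zero (0u));
}
```

So the rewrite is only a sign-bit identity; it does not carry over to an ordered floating-point comparison against zero.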
(In reply to Richard Biener from comment #2)
> Confirmed. Watch out for (v4sf)~a though. Note there's
> ~(v4si)(a ^ b) to be considered - outer not and inner bitops which could
> be combined (likewise inner not and outer bitops).

It seems we prefer the outer not:

/* Otherwise prefer ~(X ^ Y) to ~X ^ Y as more canonical.  */
(simplify
 (bit_xor:c (nop_convert?:s (bit_not:s @0)) @1)
 (if (tree_nop_conversion_p (type, TREE_TYPE (@0)))
  (bit_not (bit_xor (view_convert @0) @1))))
The master branch has been updated by hongtao Liu <liuhongt@gcc.gnu.org>:

https://gcc.gnu.org/g:691f05c2197a7b79cb2d7fdbabe3182e22da320a

commit r12-5832-g691f05c2197a7b79cb2d7fdbabe3182e22da320a
Author: Haochen Jiang <haochen.jiang@intel.com>
Date:   Thu Dec 2 15:30:17 2021 +0800

    Add combine splitter to transform vpcmpeqd/vpxor/vblendvps to vblendvps for ~op0

    gcc/ChangeLog:

            PR target/100738
            * config/i386/sse.md (*<sse4_1>_blendv<ssefltmodesuffix><avxsizesuffix>_not_ltint):
            Add new define_insn_and_split.

    gcc/testsuite/ChangeLog:

            PR target/100738
            * g++.target/i386/pr100738-1.C: New test.
Fixed in GCC 12 in the backend.
The master branch has been updated by hongtao Liu <liuhongt@gcc.gnu.org>:

https://gcc.gnu.org/g:456b53654a3e3cc550c24f2cb0e37e7fdfadf68e

commit r12-6032-g456b53654a3e3cc550c24f2cb0e37e7fdfadf68e
Author: Haochen Jiang <haochen.jiang@intel.com>
Date:   Thu Dec 2 15:30:17 2021 +0800

    Add combine splitter to transform vpternlogd/vpcmpeqd/vpxor/vblendvps to vblendvps for ~op0

    gcc/ChangeLog:

            PR target/100738
            * config/i386/sse.md (*avx_cmp<mode>3_lt, *avx_cmp<mode>3_ltint):
            Remove MEM_P restriction and add force_reg for operands[2].
            (*avx_cmp<mode>3_ltint_not): Add new define_insn_and_split.

    gcc/testsuite/ChangeLog:

            PR target/100738
            * g++.target/i386/avx512vl-pr100738-1.C: New test.