Bug 100738 - Gimple failed to simplify ((v4si) ~a) < 0 ? c : d to ((v4si)a) >= 0 ? c : d
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: middle-end
Version: 12.0
Importance: P3 enhancement
Target Milestone: ---
Assignee: Not yet assigned to anyone
URL:
Keywords: missed-optimization
Depends on:
Blocks:
 
Reported: 2021-05-24 08:21 UTC by Hongtao.liu
Modified: 2021-12-17 05:56 UTC

See Also:
Host: x86_64-pc-linux-gnu
Target: x86_64-*-* i?86-*-*
Build:
Known to work:
Known to fail:
Last reconfirmed: 2021-05-25 00:00:00


Description Hongtao.liu 2021-05-24 08:21:42 UTC
cat test.c

typedef int v4si __attribute__((vector_size (16)));
typedef char v16qi __attribute__((vector_size (16)));
v4si
foo (v16qi a, v4si b, v4si c, v4si d)
{
    return ((v4si)~a) < 0 ? c : d;
}

gcc -Ofast -mavx2

foo(char __vector(16), int __vector(4), int __vector(4), int __vector(4)):
        vpcmpeqd        %xmm1, %xmm1, %xmm1
        vpxor   %xmm1, %xmm0, %xmm0
        vblendvps       %xmm0, %xmm2, %xmm3, %xmm0
        ret

It could be better, with a single instruction (shown here in Intel syntax):

        vblendvps       xmm0, xmm3, xmm2, xmm0

GIMPLE fails to simplify ((v4si)~a) < 0 ? c : d to ((v4si)a) >= 0 ? c : d.

With https://gcc.gnu.org/pipermail/gcc-patches/2021-May/571056.html, I observe that RTL also won't simplify things like (vec_merge op1 op2 (lt (subreg (not op3) 0) const0_rtx)) to (vec_merge op2 op1 (lt (subreg op3 0) const0_rtx)).
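
The requested simplification rests on a per-lane identity for signed integers: the sign bit of ~a is set exactly when the sign bit of a is clear, so ~a < 0 and a >= 0 select the same operand. Below is a minimal scalar sketch of that identity for one 32-bit lane. The function names are hypothetical and chosen for illustration only; this is not GCC code.

```c
#include <assert.h>
#include <stdint.h>

/* Original form, one 32-bit lane: test the sign of the complemented value. */
int32_t select_not_lt0(int32_t a, int32_t c, int32_t d)
{
    return (~a < 0) ? c : d;
}

/* Simplified form: drop the NOT and test a >= 0 instead. */
int32_t select_ge0(int32_t a, int32_t c, int32_t d)
{
    return (a >= 0) ? c : d;
}

/* Compare both forms on a handful of boundary values. */
void check_select_forms_agree(void)
{
    static const int32_t samples[] = { 0, 1, -1, 42, -42, INT32_MAX, INT32_MIN };
    for (unsigned i = 0; i < sizeof samples / sizeof samples[0]; i++)
        assert(select_not_lt0(samples[i], 10, 20) == select_ge0(samples[i], 10, 20));
}
```

Calling check_select_forms_agree() from a test harness exercises the sample values; the identity itself holds for every two's-complement input, since ~ flips the sign bit.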
Comment 1 Andrew Pinski 2021-05-24 08:59:29 UTC
One thing is that VCE<~A> (VCE being VIEW_CONVERT_EXPR) should be converted to ~VCE<A>, which might allow ~B < 0 to be converted to B >= 0.

On RTL, it might be useful to still simplify:
(subreg (not op3) 0)
To:
(not (subreg op3 0))
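
Moving the NOT outside the view-convert (or subreg) is valid because bitwise NOT acts on each bit independently, so it does not matter whether the 16 bytes are viewed as chars or as ints when it is applied. A rough sketch of that property, using memcpy to model the reinterpretation; the helper names are hypothetical and not taken from GCC:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* VCE<~A>: complement each byte, then reinterpret the 16 bytes as four ints. */
void not_then_view(const uint8_t a[16], int32_t out[4])
{
    uint8_t tmp[16];
    for (int i = 0; i < 16; i++)
        tmp[i] = (uint8_t)~a[i];
    memcpy(out, tmp, sizeof tmp);
}

/* ~VCE<A>: reinterpret the bytes first, then complement each int. */
void view_then_not(const uint8_t a[16], int32_t out[4])
{
    memcpy(out, a, 16);
    for (int i = 0; i < 4; i++)
        out[i] = ~out[i];
}

/* Both orders must produce identical bit patterns. */
void check_not_commutes_with_view(void)
{
    uint8_t a[16];
    int32_t x[4], y[4];
    for (int i = 0; i < 16; i++)
        a[i] = (uint8_t)(i * 37 + 11);   /* arbitrary byte pattern */
    not_then_view(a, x);
    view_then_not(a, y);
    assert(memcmp(x, y, sizeof x) == 0);
}
```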
Comment 2 Richard Biener 2021-05-25 07:39:21 UTC
Confirmed.  Watch out for (v4sf)~a though.  Note there's
~(v4si)(a ^ b) to be considered - outer not and inner bitops which could
be combined (likewise inner not and outer bitops).  So any canonicalization
will miss something, which means consumers should rather be prepared to handle
both 'a' and '(v4si)a'.
Comment 3 Hongtao.liu 2021-05-26 10:48:19 UTC
(In reply to Richard Biener from comment #2)
> Confirmed.  Watch out for (v4sf)~a though.  Note there's
Not sure about (v4sf)~a: if we honor NaNs, (v4sf)~a < 0 could be different from (v4sf)a >= 0.
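
To make the floating-point concern concrete, here is a small sketch for a single lane, assuming IEEE-754 single precision and no fast-math flags (so NaN comparisons are honored and subnormals are not flushed). It shows bit patterns where (v4sf)~a < 0 and (v4sf)a >= 0 disagree; the helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Reinterpret a 32-bit pattern as a float, like a (v4sf) view of one lane. */
float bits_to_float(uint32_t bits)
{
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}

/* Original form on a float lane: (v4sf)~a < 0 */
int cond_not_lt0(uint32_t a) { return bits_to_float(~a) < 0.0f; }

/* Candidate rewrite: (v4sf)a >= 0 */
int cond_ge0(uint32_t a) { return bits_to_float(a) >= 0.0f; }

void check_float_forms_differ(void)
{
    /* a = +0.0f: ~a is all-ones, which is a NaN, so "< 0" is false
       while "+0.0 >= 0" is true. */
    assert(cond_not_lt0(0x00000000u) == 0);
    assert(cond_ge0(0x00000000u) == 1);

    /* a = quiet NaN 0x7FC00000: ~a = 0x803FFFFF is a negative subnormal,
       so "< 0" is true while "NaN >= 0" is false. */
    assert(cond_not_lt0(0x7FC00000u) == 1);
    assert(cond_ge0(0x7FC00000u) == 0);
}
```

Note that the first case starts from an ordinary value (+0.0); the NaN is produced by the bitwise NOT itself.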
Comment 4 Hongtao.liu 2021-05-27 06:22:40 UTC
(In reply to Richard Biener from comment #2)
> Confirmed.  Watch out for (v4sf)~a though.  Note there's
> ~(v4si)(a ^ b) to be considered - outer not and inner bitops which could
> be combined (likewise inner not and outer bitops).  

It seems we prefer the outer not:

/* Otherwise prefer ~(X ^ Y) to ~X ^ Y as more canonical.  */
(simplify
 (bit_xor:c (nop_convert?:s (bit_not:s @0)) @1)
 (if (tree_nop_conversion_p (type, TREE_TYPE (@0)))
  (bit_not (bit_xor (view_convert @0) @1))))
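
The scalar identity behind that pattern, ~X ^ Y == ~(X ^ Y), can be checked with a short sketch. The function names below are made up for illustration and are not part of GCC or match.pd.

```c
#include <assert.h>
#include <stdint.h>

/* Inner-not form: ~X ^ Y */
uint32_t inner_not_xor(uint32_t x, uint32_t y)
{
    return ~x ^ y;
}

/* Outer-not form, the one the pattern canonicalizes to: ~(X ^ Y) */
uint32_t outer_not_xor(uint32_t x, uint32_t y)
{
    return ~(x ^ y);
}

/* Both forms give the same bits for a few sample operand pairs. */
void check_xor_not_forms(void)
{
    static const uint32_t samples[] = { 0u, 1u, 0x80000000u, 0xFFFFFFFFu, 0x12345678u };
    const unsigned n = sizeof samples / sizeof samples[0];
    for (unsigned i = 0; i < n; i++)
        for (unsigned j = 0; j < n; j++)
            assert(inner_not_xor(samples[i], samples[j])
                   == outer_not_xor(samples[i], samples[j]));
}
```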
Comment 5 GCC Commits 2021-12-08 06:13:20 UTC
The master branch has been updated by hongtao Liu <liuhongt@gcc.gnu.org>:

https://gcc.gnu.org/g:691f05c2197a7b79cb2d7fdbabe3182e22da320a

commit r12-5832-g691f05c2197a7b79cb2d7fdbabe3182e22da320a
Author: Haochen Jiang <haochen.jiang@intel.com>
Date:   Thu Dec 2 15:30:17 2021 +0800

    Add combine splitter to transform vpcmpeqd/vpxor/vblendvps to vblendvps for ~op0
    
    gcc/ChangeLog:
    
            PR target/100738
            * config/i386/sse.md
            (*<sse4_1>_blendv<ssefltmodesuffix><avxsizesuffix>_not_ltint):
            Add new define_insn_and_split.
    
    gcc/testsuite/ChangeLog:
    
            PR target/100738
            * g++.target/i386/pr100738-1.C: New test.
Comment 6 Hongtao.liu 2021-12-08 06:15:15 UTC
Fixed in GCC 12 in the backend.
Comment 7 GCC Commits 2021-12-17 05:56:34 UTC
The master branch has been updated by hongtao Liu <liuhongt@gcc.gnu.org>:

https://gcc.gnu.org/g:456b53654a3e3cc550c24f2cb0e37e7fdfadf68e

commit r12-6032-g456b53654a3e3cc550c24f2cb0e37e7fdfadf68e
Author: Haochen Jiang <haochen.jiang@intel.com>
Date:   Thu Dec 2 15:30:17 2021 +0800

    Add combine splitter to transform vpternlogd/vpcmpeqd/vpxor/vblendvps to vblendvps for ~op0
    
    gcc/ChangeLog:
    
            PR target/100738
            * config/i386/sse.md (*avx_cmp<mode>3_lt, *avx_cmp<mode>3_ltint):
            Remove MEM_P restriction and add force_reg for operands[2].
            (*avx_cmp<mode>3_ltint_not): Add new define_insn_and_split.
    
    gcc/testsuite/ChangeLog:
    
            PR target/100738
            * g++.target/i386/avx512vl-pr100738-1.C: New test.