[Bug target/100048] [10/11 Regression] Wrongful CSE'ing of SVE predicates.

Fri Apr 16 16:08:19 GMT 2021

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100048

--- Comment #3 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The releases/gcc-10 branch has been updated by Tamar Christina
<tnfchris@gcc.gnu.org>:

https://gcc.gnu.org/g:d15a2a00a384672c5f8228d49eba2b0c09048a43

commit r10-9708-gd15a2a00a384672c5f8228d49eba2b0c09048a43
Author: Tamar Christina <tamar.christina@arm.com>
Date:   Fri Apr 16 16:58:50 2021 +0100

    SVE: Fix wrong sve predicate split (PR100048)

    The attached testcase generates the following paradoxical subregs when
    creating the predicates.

    (insn 22 21 23 2 (set (reg:VNx8BI 100)
            (subreg:VNx8BI (reg:VNx2BI 103) 0))
         (expr_list:REG_EQUAL (const_vector:VNx8BI [
                    (const_int 1 [0x1])
                    (const_int 0 [0])
                    (const_int 1 [0x1])
                    (const_int 0 [0]) repeated x5
                ])
            (nil)))

    and

    (insn 15 14 16 2 (set (reg:VNx8BI 96)
            (subreg:VNx8BI (reg:VNx2BI 99) 0))
         (expr_list:REG_EQUAL (const_vector:VNx8BI [
                    (const_int 1 [0x1])
                    (const_int 0 [0]) repeated x7
                ])
            (nil)))

    This causes CSE to incorrectly think that the two predicates are equal
    because some of the significant bits get ignored due to the subreg.
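
    The effect can be pictured with a small model of one 16-bit predicate
    granule.  This is only an illustrative sketch, not GCC code: the helper
    names are hypothetical, and it assumes the usual SVE layout in which
    VNx16BI uses every predicate bit, VNx8BI every 2nd bit, VNx4BI every 4th
    and VNx2BI every 8th.  Under that assumption the two constants above
    differ when compared as VNx8BI or VNx16BI but look identical when only
    the VNx2BI bits are considered.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of one 16-bit SVE predicate granule.
   nelts is the element count of the mode: 16 (VNx16BI), 8 (VNx8BI),
   4 (VNx4BI) or 2 (VNx2BI).  Returns the bits that the mode treats
   as significant, assuming one significant bit per element.  */
uint16_t significant_mask(unsigned nelts)
{
    unsigned stride = 16 / nelts;
    uint16_t mask = 0;
    for (unsigned bit = 0; bit < 16; bit += stride)
        mask |= (uint16_t)(1u << bit);
    return mask;
}

/* Pack an 8-element VNx8BI constant into granule bits (stride 2).  */
uint16_t pack_vnx8bi(const int elts[8])
{
    uint16_t bits = 0;
    for (unsigned i = 0; i < 8; i++)
        if (elts[i])
            bits |= (uint16_t)(1u << (2 * i));
    return bits;
}

/* Compare two granules looking only at the bits significant in the
   mode with nelts elements.  */
int equal_in_mode(uint16_t a, uint16_t b, unsigned nelts)
{
    uint16_t mask = significant_mask(nelts);
    return (a & mask) == (b & mask);
}

/* The two REG_EQUAL constants from the dumps above.  */
void check_example(void)
{
    const int pred_100[8] = {1, 0, 1, 0, 0, 0, 0, 0};
    const int pred_96[8]  = {1, 0, 0, 0, 0, 0, 0, 0};
    uint16_t a = pack_vnx8bi(pred_100);
    uint16_t b = pack_vnx8bi(pred_96);

    assert(equal_in_mode(a, b, 2));    /* VNx2BI view: indistinguishable */
    assert(!equal_in_mode(a, b, 8));   /* VNx8BI view: different */
    assert(!equal_in_mode(a, b, 16));  /* VNx16BI view: different */
}
```

    Calling check_example () passes all three assertions in this model,
    which is the confusion described above: only a comparison restricted to
    the narrower mode's bits sees the predicates as equal.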

    The attached patch instead always looks at all 16 bits of the predicate,
    which in turn means we need to generate a TRN that matches the expected
    result mode.  In effect, we keep the mode as VNx16BI in RTL but during
    codegen reinterpret the operands as the mode the predicate instruction
    wanted:

    (insn 10 9 11 2 (set (reg:VNx8BI 96)
            (subreg:VNx8BI (reg:VNx16BI 99) 0))
         (expr_list:REG_EQUAL (const_vector:VNx8BI [
                    (const_int 1 [0x1])
                    (const_int 0 [0]) repeated x7
                ])
            (nil)))

    This needed a correction to the TRN pattern.  A new TRN1_CONV unspec is
    introduced which allows one to keep the arguments as VNx16BI but encode the
    instruction using the type of the last operand.

    (insn 9 8 10 2 (set (reg:VNx16BI 99)
            (unspec:VNx16BI [
                    (reg:VNx16BI 97)
                    (reg:VNx16BI 98)
                    (reg:VNx2BI 100)
                ] UNSPEC_TRN1_CONV))
            (nil))

    This allows us to remove all the paradoxical subregs and end up with

    (insn 16 15 17 2 (set (reg:VNx8BI 101)
            (subreg:VNx8BI (reg:VNx16BI 104) 0))
            (expr_list:REG_EQUAL (const_vector:VNx8BI [
                    (const_int 1 [0x1])
                    (const_int 0 [0])
                    (const_int 1 [0x1])
                    (const_int 0 [0]) repeated x5
                ])
            (nil)))

    gcc/ChangeLog:

            PR target/100048
            * config/aarch64/aarch64-sve.md (@aarch64_sve_trn1_conv<mode>): New.
            * config/aarch64/aarch64.c (aarch64_expand_sve_const_pred_trn): Use new
            TRN optab.
            * config/aarch64/iterators.md (UNSPEC_TRN1_CONV): New.

    gcc/testsuite/ChangeLog:

            PR target/100048
            * gcc.target/aarch64/sve/pr100048.c: New test.

    (cherry picked from commit 8535755af70f819d820553b2e73e72a16a984599)