105791 – [13 Regression] ICE: in extract_insn, at recog.cc:2791 (unrecognizable insn) with -mxop

Status:	RESOLVED FIXED

Alias:	None

Product:	gcc
Classification:	Unclassified
Component:	target
Version:	13.0

Importance:	P3 normal
Target Milestone:	13.0
Assignee:	Not yet assigned to anyone

URL:
Keywords:	ice-on-valid-code

Depends on:
Blocks:

Reported:	2022-05-31 11:47 UTC by Zdenek Sojka
Modified:	2022-06-07 06:51 UTC
CC List:	1 user

See Also:
Host:	x86_64-pc-linux-gnu
Target:	x86_64-pc-linux-gnu
Build:
Known to work:	12.1.1
Known to fail:	13.0
Last reconfirmed:	2022-05-31 00:00:00

Attachments
reduced testcase (145 bytes, text/plain) 2022-05-31 11:47 UTC, Zdenek Sojka

Description Zdenek Sojka 2022-05-31 11:47:11 UTC

Created attachment 53057
reduced testcase

Compiler output:
$ x86_64-pc-linux-gnu-gcc -O -mxop testcase.c 
testcase.c: In function 'foo':
testcase.c:11:1: error: unrecognizable insn:
   11 | }
      | ^
(insn 36 35 40 2 (set (reg:V1TI 90 [ <retval> ])
        (if_then_else:V1TI (reg:V1TI 115)
            (reg:V1TI 116)
            (reg:V1TI 117))) "testcase.c":10:48 -1
     (nil))
during RTL pass: vregs
testcase.c:11:1: internal compiler error: in extract_insn, at recog.cc:2791
0x76f806 _fatal_insn(char const*, rtx_def const*, char const*, int, char const*)
        /repo/gcc-trunk/gcc/rtl-error.cc:108
0x76f882 _fatal_insn_not_found(rtx_def const*, char const*, int, char const*)
        /repo/gcc-trunk/gcc/rtl-error.cc:116
0x75e550 extract_insn(rtx_insn*)
        /repo/gcc-trunk/gcc/recog.cc:2791
0x1019419 instantiate_virtual_regs_in_insn
        /repo/gcc-trunk/gcc/function.cc:1611
0x1019419 instantiate_virtual_regs
        /repo/gcc-trunk/gcc/function.cc:1985
0x1019419 execute
        /repo/gcc-trunk/gcc/function.cc:2034
Please submit a full bug report, with preprocessed source (by using -freport-bug).
Please include the complete backtrace with any bug report.
See <https://gcc.gnu.org/bugs/> for instructions.

$ x86_64-pc-linux-gnu-gcc -v
Using built-in specs.
COLLECT_GCC=/repo/gcc-trunk/binary-latest-amd64/bin/x86_64-pc-linux-gnu-gcc
COLLECT_LTO_WRAPPER=/repo/gcc-trunk/binary-trunk-r13-861-20220531001632-g0f4df800b15-checking-yes-rtl-df-extra-amd64/bin/../libexec/gcc/x86_64-pc-linux-gnu/13.0.0/lto-wrapper
Target: x86_64-pc-linux-gnu
Configured with: /repo/gcc-trunk//configure --enable-languages=c,c++ --enable-valgrind-annotations --disable-nls --enable-checking=yes,rtl,df,extra --with-cloog --with-ppl --with-isl --build=x86_64-pc-linux-gnu --host=x86_64-pc-linux-gnu --target=x86_64-pc-linux-gnu --with-ld=/usr/bin/x86_64-pc-linux-gnu-ld --with-as=/usr/bin/x86_64-pc-linux-gnu-as --disable-libstdcxx-pch --prefix=/repo/gcc-trunk//binary-trunk-r13-861-20220531001632-g0f4df800b15-checking-yes-rtl-df-extra-amd64
Thread model: posix
Supported LTO compression algorithms: zlib zstd
gcc version 13.0.0 20220531 (experimental) (GCC)

Comment 1 Roger Sayle 2022-05-31 13:59:37 UTC

I believe this would be fixed by:
https://gcc.gnu.org/pipermail/gcc-patches/2022-May/595382.html
but Richard Biener insists that the middle-end doesn't/shouldn't create VEC_COND_EXPR if they are not natively supported by the target.

Comment 2 Roger Sayle 2022-05-31 14:39:25 UTC

Doh! V1TI needs to be added to V_128_256.  I'll spin a patch.

Comment 3 GCC Commits 2022-06-02 17:48:28 UTC

The master branch has been updated by Roger Sayle <sayle@gcc.gnu.org>:

https://gcc.gnu.org/g:37e4e7f77d8f7b7e911bf611a0f8edbc3a850c7a

commit r13-961-g37e4e7f77d8f7b7e911bf611a0f8edbc3a850c7a
Author: Roger Sayle <roger@nextmovesoftware.com>
Date:   Thu Jun 2 18:46:37 2022 +0100

    PR target/105791: Add V1TI to V_128_256 for xop_pcmov_v1ti on x86_64.
    
    This patch resolves PR target/105791 which is a regression that was
    accidentally introduced by my workaround to PR tree-optimization/10566
    (a deeper problem in GCC's vectorizer creating VEC_COND_EXPR when it
    shouldn't).  The latest issue is that by providing a vcond_mask_v1tiv1ti
    pattern in sse.md, the backend now calls ix86_expand_sse_movcc with
    V1TImode operands, which has a special case for TARGET_XOP to generate
    a vpcmov instruction.  Unfortunately, there wasn't previously a V1TImode
    variant, xop_pcmov_v1ti, so we'd ICE.
    
    This is easily fixed by adding V1TImode (and V2TImode) to V_128_256
    which is only used for defining XOP's vpcmov instruction.  This in turn
    requires V1TI (and V2TI) to be supported by <avxsizesuffix> (though
    the use of <avxsizesuffix> in the names xop_pcmov_<mode><avxsizesuffix>
    seems unnecessary; the mode makes the name unique).
    
    2022-06-02  Roger Sayle  <roger@nextmovesoftware.com>
    
    gcc/ChangeLog
            PR target/105791
            * config/i386/sse.md (V_128_256): Add V1TI and V2TI.
            (define_mode_attr avxsizesuffix): Add support for V1TI and V2TI.
    
    gcc/testsuite/ChangeLog
            PR target/105791
            * gcc.target/i386/pr105791.c: New test case.

Comment 4 Roger Sayle 2022-06-04 09:16:23 UTC

This should now be fixed on mainline.  Sorry for the breakage.

Comment 5 GCC Commits 2022-06-07 06:51:17 UTC

The master branch has been updated by Roger Sayle <sayle@gcc.gnu.org>:

https://gcc.gnu.org/g:c4320bde42c6497b701e2e6b8f1c5069bed19818

commit r13-998-gc4320bde42c6497b701e2e6b8f1c5069bed19818
Author: Roger Sayle <roger@nextmovesoftware.com>
Date:   Tue Jun 7 07:49:40 2022 +0100

    Recognize vpcmov in combine with -mxop on x86.
    
    By way of an apology for causing PR target/105791, where I'd overlooked
    the need to support V1TImode in TARGET_XOP's vpcmov instruction, this
    patch further improves support for TARGET_XOP's vpcmov instruction, by
    recognizing it in combine.
    
    Currently, the test case:
    
    typedef int v4si __attribute__ ((vector_size (16)));
    v4si foo(v4si c, v4si t, v4si f)
    {
        return (c&t)|(~c&f);
    }
    
    on x86_64 with -O2 -mxop generates:
            vpxor   %xmm2, %xmm1, %xmm1
            vpand   %xmm0, %xmm1, %xmm1
            vpxor   %xmm2, %xmm1, %xmm0
            ret
    
    but with this patch now generates:
            vpcmov  %xmm0, %xmm2, %xmm1, %xmm0
            ret
    
    On its own, the new combine splitter works fine on TARGET_64BIT, but
    alas with -m32 combine incorrectly thinks the replacement instruction
    is more expensive, as IF_THEN_ELSE isn't currently/correctly handled
    in ix86_rtx_costs.  So to avoid the need for a target selector in the
    new testcase, I've updated ix86_rtx_costs to report that AMD's vpcmov
    has a latency of two cycles [it's now an obsolete instruction set
    extension and there's unlikely to ever be a processor where this
    instruction has a different timing], and while there I also added
    rtx_costs for x86_64's integer conditional move instructions (which
    have single cycle latency).
    
    2022-06-07  Roger Sayle  <roger@nextmovesoftware.com>
    
    gcc/ChangeLog
            * config/i386/i386.cc (ix86_rtx_costs): Add a new case for
            IF_THEN_ELSE, and provide costs for TARGET_XOP's vpcmov and
            TARGET_CMOVE's (scalar integer) conditional moves.
            * config/i386/sse.md (define_split): Recognize XOP's vpcmov
            from its equivalent (canonical) pxor;pand;pxor sequence.
    
    gcc/testsuite/ChangeLog
            * gcc.target/i386/xop-pcmov3.c: New test case.