Created attachment 53057
reduced testcase

Compiler output:
$ x86_64-pc-linux-gnu-gcc -O -mxop testcase.c
testcase.c: In function 'foo':
testcase.c:11:1: error: unrecognizable insn:
   11 | }
      | ^
(insn 36 35 40 2 (set (reg:V1TI 90 [ <retval> ])
        (if_then_else:V1TI (reg:V1TI 115)
            (reg:V1TI 116)
            (reg:V1TI 117))) "testcase.c":10:48 -1
     (nil))
during RTL pass: vregs
testcase.c:11:1: internal compiler error: in extract_insn, at recog.cc:2791
0x76f806 _fatal_insn(char const*, rtx_def const*, char const*, int, char const*)
        /repo/gcc-trunk/gcc/rtl-error.cc:108
0x76f882 _fatal_insn_not_found(rtx_def const*, char const*, int, char const*)
        /repo/gcc-trunk/gcc/rtl-error.cc:116
0x75e550 extract_insn(rtx_insn*)
        /repo/gcc-trunk/gcc/recog.cc:2791
0x1019419 instantiate_virtual_regs_in_insn
        /repo/gcc-trunk/gcc/function.cc:1611
0x1019419 instantiate_virtual_regs
        /repo/gcc-trunk/gcc/function.cc:1985
0x1019419 execute
        /repo/gcc-trunk/gcc/function.cc:2034
Please submit a full bug report, with preprocessed source (by using -freport-bug).
Please include the complete backtrace with any bug report.
See <https://gcc.gnu.org/bugs/> for instructions.

$ x86_64-pc-linux-gnu-gcc -v
Using built-in specs.
COLLECT_GCC=/repo/gcc-trunk/binary-latest-amd64/bin/x86_64-pc-linux-gnu-gcc
COLLECT_LTO_WRAPPER=/repo/gcc-trunk/binary-trunk-r13-861-20220531001632-g0f4df800b15-checking-yes-rtl-df-extra-amd64/bin/../libexec/gcc/x86_64-pc-linux-gnu/13.0.0/lto-wrapper
Target: x86_64-pc-linux-gnu
Configured with: /repo/gcc-trunk//configure --enable-languages=c,c++ --enable-valgrind-annotations --disable-nls --enable-checking=yes,rtl,df,extra --with-cloog --with-ppl --with-isl --build=x86_64-pc-linux-gnu --host=x86_64-pc-linux-gnu --target=x86_64-pc-linux-gnu --with-ld=/usr/bin/x86_64-pc-linux-gnu-ld --with-as=/usr/bin/x86_64-pc-linux-gnu-as --disable-libstdcxx-pch --prefix=/repo/gcc-trunk//binary-trunk-r13-861-20220531001632-g0f4df800b15-checking-yes-rtl-df-extra-amd64
Thread model: posix
Supported LTO compression algorithms: zlib zstd
gcc version 13.0.0 20220531 (experimental) (GCC)
I believe this would be fixed by:
https://gcc.gnu.org/pipermail/gcc-patches/2022-May/595382.html
but Richard Biener insists that the middle-end doesn't/shouldn't create a VEC_COND_EXPR when it is not natively supported by the target.
Doh! V1TI needs to be added to V_128_256. I'll spin a patch.
The master branch has been updated by Roger Sayle <sayle@gcc.gnu.org>:

https://gcc.gnu.org/g:37e4e7f77d8f7b7e911bf611a0f8edbc3a850c7a

commit r13-961-g37e4e7f77d8f7b7e911bf611a0f8edbc3a850c7a
Author: Roger Sayle <roger@nextmovesoftware.com>
Date:   Thu Jun 2 18:46:37 2022 +0100

    PR target/105791: Add V1TI to V_128_256 for xop_pcmov_v1ti on x86_64.

    This patch resolves PR target/105791 which is a regression that was
    accidentally introduced by my workaround to PR tree-optimization/10566
    (a deeper problem in GCC's vectorizer creating VEC_COND_EXPR when it
    shouldn't).  The latest issue is that by providing a vcond_mask_v1tiv1ti
    pattern in sse.md, the backend now calls ix86_expand_sse_movcc with
    V1TImode operands, which has a special case for TARGET_XOP to generate
    a vpcmov instruction.  Unfortunately, there wasn't previously a V1TImode
    variant, xop_pcmov_v1ti, so we'd ICE.

    This is easily fixed by adding V1TImode (and V2TImode) to V_128_256
    which is only used for defining XOP's vpcmov instruction.  This in turn
    requires V1TI (and V2TI) to be supported by <avxsizesuffix> (though
    the use of <avxsizesuffix> in the names xop_pcmov_<mode><avxsizesuffix>
    seems unnecessary; the mode makes the name unique).

    2022-06-02  Roger Sayle  <roger@nextmovesoftware.com>

    gcc/ChangeLog
            PR target/105791
            * config/i386/sse.md (V_128_256): Add V1TI and V2TI.
            (define_mode_attr avxsizesuffix): Add support for V1TI and V2TI.

    gcc/testsuite/ChangeLog
            PR target/105791
            * gcc.target/i386/pr105791.c: New test case.
This should now be fixed on mainline. Sorry for the breakage.
The master branch has been updated by Roger Sayle <sayle@gcc.gnu.org>:

https://gcc.gnu.org/g:c4320bde42c6497b701e2e6b8f1c5069bed19818

commit r13-998-gc4320bde42c6497b701e2e6b8f1c5069bed19818
Author: Roger Sayle <roger@nextmovesoftware.com>
Date:   Tue Jun 7 07:49:40 2022 +0100

    Recognize vpcmov in combine with -mxop on x86.

    By way of an apology for causing PR target/105791, where I'd overlooked
    the need to support V1TImode in TARGET_XOP's vpcmov instruction, this
    patch further improves support for TARGET_XOP's vpcmov instruction, by
    recognizing it in combine.

    Currently, the test case:

    typedef int v4si __attribute__ ((vector_size (16)));
    v4si foo(v4si c, v4si t, v4si f)
    {
      return (c&t)|(~c&f);
    }

    on x86_64 with -O2 -mxop generates:
            vpxor   %xmm2, %xmm1, %xmm1
            vpand   %xmm0, %xmm1, %xmm1
            vpxor   %xmm2, %xmm1, %xmm0
            ret

    but with this patch now generates:
            vpcmov  %xmm0, %xmm2, %xmm1, %xmm0
            ret

    On its own, the new combine splitter works fine on TARGET_64BIT, but
    alas with -m32 combine incorrectly thinks the replacement instruction
    is more expensive, as IF_THEN_ELSE isn't currently/correctly handled
    in ix86_rtx_costs.  So to avoid the need for a target selector in the
    new testcase, I've updated ix86_rtx_costs to report that AMD's vpcmov
    has a latency of two cycles [it's now an obsolete instruction set
    extension and there's unlikely to ever be a processor where this
    instruction has a different timing], and while there I also added
    rtx_costs for x86_64's integer conditional move instructions (which
    have single cycle latency).

    2022-06-07  Roger Sayle  <roger@nextmovesoftware.com>

    gcc/ChangeLog
            * config/i386/i386.cc (ix86_rtx_costs): Add a new case for
            IF_THEN_ELSE, and provide costs for TARGET_XOP's vpcmov and
            TARGET_CMOVE's (scalar integer) conditional moves.
            * config/i386/sse.md (define_split): Recognize XOP's vpcmov
            from its equivalent (canonical) pxor;pand;pxor sequence.

    gcc/testsuite/ChangeLog
            * gcc.target/i386/xop-pcmov3.c: New test case.
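
For readers following the commit message above, the pxor;pand;pxor sequence it mentions computes the same bitwise select as the (c&t)|(~c&f) source expression. The sketch below is only an illustration of that identity in C using the GCC/Clang vector extension. The function names select_and_or, select_xor_and_xor and v4si_equal are hypothetical and are not taken from the patch or its test case.

```c
/* Vector type as in the commit's test case (GCC/Clang vector extension). */
typedef int v4si __attribute__ ((vector_size (16)));

/* Bitwise select as written in the test case: each result bit comes
   from t where the matching bit of c is set, and from f otherwise.  */
v4si select_and_or (v4si c, v4si t, v4si f)
{
  return (c & t) | (~c & f);
}

/* The same select written as xor, and, xor.  This mirrors the
   vpxor;vpand;vpxor sequence quoted in the commit message:
   where a bit of c is 1 the result is t^f^f == t, where it is 0
   the result is 0^f == f.  */
v4si select_xor_and_xor (v4si c, v4si t, v4si f)
{
  return ((t ^ f) & c) ^ f;
}

/* Hypothetical helper for checking: returns 1 if all four lanes match.  */
int v4si_equal (v4si a, v4si b)
{
  for (int i = 0; i < 4; i++)
    if (a[i] != b[i])
      return 0;
  return 1;
}
```

Calling v4si_equal (select_and_or (c, t, f), select_xor_and_xor (c, t, f)) should return 1 for any inputs, since the two forms agree bit by bit. Whether either form is emitted as vpcmov depends on the compiler version and on -mxop, as described in the commit.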