Created attachment 43902
reduced testcase

Compiler output:
$ x86_64-pc-linux-gnu-gcc -O3 -fno-caller-saves -mavx512f testcase.c
/tmp/ccNXhuu3.s: Assembler messages:
/tmp/ccNXhuu3.s:418: Error: unsupported instruction `vpand'

The failing instruction is:
...
	vpand	%ymm16, %ymm1, %ymm1
...

I tried both the most recent GNU as and NASM (both built from current Git), but neither accepts this form. Accessing ymm >= 16 is allowed only with the EVEX-prefixed VPANDD or VPANDQ.

I am not an expert in AVX512 instruction encoding, but gcc seems to be wrong here according to the assemblers I tested and Intel's SDM vol. 2: only vpandd/vpandq use the EVEX prefix, and ymm >= 16 can be used only with the EVEX prefix.
(This probably generalizes to other instructions, such as vpor, and maybe to other register classes, such as xmm/zmm.)

$ x86_64-pc-linux-gnu-gcc -v
Using built-in specs.
COLLECT_GCC=/repo/gcc-trunk/binary-latest-amd64/bin/x86_64-pc-linux-gnu-gcc
COLLECT_LTO_WRAPPER=/repo/gcc-trunk/binary-trunk-259207-checking-yes-rtl-df-extra-nobootstrap-pr85177-amd64/bin/../libexec/gcc/x86_64-pc-linux-gnu/8.0.1/lto-wrapper
Target: x86_64-pc-linux-gnu
Configured with: /repo/gcc-trunk//configure --enable-languages=c,c++ --enable-valgrind-annotations --disable-nls --enable-checking=yes,rtl,df,extra --disable-bootstrap --with-cloog --with-ppl --with-isl --build=x86_64-pc-linux-gnu --host=x86_64-pc-linux-gnu --target=x86_64-pc-linux-gnu --with-ld=/usr/bin/x86_64-pc-linux-gnu-ld --with-as=/usr/bin/x86_64-pc-linux-gnu-as --disable-libstdcxx-pch --prefix=/repo/gcc-trunk//binary-trunk-259207-checking-yes-rtl-df-extra-nobootstrap-pr85177-amd64
Thread model: posix
gcc version 8.0.1 20180407 (experimental) (GCC)
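To make the encoding constraint in the report concrete, here is a minimal sketch in C. It is a toy model, not code from GCC or binutils: the names `enum encoding`, `max_vector_reg`, `operands_encodable` and `demo_vpand_operands` are all hypothetical. It only captures the fact that a VEX prefix has four register-select bits (registers 0-15) while an EVEX prefix has five (registers 0-31).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy model of the two prefixes; not assembler code. */
enum encoding { ENC_VEX, ENC_EVEX };

/* Highest vector register number each prefix can address:
   VEX has 4 register-select bits (0-15), EVEX has 5 (0-31). */
int max_vector_reg(enum encoding enc)
{
    return enc == ENC_EVEX ? 31 : 15;
}

/* True if every register operand fits in the given encoding. */
bool operands_encodable(enum encoding enc, const int *regs, int nregs)
{
    for (int i = 0; i < nregs; i++)
        if (regs[i] < 0 || regs[i] > max_vector_reg(enc))
            return false;
    return true;
}

/* Operands of the failing instruction: ymm16, ymm1, ymm1. */
void demo_vpand_operands(void)
{
    const int regs[] = { 16, 1, 1 };
    assert(!operands_encodable(ENC_VEX, regs, 3));  /* vpand: VEX only */
    assert(operands_encodable(ENC_EVEX, regs, 3));  /* vpandd/vpandq   */
}
```

Under this model the operands of `vpand %ymm16, %ymm1, %ymm1` cannot be expressed with VEX, which matches the assembler error above.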
Started with r250759. Debugging.
Created attachment 43907
gcc8-pr85328.patch

Many patterns rely on ix86_hard_regno_mode_ok not allowing < 512-bit vector modes in xmm16+ registers (when AVX512VL is not enabled). Unfortunately, the vec_extract_lo_* splitters provide a loophole: on this testcase they create, e.g., a V32QImode xmm16 hard register, which is then propagated into the vpand. The patch closes the loophole by forcing the low-half or low-quarter vector extraction from a zmm16+ register to be a 512-bit move into the other register (which must necessarily be < xmm16).
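The invariant described above can be sketched as a small predicate. This is a simplified illustration, not the real ix86_hard_regno_mode_ok: the function name `vector_mode_ok_in_reg` and its parameters are hypothetical, it assumes AVX512F is enabled, and it ignores scalar modes and every other register class.

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified, hypothetical model of the invariant (assumes AVX512F):
   may a vector mode of mode_bits bits live in SSE register regno? */
bool vector_mode_ok_in_reg(int regno, int mode_bits, bool have_avx512vl)
{
    if (regno < 16)
        return true;            /* xmm0-15: 128/256/512-bit all fine */
    if (mode_bits == 512)
        return true;            /* zmm16+: always fine with AVX512F  */
    return have_avx512vl;       /* xmm16+/ymm16+: needs AVX512VL     */
}

/* The V32QImode (256-bit) value in register 16 from the testcase. */
void demo_v32qi_in_reg16(void)
{
    assert(!vector_mode_ok_in_reg(16, 256, false)); /* -mavx512f only */
    assert(vector_mode_ok_in_reg(16, 512, false));  /* full zmm16     */
}
```

In this model the lowpart subreg produced by the splitters corresponds to the first assertion: a 256-bit mode in register 16 that the predicate would have rejected.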
Author: jakub
Date: Thu Apr 12 11:17:23 2018
New Revision: 259344

URL: https://gcc.gnu.org/viewcvs?rev=259344&root=gcc&view=rev
Log:
	PR target/85328
	* config/i386/sse.md
	(<mask_codefor>avx512dq_vextract<shuffletype>64x2_1<mask_name> split,
	<mask_codefor>avx512f_vextract<shuffletype>32x4_1<mask_name> split,
	vec_extract_lo_<mode><mask_name> split, vec_extract_lo_v32hi,
	vec_extract_lo_v64qi): For non-AVX512VL if input is xmm16+ reg
	and output is a reg, avoid creating invalid lowpart subreg, but
	instead split into a 512-bit move.  Don't split if not AVX512VL,
	input is xmm16+ reg and output is a mem.
	(vec_extract_lo_<mode><mask_name>, vec_extract_lo_v32hi,
	vec_extract_lo_v64qi): Don't require split if not AVX512VL, input
	is xmm16+ reg and output is a mem.

	* gcc.target/i386/pr85328.c: New test.

Added:
    trunk/gcc/testsuite/gcc.target/i386/pr85328.c
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/config/i386/sse.md
    trunk/gcc/testsuite/ChangeLog
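The three cases in the ChangeLog entry can be summarized as a decision function. This is only a reading of the log text, not the actual sse.md conditions: `enum split_action`, `choose_split` and their parameters are hypothetical names, and masked variants are ignored.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical summary of what a low-part extraction turns into. */
enum split_action {
    SPLIT_TO_LOWPART,      /* lowpart subreg of the source          */
    SPLIT_TO_512BIT_MOVE,  /* move the whole 512-bit register       */
    KEEP_EXTRACT_INSN      /* do not split; emit the extract itself */
};

enum split_action choose_split(bool have_avx512vl, bool src_is_reg,
                               int src_regno, bool dst_is_mem)
{
    bool src_is_high_reg = src_is_reg && src_regno >= 16;

    /* With AVX512VL, or a source outside xmm16+, the lowpart is valid. */
    if (have_avx512vl || !src_is_high_reg)
        return SPLIT_TO_LOWPART;

    /* Non-AVX512VL and xmm16+ source: never create the narrow subreg. */
    return dst_is_mem ? KEEP_EXTRACT_INSN : SPLIT_TO_512BIT_MOVE;
}

/* The testcase situation: -mavx512f only, source in register 16. */
void demo_choose_split(void)
{
    assert(choose_split(false, true, 16, false) == SPLIT_TO_512BIT_MOVE);
    assert(choose_split(false, true, 16, true) == KEEP_EXTRACT_INSN);
}
```

The point of the middle case is that the destination register is below xmm16, so a full 512-bit move leaves the wanted low half or quarter in a register where narrower modes are valid.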
Fixed.