Consider this simple function, which yields mask fors non-zero elements: ---cat cmp.c--- #include <immintrin.h> int fun(__m512i x) { return _mm512_cmpeq_epi32_mask(x, _mm512_setzero_si512()); } ---eof $ gcc --version gcc (Debian 7.3.0-16) 7.3.0 $ gcc -O2 -S -mavx512f cmp.c && cat cmp.s fun: vpxord %zmm1, %zmm1, %zmm1 # <<< HERE vpcmpeqd %zmm1, %zmm0, %k1 # <<< kmovw %k1, %eax vzeroupper ret Also 8.1.0 generates the same code (as checked on godbolt.org). The pair of instructions VPXORD/VPCMPEQD can be replaced with single VPTESTMD %zmm0, %zmm0. VPTESTMD performs k1 := zmm0 AND zmm0, so to compare zmm0 with zeros it's sufficient.
Untested fix: --- gcc/config/i386/sse.md.jj 2018-05-21 13:15:43.478581765 +0200 +++ gcc/config/i386/sse.md 2018-05-21 14:15:00.523635533 +0200 @@ -11210,26 +11210,30 @@ "ix86_fixup_binary_operands_no_copy (EQ, <MODE>mode, operands);") (define_insn "<avx512>_eq<mode>3<mask_scalar_merge_name>_1" - [(set (match_operand:<avx512fmaskmode> 0 "register_operand" "=Yk") + [(set (match_operand:<avx512fmaskmode> 0 "register_operand" "=Yk,Yk") (unspec:<avx512fmaskmode> - [(match_operand:VI12_AVX512VL 1 "nonimmediate_operand" "%v") - (match_operand:VI12_AVX512VL 2 "nonimmediate_operand" "vm")] + [(match_operand:VI12_AVX512VL 1 "nonimmediate_operand" "%v,v") + (match_operand:VI12_AVX512VL 2 "vector_move_operand" "vm,C")] UNSPEC_MASKED_EQ))] - "TARGET_AVX512F && !(MEM_P (operands[1]) && MEM_P (operands[2]))" - "vpcmpeq<ssemodesuffix>\t{%2, %1, %0<mask_scalar_merge_operand3>|%0<mask_scalar_merge_operand3>, %1, %2}" + "TARGET_AVX512BW && !(MEM_P (operands[1]) && MEM_P (operands[2]))" + "@ + vpcmpeq<ssemodesuffix>\t{%2, %1, %0<mask_scalar_merge_operand3>|%0<mask_scalar_merge_operand3>, %1, %2} + vptestm<ssemodesuffix>\t{%1, %1, %0<mask_scalar_merge_operand3>|%0<mask_scalar_merge_operand3>, %1, %1}" [(set_attr "type" "ssecmp") (set_attr "prefix_extra" "1") (set_attr "prefix" "evex") (set_attr "mode" "<sseinsnmode>")]) (define_insn "<avx512>_eq<mode>3<mask_scalar_merge_name>_1" - [(set (match_operand:<avx512fmaskmode> 0 "register_operand" "=Yk") + [(set (match_operand:<avx512fmaskmode> 0 "register_operand" "=Yk,Yk") (unspec:<avx512fmaskmode> - [(match_operand:VI48_AVX512VL 1 "nonimmediate_operand" "%v") - (match_operand:VI48_AVX512VL 2 "nonimmediate_operand" "vm")] + [(match_operand:VI48_AVX512VL 1 "nonimmediate_operand" "%v,v") + (match_operand:VI48_AVX512VL 2 "vector_move_operand" "vm,C")] UNSPEC_MASKED_EQ))] "TARGET_AVX512F && !(MEM_P (operands[1]) && MEM_P (operands[2]))" - "vpcmpeq<ssemodesuffix>\t{%2, %1, %0<mask_scalar_merge_operand3>|%0<mask_scalar_merge_operand3>, %1, %2}" + "@ + vpcmpeq<ssemodesuffix>\t{%2, %1, %0<mask_scalar_merge_operand3>|%0<mask_scalar_merge_operand3>, %1, %2} + vptestm<ssemodesuffix>\t{%1, %1, %0<mask_scalar_merge_operand3>|%0<mask_scalar_merge_operand3>, %1, %1}" [(set_attr "type" "ssecmp") (set_attr "prefix_extra" "1") (set_attr "prefix" "evex")
Created attachment 44155 [details] gcc8-pr85832.patch Untested full patch.
Author: jakub Date: Fri May 25 12:36:03 2018 New Revision: 260756 URL: https://gcc.gnu.org/viewcvs?rev=260756&root=gcc&view=rev Log: PR target/85832 * config/i386/sse.md (<avx512>_eq<mode>3<mask_scalar_merge_name>_1): Add (=Yk,v,C) variant using vptestm insn. Use TARGET_AVX512BW in test instead of TARGET_AVX512F for VI12_AVX512VL iterator. * gcc.target/i386/avx512f-pr85832.c: New test. * gcc.target/i386/avx512vl-pr85832.c: New test. * gcc.target/i386/avx512bw-pr85832.c: New test. * gcc.target/i386/avx512vlbw-pr85832.c: New test. Added: trunk/gcc/testsuite/gcc.target/i386/avx512bw-pr85832.c trunk/gcc/testsuite/gcc.target/i386/avx512f-pr85832.c trunk/gcc/testsuite/gcc.target/i386/avx512vl-pr85832.c trunk/gcc/testsuite/gcc.target/i386/avx512vlbw-pr85832.c Modified: trunk/gcc/ChangeLog trunk/gcc/config/i386/sse.md trunk/gcc/testsuite/ChangeLog
Actually, vpcmpeq zmmX, zero and vptestm zmmX, zmmX perform exactly the opposite, not the same thing, as vpcmpeq sets bits in the %kY register if zmmX element is equal to 0, while vptestm if it is not equal to 0. So we want to use vptestnm instead.
Created attachment 44227 [details] gcc9-pr85832.patch Untested fix.
Author: jakub Date: Mon Jun 4 13:49:55 2018 New Revision: 261148 URL: https://gcc.gnu.org/viewcvs?rev=261148&root=gcc&view=rev Log: PR target/85832 PR target/86036 * config/i386/sse.md (<avx512>_eq<mode>3<mask_scalar_merge_name>_1): Use vptestnm rather than vptestm in (=Yc,v,C) variant. * gcc.target/i386/avx512f-pr85832.c: Expect vptestnm rather than vptestm. * gcc.target/i386/avx512vl-pr85832.c: Likewise. * gcc.target/i386/avx512vlbw-pr85832.c: Likewise. * gcc.target/i386/avx512bw-pr85832.c: Likewise. * gcc.target/i386/avx512bw-pr86036.c: New test. Added: trunk/gcc/testsuite/gcc.target/i386/avx512bw-pr86036.c Modified: trunk/gcc/ChangeLog trunk/gcc/config/i386/sse.md trunk/gcc/testsuite/ChangeLog trunk/gcc/testsuite/gcc.target/i386/avx512bw-pr85832.c trunk/gcc/testsuite/gcc.target/i386/avx512f-pr85832.c trunk/gcc/testsuite/gcc.target/i386/avx512vl-pr85832.c trunk/gcc/testsuite/gcc.target/i386/avx512vlbw-pr85832.c
Fixed.