Bug 85832 - [AVX512] possible shorter code when comparing with vector of zeros
Summary: [AVX512] possible shorter code when comparing with vector of zeros
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: target (show other bugs)
Version: 7.3.0
: P3 normal
Target Milestone: ---
Assignee: Jakub Jelinek
URL:
Keywords: missed-optimization
Depends on:
Blocks:
 
Reported: 2018-05-18 16:32 UTC by Wojciech Mula
Modified: 2018-06-04 13:52 UTC (History)
2 users (show)

See Also:
Host:
Target: x86_64-*-*, i?86-*-*
Build:
Known to work:
Known to fail:
Last reconfirmed: 2018-05-21 00:00:00


Attachments
gcc8-pr85832.patch (1.18 KB, patch)
2018-05-21 13:37 UTC, Jakub Jelinek
Details | Diff
gcc9-pr85832.patch (1.21 KB, patch)
2018-06-04 09:52 UTC, Jakub Jelinek
Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Wojciech Mula 2018-05-18 16:32:21 UTC
Consider this simple function, which yields mask fors non-zero elements:

---cat cmp.c---
#include <immintrin.h>

int fun(__m512i x) {
    return _mm512_cmpeq_epi32_mask(x, _mm512_setzero_si512());
}
---eof

$ gcc --version
gcc (Debian 7.3.0-16) 7.3.0

$ gcc -O2 -S -mavx512f cmp.c && cat cmp.s
fun:
	vpxord	%zmm1, %zmm1, %zmm1     # <<< HERE
	vpcmpeqd	%zmm1, %zmm0, %k1   # <<<
	kmovw	%k1, %eax
	vzeroupper
	ret

Also 8.1.0 generates the same code (as checked on godbolt.org).

The pair of instructions VPXORD/VPCMPEQD can be replaced with single
VPTESTMD %zmm0, %zmm0.  VPTESTMD performs k1 := zmm0 AND zmm0, so to
compare zmm0 with zeros it's sufficient.
Comment 1 Jakub Jelinek 2018-05-21 12:22:44 UTC
Untested fix:
--- gcc/config/i386/sse.md.jj	2018-05-21 13:15:43.478581765 +0200
+++ gcc/config/i386/sse.md	2018-05-21 14:15:00.523635533 +0200
@@ -11210,26 +11210,30 @@
   "ix86_fixup_binary_operands_no_copy (EQ, <MODE>mode, operands);")
 
 (define_insn "<avx512>_eq<mode>3<mask_scalar_merge_name>_1"
-  [(set (match_operand:<avx512fmaskmode> 0 "register_operand" "=Yk")
+  [(set (match_operand:<avx512fmaskmode> 0 "register_operand" "=Yk,Yk")
 	(unspec:<avx512fmaskmode>
-	  [(match_operand:VI12_AVX512VL 1 "nonimmediate_operand" "%v")
-	   (match_operand:VI12_AVX512VL 2 "nonimmediate_operand" "vm")]
+	  [(match_operand:VI12_AVX512VL 1 "nonimmediate_operand" "%v,v")
+	   (match_operand:VI12_AVX512VL 2 "vector_move_operand" "vm,C")]
 	  UNSPEC_MASKED_EQ))]
-  "TARGET_AVX512F && !(MEM_P (operands[1]) && MEM_P (operands[2]))"
-  "vpcmpeq<ssemodesuffix>\t{%2, %1, %0<mask_scalar_merge_operand3>|%0<mask_scalar_merge_operand3>, %1, %2}"
+  "TARGET_AVX512BW && !(MEM_P (operands[1]) && MEM_P (operands[2]))"
+  "@
+   vpcmpeq<ssemodesuffix>\t{%2, %1, %0<mask_scalar_merge_operand3>|%0<mask_scalar_merge_operand3>, %1, %2}
+   vptestm<ssemodesuffix>\t{%1, %1, %0<mask_scalar_merge_operand3>|%0<mask_scalar_merge_operand3>, %1, %1}"
   [(set_attr "type" "ssecmp")
    (set_attr "prefix_extra" "1")
    (set_attr "prefix" "evex")
    (set_attr "mode" "<sseinsnmode>")])
 
 (define_insn "<avx512>_eq<mode>3<mask_scalar_merge_name>_1"
-  [(set (match_operand:<avx512fmaskmode> 0 "register_operand" "=Yk")
+  [(set (match_operand:<avx512fmaskmode> 0 "register_operand" "=Yk,Yk")
 	(unspec:<avx512fmaskmode>
-	  [(match_operand:VI48_AVX512VL 1 "nonimmediate_operand" "%v")
-	   (match_operand:VI48_AVX512VL 2 "nonimmediate_operand" "vm")]
+	  [(match_operand:VI48_AVX512VL 1 "nonimmediate_operand" "%v,v")
+	   (match_operand:VI48_AVX512VL 2 "vector_move_operand" "vm,C")]
 	  UNSPEC_MASKED_EQ))]
   "TARGET_AVX512F && !(MEM_P (operands[1]) && MEM_P (operands[2]))"
-  "vpcmpeq<ssemodesuffix>\t{%2, %1, %0<mask_scalar_merge_operand3>|%0<mask_scalar_merge_operand3>, %1, %2}"
+  "@
+   vpcmpeq<ssemodesuffix>\t{%2, %1, %0<mask_scalar_merge_operand3>|%0<mask_scalar_merge_operand3>, %1, %2}
+   vptestm<ssemodesuffix>\t{%1, %1, %0<mask_scalar_merge_operand3>|%0<mask_scalar_merge_operand3>, %1, %1}"
   [(set_attr "type" "ssecmp")
    (set_attr "prefix_extra" "1")
    (set_attr "prefix" "evex")
Comment 2 Jakub Jelinek 2018-05-21 13:37:36 UTC
Created attachment 44155 [details]
gcc8-pr85832.patch

Untested full patch.
Comment 3 Jakub Jelinek 2018-05-25 12:36:36 UTC
Author: jakub
Date: Fri May 25 12:36:03 2018
New Revision: 260756

URL: https://gcc.gnu.org/viewcvs?rev=260756&root=gcc&view=rev
Log:
	PR target/85832
	* config/i386/sse.md (<avx512>_eq<mode>3<mask_scalar_merge_name>_1):
	Add (=Yk,v,C) variant using vptestm insn.  Use TARGET_AVX512BW
	in test instead of TARGET_AVX512F for VI12_AVX512VL iterator.

	* gcc.target/i386/avx512f-pr85832.c: New test.
	* gcc.target/i386/avx512vl-pr85832.c: New test.
	* gcc.target/i386/avx512bw-pr85832.c: New test.
	* gcc.target/i386/avx512vlbw-pr85832.c: New test.

Added:
    trunk/gcc/testsuite/gcc.target/i386/avx512bw-pr85832.c
    trunk/gcc/testsuite/gcc.target/i386/avx512f-pr85832.c
    trunk/gcc/testsuite/gcc.target/i386/avx512vl-pr85832.c
    trunk/gcc/testsuite/gcc.target/i386/avx512vlbw-pr85832.c
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/config/i386/sse.md
    trunk/gcc/testsuite/ChangeLog
Comment 4 Jakub Jelinek 2018-06-04 08:29:24 UTC
Actually, vpcmpeq zmmX, zero and vptestm zmmX, zmmX perform exactly the opposite, not the same thing, as vpcmpeq sets bits in the %kY register if zmmX element is equal to 0, while vptestm if it is not equal to 0.
So we want to use vptestnm instead.
Comment 5 Jakub Jelinek 2018-06-04 09:52:43 UTC
Created attachment 44227 [details]
gcc9-pr85832.patch

Untested fix.
Comment 6 Jakub Jelinek 2018-06-04 13:50:27 UTC
Author: jakub
Date: Mon Jun  4 13:49:55 2018
New Revision: 261148

URL: https://gcc.gnu.org/viewcvs?rev=261148&root=gcc&view=rev
Log:
	PR target/85832
	PR target/86036
	* config/i386/sse.md (<avx512>_eq<mode>3<mask_scalar_merge_name>_1):
	Use vptestnm rather than vptestm in (=Yc,v,C) variant.

	* gcc.target/i386/avx512f-pr85832.c: Expect vptestnm rather than
	vptestm.
	* gcc.target/i386/avx512vl-pr85832.c: Likewise.
	* gcc.target/i386/avx512vlbw-pr85832.c: Likewise.
	* gcc.target/i386/avx512bw-pr85832.c: Likewise.
	* gcc.target/i386/avx512bw-pr86036.c: New test.

Added:
    trunk/gcc/testsuite/gcc.target/i386/avx512bw-pr86036.c
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/config/i386/sse.md
    trunk/gcc/testsuite/ChangeLog
    trunk/gcc/testsuite/gcc.target/i386/avx512bw-pr85832.c
    trunk/gcc/testsuite/gcc.target/i386/avx512f-pr85832.c
    trunk/gcc/testsuite/gcc.target/i386/avx512vl-pr85832.c
    trunk/gcc/testsuite/gcc.target/i386/avx512vlbw-pr85832.c
Comment 7 Jakub Jelinek 2018-06-04 13:52:28 UTC
Fixed.