When AVX512F is enabled, the vzeroupper optimization is disabled. This is intended only for Xeon Phi, not for Skylake server, which also has AVX512F. Since AVX512ER is unique to Xeon Phi and will never appear in non-Xeon Phi processors, the vzeroupper optimization should be enabled when AVX512F is enabled but AVX512ER isn't:

[hjl@gnu-6 vzeroupper-skx-1]$ cat foo.c
#include <immintrin.h>

extern __m512d y, z;

void
foo ()
{
  z = y;
}
[hjl@gnu-6 vzeroupper-skx-1]$ make foo.s
/export/build/gnu/gcc/build-x86_64-linux/gcc/xgcc -B/export/build/gnu/gcc/build-x86_64-linux/gcc/ -mavx512f -mno-avx512er -O2 -S foo.c
[hjl@gnu-6 vzeroupper-skx-1]$ cat foo.s
	.file	"foo.c"
	.text
	.p2align 4,,15
	.globl	foo
	.type	foo, @function
foo:
.LFB4897:
	.cfi_startproc
	vmovapd	y(%rip), %zmm0
	vmovapd	%zmm0, z(%rip)
	ret
	.cfi_endproc
.LFE4897:
	.size	foo, .-foo
	.ident	"GCC: (GNU) 8.0.0 20171110 (experimental)"
	.section	.note.GNU-stack,"",@progbits
[hjl@gnu-6 vzeroupper-skx-1]$
class pass_insert_vzeroupper : public rtl_opt_pass
{
public:
  pass_insert_vzeroupper(gcc::context *ctxt)
    : rtl_opt_pass(pass_data_insert_vzeroupper, ctxt)
  {}

  /* opt_pass methods: */
  virtual bool gate (function *)
    {
      return TARGET_AVX && !TARGET_AVX512F
	     && TARGET_VZEROUPPER && flag_expensive_optimizations
	     && !optimize_size;
    }

  virtual unsigned int execute (function *)
    {
      return rest_of_handle_insert_vzeroupper ();
    }

}; // class pass_insert_vzeroupper
Created attachment 42583 [details] An untested patch
(In reply to Uroš Bizjak from comment #1)
>       return TARGET_AVX && !TARGET_AVX512F

Should !TARGET_AVX512F be changed to !TARGET_AVX512ER in the gate function?
Created attachment 42584 [details] An untested patch
(In reply to Uroš Bizjak from comment #3)
> (In reply to Uroš Bizjak from comment #1)
> >       return TARGET_AVX && !TARGET_AVX512F
>
> Should !TARGET_AVX512F be changed to !TARGET_AVX512ER in the gate function?

Yes, the untested patch is updated.
Patch has been sent: https://gcc.gnu.org/ml/gcc-patches/2017-11/msg01052.html
Author: speryt
Date: Wed Nov 15 12:27:31 2017
New Revision: 254763

URL: https://gcc.gnu.org/viewcvs?rev=254763&root=gcc&view=rev
Log:
Fix PR82941 and PR82942 by adding proper vzeroupper generation on SKX.

2017-11-15  Sebastian Peryt  <sebastian.peryt@intel.com>

gcc/
	PR target/82941
	PR target/82942
	* config/i386/i386.c (pass_insert_vzeroupper): Modify gate condition
	to return true on Xeon and not on Xeon Phi.
	(ix86_check_avx256_register): Changed to ...
	(ix86_check_avx_upper_register): ... this.  Add extra check for
	VALID_AVX512F_REG_OR_XI_MODE.
	(ix86_avx_u128_mode_needed): Changed
	ix86_check_avx256_register to ix86_check_avx_upper_register.
	(ix86_check_avx256_stores): Changed to ...
	(ix86_check_avx_upper_stores): ... this.  Changed
	ix86_check_avx256_register to ix86_check_avx_upper_register.
	(ix86_avx_u128_mode_after): Changed
	avx_reg256_found to avx_upper_reg_found.  Changed
	ix86_check_avx256_stores to ix86_check_avx_upper_stores.
	(ix86_avx_u128_mode_entry): Changed
	ix86_check_avx256_register to ix86_check_avx_upper_register.
	(ix86_avx_u128_mode_exit): Ditto.
	* config/i386/i386.h: (host_detect_local_cpu): New define.

2017-11-15  Sebastian Peryt  <sebastian.peryt@intel.com>

gcc/testsuite/
	PR target/82941
	PR target/82942
	* gcc.target/i386/pr82941-1.c: New test.
	* gcc.target/i386/pr82941-2.c: New test.
	* gcc.target/i386/pr82942-1.c: New test.
	* gcc.target/i386/pr82942-2.c: New test.

Added:
	trunk/gcc/testsuite/gcc.target/i386/pr82941-1.c
	trunk/gcc/testsuite/gcc.target/i386/pr82941-2.c
	trunk/gcc/testsuite/gcc.target/i386/pr82942-1.c
	trunk/gcc/testsuite/gcc.target/i386/pr82942-2.c
Modified:
	trunk/gcc/ChangeLog
	trunk/gcc/config/i386/i386.c
	trunk/gcc/config/i386/i386.h
	trunk/gcc/testsuite/ChangeLog
Author: speryt
Date: Mon Dec 4 11:03:37 2017
New Revision: 255378

URL: https://gcc.gnu.org/viewcvs?rev=255378&root=gcc&view=rev
Log:
Fix PR82941 and PR82942 by adding proper vzeroupper generation on SKX.

Add X86_TUNE_EMIT_VZEROUPPER to indicate if vzeroupper instruction should
be inserted before a transfer of control flow out of the function.  It is
turned on by default unless we are tuning for KNL.  Users can always use
-mvzeroupper or -mno-vzeroupper to override X86_TUNE_EMIT_VZEROUPPER.

2017-12-04  Sebastian Peryt  <sebastian.peryt@intel.com>
	    H.J. Lu  <hongjiu.lu@intel.com>

gcc/
	Backported from trunk
	PR target/82941
	PR target/82942
	PR target/82990
	* config/i386/i386.c (pass_insert_vzeroupper): Remove
	TARGET_AVX512F check from gate condition.
	(ix86_check_avx256_register): Changed to ...
	(ix86_check_avx_upper_register): ... this.  Add extra check for
	VALID_AVX512F_REG_OR_XI_MODE.
	(ix86_avx_u128_mode_needed): Changed
	ix86_check_avx256_register to ix86_check_avx_upper_register.
	(ix86_check_avx256_stores): Changed to ...
	(ix86_check_avx_upper_stores): ... this.  Changed
	ix86_check_avx256_register to ix86_check_avx_upper_register.
	(ix86_avx_u128_mode_after): Changed
	avx_reg256_found to avx_upper_reg_found.  Changed
	ix86_check_avx256_stores to ix86_check_avx_upper_stores.
	(ix86_avx_u128_mode_entry): Changed
	ix86_check_avx256_register to ix86_check_avx_upper_register.
	(ix86_avx_u128_mode_exit): Ditto.
	(ix86_option_override_internal): Set MASK_VZEROUPPER if
	neither -mvzeroupper nor -mno-vzeroupper is used and
	TARGET_EMIT_VZEROUPPER is set.
	* config/i386/i386.h: (host_detect_local_cpu): New define.
	(TARGET_EMIT_VZEROUPPER): New.
	* config/i386/x86-tune.def: Add X86_TUNE_EMIT_VZEROUPPER.

2017-12-04  Sebastian Peryt  <sebastian.peryt@intel.com>
	    H.J. Lu  <hongjiu.lu@intel.com>

gcc/testsuite/
	Backported from trunk
	PR target/82941
	PR target/82942
	PR target/82990
	* gcc.target/i386/pr82941-1.c: New test.
	* gcc.target/i386/pr82941-2.c: Likewise.
	* gcc.target/i386/pr82942-1.c: Likewise.
	* gcc.target/i386/pr82942-2.c: Likewise.
	* gcc.target/i386/pr82990-1.c: Likewise.
	* gcc.target/i386/pr82990-2.c: Likewise.
	* gcc.target/i386/pr82990-3.c: Likewise.
	* gcc.target/i386/pr82990-4.c: Likewise.
	* gcc.target/i386/pr82990-5.c: Likewise.
	* gcc.target/i386/pr82990-6.c: Likewise.
	* gcc.target/i386/pr82990-7.c: Likewise.

Added:
	branches/gcc-7-branch/gcc/testsuite/gcc.target/i386/pr82941-1.c
	branches/gcc-7-branch/gcc/testsuite/gcc.target/i386/pr82941-2.c
	branches/gcc-7-branch/gcc/testsuite/gcc.target/i386/pr82942-1.c
	branches/gcc-7-branch/gcc/testsuite/gcc.target/i386/pr82942-2.c
	branches/gcc-7-branch/gcc/testsuite/gcc.target/i386/pr82990-1.c
	branches/gcc-7-branch/gcc/testsuite/gcc.target/i386/pr82990-2.c
	branches/gcc-7-branch/gcc/testsuite/gcc.target/i386/pr82990-3.c
	branches/gcc-7-branch/gcc/testsuite/gcc.target/i386/pr82990-4.c
	branches/gcc-7-branch/gcc/testsuite/gcc.target/i386/pr82990-5.c
	branches/gcc-7-branch/gcc/testsuite/gcc.target/i386/pr82990-6.c
	branches/gcc-7-branch/gcc/testsuite/gcc.target/i386/pr82990-7.c
Modified:
	branches/gcc-7-branch/gcc/ChangeLog
	branches/gcc-7-branch/gcc/config/i386/i386.c
	branches/gcc-7-branch/gcc/config/i386/i386.h
	branches/gcc-7-branch/gcc/config/i386/x86-tune.def
	branches/gcc-7-branch/gcc/testsuite/ChangeLog
Author: speryt
Date: Mon Dec 4 11:40:44 2017
New Revision: 255379

URL: https://gcc.gnu.org/viewcvs?rev=255379&root=gcc&view=rev
Log:
Fix PR82941 and PR82942 by adding proper vzeroupper generation on SKX.

Add X86_TUNE_EMIT_VZEROUPPER to indicate if vzeroupper instruction should
be inserted before a transfer of control flow out of the function.  It is
turned on by default unless we are tuning for KNL.  Users can always use
-mvzeroupper or -mno-vzeroupper to override X86_TUNE_EMIT_VZEROUPPER.

2017-12-04  Sebastian Peryt  <sebastian.peryt@intel.com>
	    H.J. Lu  <hongjiu.lu@intel.com>

gcc/
	Backported from trunk
	PR target/82941
	PR target/82942
	PR target/82990
	* config/i386/i386.c (pass_insert_vzeroupper): Remove
	TARGET_AVX512F check from gate condition.
	(ix86_check_avx256_register): Changed to ...
	(ix86_check_avx_upper_register): ... this.  Add extra check for
	VALID_AVX512F_REG_OR_XI_MODE.
	(ix86_avx_u128_mode_needed): Changed
	ix86_check_avx256_register to ix86_check_avx_upper_register.
	(ix86_check_avx256_stores): Changed to ...
	(ix86_check_avx_upper_stores): ... this.  Changed
	ix86_check_avx256_register to ix86_check_avx_upper_register.
	(ix86_avx_u128_mode_after): Changed
	avx_reg256_found to avx_upper_reg_found.  Changed
	ix86_check_avx256_stores to ix86_check_avx_upper_stores.
	(ix86_avx_u128_mode_entry): Changed
	ix86_check_avx256_register to ix86_check_avx_upper_register.
	(ix86_avx_u128_mode_exit): Ditto.
	(ix86_option_override_internal): Set MASK_VZEROUPPER if
	neither -mvzeroupper nor -mno-vzeroupper is used and
	TARGET_EMIT_VZEROUPPER is set.
	* config/i386/i386.h: (host_detect_local_cpu): New define.
	(TARGET_EMIT_VZEROUPPER): New.
	* config/i386/x86-tune.def: Add X86_TUNE_EMIT_VZEROUPPER.

gcc/testsuite/
	Backported from trunk
	PR target/82941
	PR target/82942
	PR target/82990
	* gcc.target/i386/pr82941-1.c: New test.
	* gcc.target/i386/pr82941-2.c: Likewise.
	* gcc.target/i386/pr82942-1.c: Likewise.
	* gcc.target/i386/pr82942-2.c: Likewise.
	* gcc.target/i386/pr82990-1.c: Likewise.
	* gcc.target/i386/pr82990-2.c: Likewise.
	* gcc.target/i386/pr82990-3.c: Likewise.
	* gcc.target/i386/pr82990-4.c: Likewise.
	* gcc.target/i386/pr82990-5.c: Likewise.
	* gcc.target/i386/pr82990-6.c: Likewise.
	* gcc.target/i386/pr82990-7.c: Likewise.

Added:
	branches/gcc-6-branch/gcc/testsuite/gcc.target/i386/pr82941-1.c
	branches/gcc-6-branch/gcc/testsuite/gcc.target/i386/pr82941-2.c
	branches/gcc-6-branch/gcc/testsuite/gcc.target/i386/pr82942-1.c
	branches/gcc-6-branch/gcc/testsuite/gcc.target/i386/pr82942-2.c
	branches/gcc-6-branch/gcc/testsuite/gcc.target/i386/pr82990-1.c
	branches/gcc-6-branch/gcc/testsuite/gcc.target/i386/pr82990-2.c
	branches/gcc-6-branch/gcc/testsuite/gcc.target/i386/pr82990-3.c
	branches/gcc-6-branch/gcc/testsuite/gcc.target/i386/pr82990-4.c
	branches/gcc-6-branch/gcc/testsuite/gcc.target/i386/pr82990-5.c
	branches/gcc-6-branch/gcc/testsuite/gcc.target/i386/pr82990-6.c
	branches/gcc-6-branch/gcc/testsuite/gcc.target/i386/pr82990-7.c
Modified:
	branches/gcc-6-branch/gcc/ChangeLog
	branches/gcc-6-branch/gcc/config/i386/i386.c
	branches/gcc-6-branch/gcc/config/i386/i386.h
	branches/gcc-6-branch/gcc/config/i386/x86-tune.def
	branches/gcc-6-branch/gcc/testsuite/ChangeLog
Fixed for GCC 8 and on GCC 6/7 branches.