Bug 82942 - Generate vzeroupper with -mavx512f -mno-avx512er -O2
Summary: Generate vzeroupper with -mavx512f -mno-avx512er -O2
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: target (show other bugs)
Version: 8.0
: P3 normal
Target Milestone: 8.0
Assignee: Not yet assigned to anyone
URL:
Keywords:
Depends on:
Blocks: 82941
  Show dependency treegraph
 
Reported: 2017-11-10 17:09 UTC by H.J. Lu
Modified: 2018-01-18 15:28 UTC (History)
1 user (show)

See Also:
Host:
Target: x86_64-*-*, i?86-*-*
Build:
Known to work:
Known to fail:
Last reconfirmed: 2017-11-10 00:00:00


Attachments
An untested patch (1.45 KB, patch)
2017-11-10 17:38 UTC, H.J. Lu
Details | Diff
An untested patch (1.44 KB, patch)
2017-11-10 17:50 UTC, H.J. Lu
Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description H.J. Lu 2017-11-10 17:09:11 UTC
When AVX512F is enabled, the vzerouppoer optimization is disabled.
This is intended only for Xeon Phi, not for Skylake server which
also has AVX512F.  Since AVX512ER is unique to Xeon Phi and will
never appear in non Xeon Phi processors, the vzerouppoer optimization
should be enabled when AVX512F is enabled, but AVX512ER isn't:



[hjl@gnu-6 vzeroupper-skx-1]$ cat foo.c
#include <immintrin.h>

extern __m512d y, z;

void
foo ()
{
  z = y;
}
[hjl@gnu-6 vzeroupper-skx-1]$ make foo.s
/export/build/gnu/gcc/build-x86_64-linux/gcc/xgcc -B/export/build/gnu/gcc/build-x86_64-linux/gcc/ -mavx512f -mno-avx512er -O2   -S foo.c
c[hjl@gnu-6 vzeroupper-skx-1]$ cat foo.s
	.file	"foo.c"
	.text
	.p2align 4,,15
	.globl	foo
	.type	foo, @function
foo:
.LFB4897:
	.cfi_startproc
	vmovapd	y(%rip), %zmm0
	vmovapd	%zmm0, z(%rip)
	ret
	.cfi_endproc
.LFE4897:
	.size	foo, .-foo
	.ident	"GCC: (GNU) 8.0.0 20171110 (experimental)"
	.section	.note.GNU-stack,"",@progbits
[hjl@gnu-6 vzeroupper-skx-1]$
Comment 1 Uroš Bizjak 2017-11-10 17:37:43 UTC
class pass_insert_vzeroupper : public rtl_opt_pass
{
public:
  pass_insert_vzeroupper(gcc::context *ctxt)
    : rtl_opt_pass(pass_data_insert_vzeroupper, ctxt)
  {}

  /* opt_pass methods: */
  virtual bool gate (function *)
    {
      return TARGET_AVX && !TARGET_AVX512F
	     && TARGET_VZEROUPPER && flag_expensive_optimizations
	     && !optimize_size;
    }

  virtual unsigned int execute (function *)
    {
      return rest_of_handle_insert_vzeroupper ();
    }

}; // class pass_insert_vzeroupper
Comment 2 H.J. Lu 2017-11-10 17:38:05 UTC
Created attachment 42583 [details]
An untested patch
Comment 3 Uroš Bizjak 2017-11-10 17:40:38 UTC
(In reply to Uroš Bizjak from comment #1)
>       return TARGET_AVX && !TARGET_AVX512F

Should !TARGET_AVX512F be changed to !TARGET_AVX152ER in gate function?
Comment 4 H.J. Lu 2017-11-10 17:50:05 UTC
Created attachment 42584 [details]
An untested patch
Comment 5 H.J. Lu 2017-11-10 17:50:34 UTC
(In reply to Uroš Bizjak from comment #3)
> (In reply to Uroš Bizjak from comment #1)
> >       return TARGET_AVX && !TARGET_AVX512F
> 
> Should !TARGET_AVX512F be changed to !TARGET_AVX152ER in gate function?

Yes, the untested patch is updated.
Comment 6 Sebastian Peryt 2017-11-14 11:50:56 UTC
Patch has been sent: https://gcc.gnu.org/ml/gcc-patches/2017-11/msg01052.html
Comment 7 Sebastian Peryt 2017-11-15 12:28:03 UTC
Author: speryt
Date: Wed Nov 15 12:27:31 2017
New Revision: 254763

URL: https://gcc.gnu.org/viewcvs?rev=254763&root=gcc&view=rev
Log:
Fix PR82941 and PR82942 by adding proper vzeroupper generation on SKX. 

2017-11-15  Sebastian Peryt  <sebastian.peryt@intel.com>

gcc/

	PR target/82941
	PR target/82942
	* config/i386/i386.c (pass_insert_vzeroupper): Modify gate condition
	to return true on Xeon and not on Xeon Phi.
	(ix86_check_avx256_register): Changed to ...
	(ix86_check_avx_upper_register): ... this. Add extra check for
	VALID_AVX512F_REG_OR_XI_MODE.
	(ix86_avx_u128_mode_needed): Changed
	ix86_check_avx256_register to ix86_check_avx_upper_register.
	(ix86_check_avx256_stores): Changed to ...
	(ix86_check_avx_upper_stores): ... this. Changed
	ix86_check_avx256_register to ix86_check_avx_upper_register.
	(ix86_avx_u128_mode_after): Changed
	avx_reg256_found to avx_upper_reg_found. Changed
	ix86_check_avx256_stores to ix86_check_avx_upper_stores.
	(ix86_avx_u128_mode_entry): Changed
	ix86_check_avx256_register to ix86_check_avx_upper_register.
	(ix86_avx_u128_mode_exit): Ditto.
	* config/i386/i386.h: (host_detect_local_cpu): New define.

2017-11-15  Sebastian Peryt  <sebastian.peryt@intel.com>
	
gcc/testsuite/

	PR target/82941
	PR target/82942
	* gcc.target/i386/pr82941-1.c: New test.
	* gcc.target/i386/pr82941-2.c: New test.
	* gcc.target/i386/pr82942-1.c: New test.
	* gcc.target/i386/pr82942-2.c: New test.


Added:
    trunk/gcc/testsuite/gcc.target/i386/pr82941-1.c
    trunk/gcc/testsuite/gcc.target/i386/pr82941-2.c
    trunk/gcc/testsuite/gcc.target/i386/pr82942-1.c
    trunk/gcc/testsuite/gcc.target/i386/pr82942-2.c
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/config/i386/i386.c
    trunk/gcc/config/i386/i386.h
    trunk/gcc/testsuite/ChangeLog
Comment 8 Sebastian Peryt 2017-12-04 11:04:10 UTC
Author: speryt
Date: Mon Dec  4 11:03:37 2017
New Revision: 255378

URL: https://gcc.gnu.org/viewcvs?rev=255378&root=gcc&view=rev
Log:
Fix PR82941 and PR82942 by adding proper vzeroupper generation on SKX.
Add X86_TUNE_EMIT_VZEROUPPER to indicate if vzeroupper instruction should
be inserted before a transfer of control flow out of the function.  It is
turned on by default unless we are tuning for KNL.  Users can always use
-mzeroupper or -mno-zeroupper to override X86_TUNE_EMIT_VZEROUPPER.

2017-12-04  Sebastian Peryt  <sebastian.peryt@intel.com>
	H.J. Lu  <hongjiu.lu@intel.com>

gcc/
	Bakcported from trunk
	PR target/82941
	PR target/82942
	PR target/82990
	* config/i386/i386.c (pass_insert_vzeroupper): Remove
	TARGET_AVX512F check from gate condition.
	(ix86_check_avx256_register): Changed to ...
	(ix86_check_avx_upper_register): ... this. Add extra check for
	VALID_AVX512F_REG_OR_XI_MODE.
	(ix86_avx_u128_mode_needed): Changed
	ix86_check_avx256_register to ix86_check_avx_upper_register.
	(ix86_check_avx256_stores): Changed to ...
	(ix86_check_avx_upper_stores): ... this. Changed
	ix86_check_avx256_register to ix86_check_avx_upper_register.
	(ix86_avx_u128_mode_after): Changed
	avx_reg256_found to avx_upper_reg_found. Changed
	ix86_check_avx256_stores to ix86_check_avx_upper_stores.
	(ix86_avx_u128_mode_entry): Changed
	ix86_check_avx256_register to ix86_check_avx_upper_register.
	(ix86_avx_u128_mode_exit): Ditto.
	(ix86_option_override_internal): Set MASK_VZEROUPPER if
	neither -mzeroupper nor -mno-zeroupper is used and
	TARGET_EMIT_VZEROUPPER is set.
	* config/i386/i386.h: (host_detect_local_cpu): New define.
	(TARGET_EMIT_VZEROUPPER): New.
	* config/i386/x86-tune.def: Add X86_TUNE_EMIT_VZEROUPPER.

2017-12-04  Sebastian Peryt  <sebastian.peryt@intel.com>
	H.J. Lu  <hongjiu.lu@intel.com>

gcc/testsuite/
	Backported from trunk
	PR target/82941
	PR target/82942
	PR target/82990
	* gcc.target/i386/pr82941-1.c: New test.
	* gcc.target/i386/pr82941-2.c: Likewise.
	* gcc.target/i386/pr82942-1.c: Likewise.
	* gcc.target/i386/pr82942-2.c: Likewise.
	* gcc.target/i386/pr82990-1.c: Likewise.
	* gcc.target/i386/pr82990-2.c: Likewise.
	* gcc.target/i386/pr82990-3.c: Likewise.
	* gcc.target/i386/pr82990-4.c: Likewise.
	* gcc.target/i386/pr82990-5.c: Likewise.
	* gcc.target/i386/pr82990-6.c: Likewise.
	* gcc.target/i386/pr82990-7.c: Likewise.

Added:
    branches/gcc-7-branch/gcc/testsuite/gcc.target/i386/pr82941-1.c
    branches/gcc-7-branch/gcc/testsuite/gcc.target/i386/pr82941-2.c
    branches/gcc-7-branch/gcc/testsuite/gcc.target/i386/pr82942-1.c
    branches/gcc-7-branch/gcc/testsuite/gcc.target/i386/pr82942-2.c
    branches/gcc-7-branch/gcc/testsuite/gcc.target/i386/pr82990-1.c
    branches/gcc-7-branch/gcc/testsuite/gcc.target/i386/pr82990-2.c
    branches/gcc-7-branch/gcc/testsuite/gcc.target/i386/pr82990-3.c
    branches/gcc-7-branch/gcc/testsuite/gcc.target/i386/pr82990-4.c
    branches/gcc-7-branch/gcc/testsuite/gcc.target/i386/pr82990-5.c
    branches/gcc-7-branch/gcc/testsuite/gcc.target/i386/pr82990-6.c
    branches/gcc-7-branch/gcc/testsuite/gcc.target/i386/pr82990-7.c
Modified:
    branches/gcc-7-branch/gcc/ChangeLog
    branches/gcc-7-branch/gcc/config/i386/i386.c
    branches/gcc-7-branch/gcc/config/i386/i386.h
    branches/gcc-7-branch/gcc/config/i386/x86-tune.def
    branches/gcc-7-branch/gcc/testsuite/ChangeLog
Comment 9 Sebastian Peryt 2017-12-04 11:41:16 UTC
Author: speryt
Date: Mon Dec  4 11:40:44 2017
New Revision: 255379

URL: https://gcc.gnu.org/viewcvs?rev=255379&root=gcc&view=rev
Log:
Fix PR82941 and PR82942 by adding proper vzeroupper generation on SKX.
Add X86_TUNE_EMIT_VZEROUPPER to indicate if vzeroupper instruction should
be inserted before a transfer of control flow out of the function.  It is
turned on by default unless we are tuning for KNL.  Users can always use
-mzeroupper or -mno-zeroupper to override X86_TUNE_EMIT_VZEROUPPER.

2017-12-04  Sebastian Peryt  <sebastian.peryt@intel.com>
	H.J. Lu  <hongjiu.lu@intel.com>

gcc/
	Bakcported from trunk
	PR target/82941
	PR target/82942
	PR target/82990
	* config/i386/i386.c (pass_insert_vzeroupper): Remove
	TARGET_AVX512F check from gate condition.
	(ix86_check_avx256_register): Changed to ...
	(ix86_check_avx_upper_register): ... this. Add extra check for
	VALID_AVX512F_REG_OR_XI_MODE.
	(ix86_avx_u128_mode_needed): Changed
	ix86_check_avx256_register to ix86_check_avx_upper_register.
	(ix86_check_avx256_stores): Changed to ...
	(ix86_check_avx_upper_stores): ... this. Changed
	ix86_check_avx256_register to ix86_check_avx_upper_register.
	(ix86_avx_u128_mode_after): Changed
	avx_reg256_found to avx_upper_reg_found. Changed
	ix86_check_avx256_stores to ix86_check_avx_upper_stores.
	(ix86_avx_u128_mode_entry): Changed
	ix86_check_avx256_register to ix86_check_avx_upper_register.
	(ix86_avx_u128_mode_exit): Ditto.
	(ix86_option_override_internal): Set MASK_VZEROUPPER if
	neither -mzeroupper nor -mno-zeroupper is used and
	TARGET_EMIT_VZEROUPPER is set.
	* config/i386/i386.h: (host_detect_local_cpu): New define.
	(TARGET_EMIT_VZEROUPPER): New.
	* config/i386/x86-tune.def: Add X86_TUNE_EMIT_VZEROUPPER.

gcc/testsuite/
	Backported from trunk
	PR target/82941
	PR target/82942
	PR target/82990
	* gcc.target/i386/pr82941-1.c: New test.
	* gcc.target/i386/pr82941-2.c: Likewise.
	* gcc.target/i386/pr82942-1.c: Likewise.
	* gcc.target/i386/pr82942-2.c: Likewise.
	* gcc.target/i386/pr82990-1.c: Likewise.
	* gcc.target/i386/pr82990-2.c: Likewise.
	* gcc.target/i386/pr82990-3.c: Likewise.
	* gcc.target/i386/pr82990-4.c: Likewise.
	* gcc.target/i386/pr82990-5.c: Likewise.
	* gcc.target/i386/pr82990-6.c: Likewise.
	* gcc.target/i386/pr82990-7.c: Likewise.

Added:
    branches/gcc-6-branch/gcc/testsuite/gcc.target/i386/pr82941-1.c
    branches/gcc-6-branch/gcc/testsuite/gcc.target/i386/pr82941-2.c
    branches/gcc-6-branch/gcc/testsuite/gcc.target/i386/pr82942-1.c
    branches/gcc-6-branch/gcc/testsuite/gcc.target/i386/pr82942-2.c
    branches/gcc-6-branch/gcc/testsuite/gcc.target/i386/pr82990-1.c
    branches/gcc-6-branch/gcc/testsuite/gcc.target/i386/pr82990-2.c
    branches/gcc-6-branch/gcc/testsuite/gcc.target/i386/pr82990-3.c
    branches/gcc-6-branch/gcc/testsuite/gcc.target/i386/pr82990-4.c
    branches/gcc-6-branch/gcc/testsuite/gcc.target/i386/pr82990-5.c
    branches/gcc-6-branch/gcc/testsuite/gcc.target/i386/pr82990-6.c
    branches/gcc-6-branch/gcc/testsuite/gcc.target/i386/pr82990-7.c
Modified:
    branches/gcc-6-branch/gcc/ChangeLog
    branches/gcc-6-branch/gcc/config/i386/i386.c
    branches/gcc-6-branch/gcc/config/i386/i386.h
    branches/gcc-6-branch/gcc/config/i386/x86-tune.def
    branches/gcc-6-branch/gcc/testsuite/ChangeLog
Comment 10 H.J. Lu 2018-01-18 15:28:17 UTC
Fixed for GCC 8 and on GCC 6/7 branches.