This is the mail archive of the
gcc-patches@gcc.gnu.org
mailing list for the GCC project.
Re: [PATCH][i386] PR82941/PR82942 - Adding vzeroupper generation for SKX
- From: "H.J. Lu" <hjl dot tools at gmail dot com>
- To: "Peryt, Sebastian" <sebastian dot peryt at intel dot com>
- Cc: Jakub Jelinek <jakub at redhat dot com>, "gcc-patches at gcc dot gnu dot org" <gcc-patches at gcc dot gnu dot org>, Uros Bizjak <ubizjak at gmail dot com>, Kirill Yukhin <kirill dot yukhin at gmail dot com>, "Lu, Hongjiu" <hongjiu dot lu at intel dot com>
- Date: Tue, 14 Nov 2017 04:17:31 -0800
- Subject: Re: [PATCH][i386] PR82941/PR82942 - Adding vzeroupper generation for SKX
- Authentication-results: sourceware.org; auth=none
- References: <17623B198193D741876BD81A6E3AE5AD3C5938E5@irsmsx111.ger.corp.intel.com> <20171114095044.GF14653@tucnak> <17623B198193D741876BD81A6E3AE5AD3C5939C4@irsmsx111.ger.corp.intel.com>
On Tue, Nov 14, 2017 at 3:18 AM, Peryt, Sebastian
<sebastian.peryt@intel.com> wrote:
> I have updated tests and changelog according to Jakub's suggestions.
> Please find attached v2 of my patch.
>
>
> 14.11.2017 Sebastian Peryt <sebastian.peryt@intel.com>
>
> gcc/
>
> PR target/82941
> PR target/82942
> * config/i386/i386.c (pass_insert_vzeroupper): Modify gate condition
> to return true on Xeon and not on Xeon Phi.
> (ix86_check_avx256_register): Changed to ...
> (ix86_check_avx_upper_register): ... this. Add extra check for
> VALID_AVX512F_REG_OR_XI_MODE.
> (ix86_avx_u128_mode_needed): Changed
> ix86_check_avx256_register to ix86_check_avx_upper_register.
> (ix86_check_avx256_stores): Changed to ...
> (ix86_check_avx_upper_stores): ... this. Changed
> ix86_check_avx256_register to ix86_check_avx_upper_register.
> (ix86_avx_u128_mode_after): Changed
> avx_reg256_found to avx_upper_reg_found. Changed
> ix86_check_avx256_stores to ix86_check_avx_upper_stores.
> (ix86_avx_u128_mode_entry): Changed
> ix86_check_avx256_register to ix86_check_avx_upper_register.
> (ix86_avx_u128_mode_exit): Ditto.
> * config/i386/i386.h: (host_detect_local_cpu): New define.
@@ -2497,7 +2497,7 @@ public:
/* opt_pass methods: */
virtual bool gate (function *)
{
- return TARGET_AVX && !TARGET_AVX512F
+ return TARGET_AVX && !TARGET_AVX512PF && !TARGET_AVX512ER
^^^^^^^^^^^^^^^^^^^^^
Please remove this.
>From glibc commit:
commit 4cb334c4d6249686653137ec273d081371b3672d
Author: H.J. Lu <hjl.tools@gmail.com>
Date: Tue Apr 18 14:01:45 2017 -0700
x86: Use AVX2 memcpy/memset on Skylake server [BZ #21396]
On Skylake server, AVX512 load/store instructions in memcpy/memset may
lead to lower CPU turbo frequency in certain situations. Use of AVX2
in memcpy/memset has been observed to have improved overall performance
in many workloads due to the higher frequency.
Since AVX512ER is unique to Xeon Phi, this patch sets Prefer_No_AVX512
if AVX512ER isn't available so that AVX2 versions of memcpy/memset are
used on Skylake server.
Only AVX512ER is really unique to Xeon Phi.
&& TARGET_VZEROUPPER && flag_expensive_optimizations
&& !optimize_size;
}
> 14.11.2017 Sebastian Peryt <sebastian.peryt@intel.com>
>
> gcc/testsuite/
>
> PR target/82941
> PR target/82942
> * gcc.target/i386/pr82941-1.c: New test.
> * gcc.target/i386/pr82941-2.c: New test.
> * gcc.target/i386/pr82942-1.c: New test.
> * gcc.target/i386/pr82942-2.c: New test.
>
>
> Thanks,
> Sebastian
>
>> -----Original Message-----
>> From: Jakub Jelinek [mailto:jakub@redhat.com]
>> Sent: Tuesday, November 14, 2017 10:51 AM
>> To: Peryt, Sebastian <sebastian.peryt@intel.com>
>> Cc: gcc-patches@gcc.gnu.org; Uros Bizjak <ubizjak@gmail.com>; Kirill Yukhin
>> <kirill.yukhin@gmail.com>; Lu, Hongjiu <hongjiu.lu@intel.com>
>> Subject: Re: [PATCH][i386] PR82941/PR82942 - Adding vzeroupper generation
>> for SKX
>>
>> On Tue, Nov 14, 2017 at 09:45:12AM +0000, Peryt, Sebastian wrote:
>> > Hi,
>> >
>> > This patch fixes PR82941 and PR82942 by adding vzeroupper generation on
>> SKX.
>> > Bootstrapped and tested.
>> >
>> > 14.11.2017 Sebastian Peryt <sebastian.peryt@intel.com>
>> >
>> > gcc/
>>
>> In that case the ChangeLog entry should list the PRs, i.e.
>> PR target/82941
>> PR target/82942
>> > * config/i386/i386.c (pass_insert_vzeroupper): Modify gate condition
>> > to return true on Xeon and not on Xeon Phi.
>> > (ix86_check_avx256_register): Changed to ...
>> > (ix86_check_avx_upper_register): ... this.
>> > (ix86_check_avx_upper_register): Add extra check for
>> > VALID_AVX512F_REG_OR_XI_MODE.
>>
>> The way this is usually written is instead:
>> (ix86_check_avx256_register): Changed to ...
>> (ix86_check_avx_upper_register): ... this. Add extra check for
>> VALID_AVX512F_REG_OR_XI_MODE.
>> i.e. don't duplicate the function name, just continue mentioning further changes.
>>
>> > (ix86_avx_u128_mode_needed): Changed
>> > ix86_check_avx256_register to ix86_check_avx_upper_register.
>> > (ix86_check_avx256_stores): Changed to ...
>> > (ix86_check_avx_upper_stores): ... this.
>> > (ix86_check_avx_upper_stores): Changed
>> > ix86_check_avx256_register to ix86_check_avx_upper_register.
>>
>> Likewise.
>>
>> > gcc/testsuite/
>> > * gcc.target/i386/pr82941.c: New test.
>> > * gcc.target/i386/pr82942.c: New test.
>>
>> Shouldn't there be also a test that if using -march=knl and another one if using -
>> mavx512f -mavx512er that we don't emit any vzeroupper?
>>
>> Jakub
--
H.J.