This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

RE: [PATCH][i386] PR82941/PR82942 - Adding vzeroupper generation for SKX


Attached is fixed patch.

Sebastian


> -----Original Message-----
> From: H.J. Lu [mailto:hjl.tools@gmail.com]
> Sent: Tuesday, November 14, 2017 1:18 PM
> To: Peryt, Sebastian <sebastian.peryt@intel.com>
> Cc: Jakub Jelinek <jakub@redhat.com>; gcc-patches@gcc.gnu.org; Uros Bizjak
> <ubizjak@gmail.com>; Kirill Yukhin <kirill.yukhin@gmail.com>; Lu, Hongjiu
> <hongjiu.lu@intel.com>
> Subject: Re: [PATCH][i386] PR82941/PR82942 - Adding vzeroupper generation
> for SKX
> 
> On Tue, Nov 14, 2017 at 3:18 AM, Peryt, Sebastian <sebastian.peryt@intel.com>
> wrote:
> > I have updated tests and changelog according to Jakub's suggestions.
> > Please find attached v2 of my patch.
> >
> >
> > 14.11.2017  Sebastian Peryt  <sebastian.peryt@intel.com>
> >
> > gcc/
> >
> >         PR target/82941
> >         PR target/82942
> >         * config/i386/i386.c (pass_insert_vzeroupper): Modify gate condition
> >         to return true on Xeon and not on Xeon Phi.
> >         (ix86_check_avx256_register): Changed to ...
> >         (ix86_check_avx_upper_register): ... this. Add extra check for
> >         VALID_AVX512F_REG_OR_XI_MODE.
> >         (ix86_avx_u128_mode_needed): Changed
> >         ix86_check_avx256_register to ix86_check_avx_upper_register.
> >         (ix86_check_avx256_stores): Changed to ...
> >         (ix86_check_avx_upper_stores): ... this. Changed
> >         ix86_check_avx256_register to ix86_check_avx_upper_register.
> >         (ix86_avx_u128_mode_after): Changed
> >         avx_reg256_found to avx_upper_reg_found. Changed
> >         ix86_check_avx256_stores to ix86_check_avx_upper_stores.
> >         (ix86_avx_u128_mode_entry): Changed
> >         ix86_check_avx256_register to ix86_check_avx_upper_register.
> >         (ix86_avx_u128_mode_exit): Ditto.
> >         * config/i386/i386.h: (host_detect_local_cpu): New define.
> 
> @@ -2497,7 +2497,7 @@ public:
>    /* opt_pass methods: */
>    virtual bool gate (function *)
>      {
> -      return TARGET_AVX && !TARGET_AVX512F
> +      return TARGET_AVX && !TARGET_AVX512PF && !TARGET_AVX512ER
>                                                 ^^^^^^^^^^^^^^^^^^^^^  Please remove  this.
> 
> From glibc commit:
> 
> commit 4cb334c4d6249686653137ec273d081371b3672d
> Author: H.J. Lu <hjl.tools@gmail.com>
> Date:   Tue Apr 18 14:01:45 2017 -0700
> 
>     x86: Use AVX2 memcpy/memset on Skylake server [BZ #21396]
> 
>     On Skylake server, AVX512 load/store instructions in memcpy/memset may
>     lead to lower CPU turbo frequency in certain situations.  Use of AVX2
>     in memcpy/memset has been observed to have improved overall performance
>     in many workloads due to the higher frequency.
> 
>     Since AVX512ER is unique to Xeon Phi, this patch sets Prefer_No_AVX512
>     if AVX512ER isn't available so that AVX2 versions of memcpy/memset are
>     used on Skylake server.
> 
> Only AVX512ER is really unique to Xeon Phi.
> 
>        && TARGET_VZEROUPPER && flag_expensive_optimizations
>        && !optimize_size;
>      }
> 
> > 14.11.2017  Sebastian Peryt  <sebastian.peryt@intel.com>
> >
> > gcc/testsuite/
> >
> >         PR target/82941
> >         PR target/82942
> >         * gcc.target/i386/pr82941-1.c: New test.
> >         * gcc.target/i386/pr82941-2.c: New test.
> >         * gcc.target/i386/pr82942-1.c: New test.
> >         * gcc.target/i386/pr82942-2.c: New test.
> >
> >
> > Thanks,
> > Sebastian
> >
> >> -----Original Message-----
> >> From: Jakub Jelinek [mailto:jakub@redhat.com]
> >> Sent: Tuesday, November 14, 2017 10:51 AM
> >> To: Peryt, Sebastian <sebastian.peryt@intel.com>
> >> Cc: gcc-patches@gcc.gnu.org; Uros Bizjak <ubizjak@gmail.com>; Kirill
> >> Yukhin <kirill.yukhin@gmail.com>; Lu, Hongjiu <hongjiu.lu@intel.com>
> >> Subject: Re: [PATCH][i386] PR82941/PR82942 - Adding vzeroupper
> >> generation for SKX
> >>
> >> On Tue, Nov 14, 2017 at 09:45:12AM +0000, Peryt, Sebastian wrote:
> >> > Hi,
> >> >
> >> > This patch fixes PR82941 and PR82942 by adding vzeroupper
> >> > generation on
> >> SKX.
> >> > Bootstrapped and tested.
> >> >
> >> > 14.11.2017  Sebastian Peryt  <sebastian.peryt@intel.com>
> >> >
> >> > gcc/
> >>
> >> In that case the ChangeLog entry should list the PRs, i.e.
> >>       PR target/82941
> >>       PR target/82942
> >> >     * config/i386/i386.c (pass_insert_vzeroupper): Modify gate condition
> >> >     to return true on Xeon and not on Xeon Phi.
> >> >     (ix86_check_avx256_register): Changed to ...
> >> >     (ix86_check_avx_upper_register): ... this.
> >> >     (ix86_check_avx_upper_register): Add extra check for
> >> >     VALID_AVX512F_REG_OR_XI_MODE.
> >>
> >> The way this is usually written is instead:
> >>       (ix86_check_avx256_register): Changed to ...
> >>       (ix86_check_avx_upper_register): ... this.  Add extra check for
> >>       VALID_AVX512F_REG_OR_XI_MODE.
> >> i.e. don't duplicate the function name, just continue mentioning further
> changes.
> >>
> >> >     (ix86_avx_u128_mode_needed): Changed
> >> >     ix86_check_avx256_register to ix86_check_avx_upper_register.
> >> >     (ix86_check_avx256_stores): Changed to ...
> >> >     (ix86_check_avx_upper_stores): ... this.
> >> >     (ix86_check_avx_upper_stores): Changed
> >> >     ix86_check_avx256_register to ix86_check_avx_upper_register.
> >>
> >> Likewise.
> >>
> >> > gcc/testsuite/
> >> >     * gcc.target/i386/pr82941.c: New test.
> >> >     * gcc.target/i386/pr82942.c: New test.
> >>
> >> Shouldn't there be also a test that if using -march=knl and another
> >> one if using - mavx512f -mavx512er that we don't emit any vzeroupper?
> >>
> >>       Jakub
> 
> 
> 
> --
> H.J.

Attachment: 0001-VZEROUPPER_v3.patch
Description: 0001-VZEROUPPER_v3.patch


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]