This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.



PING^3 [PATCH] i386: Add pass_remove_partial_avx_dependency


On Tue, Sep 11, 2018 at 9:01 AM, H.J. Lu <hjl.tools@gmail.com> wrote:
> On Tue, Sep 4, 2018 at 9:01 AM, H.J. Lu <hjl.tools@gmail.com> wrote:
>> On Tue, Aug 28, 2018 at 11:04 AM, H.J. Lu <hongjiu.lu@intel.com> wrote:
>>> With -mavx, for
>>>
>>> [hjl@gnu-cfl-1 skx-2]$ cat foo.i
>>> extern float f;
>>> extern double d;
>>> extern int i;
>>>
>>> void
>>> foo (void)
>>> {
>>>   d = f;
>>>   f = i;
>>> }
>>>
>>> we need to generate
>>>
>>>         vxorp[ds]       %xmmN, %xmmN, %xmmN
>>>         ...
>>>         vcvtss2sd       f(%rip), %xmmN, %xmmX
>>>         ...
>>>         vcvtsi2ss       i(%rip), %xmmN, %xmmY
>>>
>>> to avoid partial XMM register stalls.  This patch adds a pass to generate
>>> a single
>>>
>>>         vxorps          %xmmN, %xmmN, %xmmN
>>>
>>> at function entry, which is shared by all SF and DF conversions, instead
>>> of generating one
>>>
>>>         vxorp[ds]       %xmmN, %xmmN, %xmmN
>>>
>>> for each SF/DF conversion.
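A rough C analogue of the idea, for illustration only. `vcvtss2sd` and `vcvtsi2ss` take the bits above the converted scalar from their first register source (`%xmmN` above), so if that register holds a stale value the conversion must wait for whatever last wrote it. Zeroing one register once and reusing it as that source for every conversion removes the false dependency. The sketch expresses this data flow with SSE/SSE2 intrinsics from `<emmintrin.h>`. The function name `foo_shared_zero` and the initial values of `f`, `d` and `i` are hypothetical. The intrinsics only describe the data flow: the compiler still chooses registers and may not emit exactly the sequence shown above. Without SSE2 the sketch falls back to plain C that behaves like `foo`.

```c
#if defined(__SSE2__)
#include <emmintrin.h> /* SSE and SSE2 intrinsics */
#endif

/* Hypothetical definitions standing in for the extern globals in foo.i.  */
float f = 1.5f;
double d;
int i = 42;

/* Hypothetical source-level analogue of the pass: one zeroed vector is
   created up front and used as the merge source for both scalar
   conversions, so their upper bits never come from a stale register.  */
void
foo_shared_zero (void)
{
#if defined(__SSE2__)
  /* Plays the role of the single vxorps %xmmN, %xmmN, %xmmN.  */
  const __m128 zero = _mm_setzero_ps ();

  /* cvtss2sd: low lane = (double) f, upper lane copied from ZERO.  */
  __m128d dv = _mm_cvtss_sd (_mm_castps_pd (zero), _mm_set_ss (f));
  d = _mm_cvtsd_f64 (dv);

  /* cvtsi2ss: low lane = (float) i, upper lanes copied from ZERO.  */
  __m128 fv = _mm_cvtsi32_ss (zero, i);
  f = _mm_cvtss_f32 (fv);
#else
  /* Fallback without SSE2: same results as foo.  */
  d = f;
  f = (float) i;
#endif
}
```

Calling `foo_shared_zero ()` with these inputs leaves `d == 1.5` and `f == 42.0f`, the same values `foo` would produce.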
>>>
>>> Performance impacts on SPEC CPU 2017 rate with 1 copy using
>>>
>>> -Ofast -march=native -mfpmath=sse -fno-associative-math -funroll-loops
>>>
>>> are
>>>
>>> 1. On Broadwell server:
>>>
>>> 500.perlbench_r (-0.82%)
>>> 502.gcc_r (0.73%)
>>> 505.mcf_r (-0.24%)
>>> 520.omnetpp_r (-2.22%)
>>> 523.xalancbmk_r (-1.47%)
>>> 525.x264_r (0.31%)
>>> 531.deepsjeng_r (0.27%)
>>> 541.leela_r (0.85%)
>>> 548.exchange2_r (-0.11%)
>>> 557.xz_r (-0.34%)
>>> Geomean: (-0.23%)
>>>
>>> 503.bwaves_r (0.00%)
>>> 507.cactuBSSN_r (-1.88%)
>>> 508.namd_r (0.00%)
>>> 510.parest_r (-0.56%)
>>> 511.povray_r (0.49%)
>>> 519.lbm_r (-1.28%)
>>> 521.wrf_r (-0.28%)
>>> 526.blender_r (0.55%)
>>> 527.cam4_r (-0.20%)
>>> 538.imagick_r (2.52%)
>>> 544.nab_r (-0.18%)
>>> 549.fotonik3d_r (-0.51%)
>>> 554.roms_r (-0.22%)
>>> Geomean: (0.00%)
>>>
>>> 2. On Skylake client:
>>>
>>> 500.perlbench_r (-0.29%)
>>> 502.gcc_r (-0.36%)
>>> 505.mcf_r (1.77%)
>>> 520.omnetpp_r (-0.26%)
>>> 523.xalancbmk_r (-3.69%)
>>> 525.x264_r (-0.32%)
>>> 531.deepsjeng_r (0.00%)
>>> 541.leela_r (-0.46%)
>>> 548.exchange2_r (0.00%)
>>> 557.xz_r (0.00%)
>>> Geomean: (-0.34%)
>>>
>>> 503.bwaves_r (0.00%)
>>> 507.cactuBSSN_r (-0.56%)
>>> 508.namd_r (0.87%)
>>> 510.parest_r (0.00%)
>>> 511.povray_r (-0.73%)
>>> 519.lbm_r (0.84%)
>>> 521.wrf_r (0.00%)
>>> 526.blender_r (-0.81%)
>>> 527.cam4_r (-0.43%)
>>> 538.imagick_r (2.55%)
>>> 544.nab_r (0.28%)
>>> 549.fotonik3d_r (0.00%)
>>> 554.roms_r (0.32%)
>>> Geomean: (0.12%)
>>>
>>> 3. On Skylake server:
>>>
>>> 500.perlbench_r (-0.55%)
>>> 502.gcc_r (0.69%)
>>> 505.mcf_r (0.00%)
>>> 520.omnetpp_r (-0.33%)
>>> 523.xalancbmk_r (-0.21%)
>>> 525.x264_r (-0.27%)
>>> 531.deepsjeng_r (0.00%)
>>> 541.leela_r (0.00%)
>>> 548.exchange2_r (-0.11%)
>>> 557.xz_r (0.00%)
>>> Geomean: (0.00%)
>>>
>>> 503.bwaves_r (0.58%)
>>> 507.cactuBSSN_r (0.00%)
>>> 508.namd_r (0.00%)
>>> 510.parest_r (0.18%)
>>> 511.povray_r (-0.58%)
>>> 519.lbm_r (0.25%)
>>> 521.wrf_r (0.40%)
>>> 526.blender_r (0.34%)
>>> 527.cam4_r (0.19%)
>>> 538.imagick_r (5.87%)
>>> 544.nab_r (0.17%)
>>> 549.fotonik3d_r (0.00%)
>>> 554.roms_r (0.00%)
>>> Geomean: (0.62%)
>>>
>>> On Skylake client, impacts on 538.imagick_r are
>>>
>>> size before:
>>>
>>>    text    data     bss     dec     hex filename
>>> 2555577   10876    5576 2572029  273efd imagick_r.exe
>>>
>>> size after:
>>>
>>>    text    data     bss     dec     hex filename
>>> 2511825   10876    5576 2528277  269415 imagick_r.exe
>>>
>>> Number of vxorp[ds] instructions:
>>>
>>> before          after           difference
>>> 14570           4515            -69%
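The size and instruction-count figures can be cross-checked with a little arithmetic. size(1) prints `dec` as text + data + bss and `hex` as the same total in base 16, and the "difference" column is the relative change from before to after. The helper names `pct_change`, `size_total` and `report_imagick` are hypothetical; the numbers are the ones reported for 538.imagick_r.

```c
#include <stdio.h>

/* Relative change, in percent, from BEFORE to AFTER.  */
double
pct_change (double before, double after)
{
  return (after - before) / before * 100.0;
}

/* size(1) reports dec = text + data + bss; hex is the same total in base 16.  */
unsigned long
size_total (unsigned long text, unsigned long data, unsigned long bss)
{
  return text + data + bss;
}

void
report_imagick (void)
{
  unsigned long before = size_total (2555577, 10876, 5576);
  unsigned long after = size_total (2511825, 10876, 5576);

  printf ("total: %lu (%lx) -> %lu (%lx)\n", before, before, after, after);
  printf ("text change: %.2f%%\n", pct_change (2555577, 2511825));
  printf ("vxorp[ds] change: %.1f%%\n", pct_change (14570, 4515));
}
```

`report_imagick ()` prints the totals 2572029 (273efd) and 2528277 (269415), a text-size change of -1.71%, and a vxorp[ds] change of -69.0%, matching the tables above.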
>>>
>>> OK for trunk?
>>>
>>> Thanks.
>>>
>>>
>>> H.J.
>>> ---
>>> gcc/
>>>
>>> 2018-08-28  H.J. Lu  <hongjiu.lu@intel.com>
>>>             Sunil K Pandey  <sunil.k.pandey@intel.com>
>>>
>>>         PR target/87007
>>>         * config/i386/i386-passes.def: Add
>>>         pass_remove_partial_avx_dependency.
>>>         * config/i386/i386-protos.h
>>>         (make_pass_remove_partial_avx_dependency): New.
>>>         * config/i386/i386.c (make_pass_remove_partial_avx_dependency):
>>>         New function.
>>>         (pass_data_remove_partial_avx_dependency): New.
>>>         (pass_remove_partial_avx_dependency): Likewise.
>>>         (make_pass_remove_partial_avx_dependency): Likewise.
>>>         * config/i386/i386.md (SF/DF conversion splitters): Disabled
>>>         for TARGET_AVX.
>>>
>>> gcc/testsuite/
>>>
>>> 2018-08-28  H.J. Lu  <hongjiu.lu@intel.com>
>>>             Sunil K Pandey  <sunil.k.pandey@intel.com>
>>>
>>>         PR target/87007
>>>         * gcc.target/i386/pr87007.c: New file.
>>
>>
>> PING:
>>
>> https://gcc.gnu.org/ml/gcc-patches/2018-08/msg01781.html
>>
>
> PING.
>

Hi Kirill, Jakub, Jan,

Can you take a look?

Thanks.

-- 
H.J.

