This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.



Re: PING^1: [PATCH GCC 8] x86: Re-enable partial_reg_dependency and movx for Haswell


On Wed, May 30, 2018 at 5:43 AM, H.J. Lu <hjl.tools@gmail.com> wrote:
> On Sun, May 20, 2018 at 11:51 AM, Jan Hubicka <hubicka@ucw.cz> wrote:
>>> r254152 disabled partial_reg_dependency and movx for Haswell and newer
>>> Intel processors.  r258972 restored them for skylake-avx512.  For Haswell,
>>> movx improves performance.  But partial_reg_stall may be better than
>>> partial_reg_dependency in theory.  We will investigate performance impact
>>> of partial_reg_stall vs partial_reg_dependency on Haswell for GCC 9.  In
>>> the meantime, this patch restores both partial_reg_dependency and movx for
>>> Haswell in GCC 8.
>>>
>>> OK for GCC 8?
>>
>> I would still like to know in which situations/benchmarks it improves performance.
>> The change was benchmarked on spec2000/2006 plus some additional benchmarks, so
>> it would be nice to know where it hurts.
>
> From
>
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85829#c5:
>
> I have made measurements on HSW comparing
> -mtune-ctrl=movx,partial_reg_dependency -Ofast -march=haswell to
> -Ofast -mtune=haswell and I see improvements on EEMBC benchmarks.
>
> automotive
> =========
>   aifftr01 (default) - goodperf: Runtime improvement of   2.6% (time).
>   aiifft01 (default) - goodperf: Runtime improvement of   2.2% (time).
>
> networking
> =========
>   ip_pktcheckb1m (default) - goodperf: Runtime improvement of   3.8% (time).
>   ip_pktcheckb2m (default) - goodperf: Runtime improvement of   5.2% (time).
>   ip_pktcheckb4m (default) - goodperf: Runtime improvement of   4.4% (time).
>   ip_pktcheckb512k (default) - goodperf: Runtime improvement of   4.2% (time).
>
> telecom
> =========
>   fft00data_1 (default) - goodperf: Runtime improvement of   8.4% (time).
>   fft00data_2 (default) - goodperf: Runtime improvement of   8.6% (time).
>   fft00data_3 (default) - goodperf: Runtime improvement of   9.0% (time).
>
> OK for GCC 8?
>

This is the patch I am going to check into GCC 8.
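
For readers who want to look at the generated code themselves, here is a
minimal sketch of the kind of loop these tunings touch.  It is not taken
from EEMBC; the function name and the table-walk shape are made up for
illustration.  Each iteration loads an 8-bit value and uses it as a wider
index, which is where movx (zero-extending loads rather than byte moves)
and partial register handling can come into play.  Whether the two flag
sets quoted above actually produce different assembly for it depends on
the GCC version.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical kernel, not from any benchmark: walk a 256-entry table
   driven by the bytes of a buffer.  The loop-carried dependency on
   `state` keeps the loop scalar, so every iteration performs a byte
   load that must be widened before it can index the table.  */
uint8_t
table_walk (const uint8_t table[256], const uint8_t *buf, size_t len)
{
  uint8_t state = 0;
  for (size_t i = 0; i < len; i++)
    state = table[(uint8_t) (state ^ buf[i])];
  return state;
}
```

Compiling this with -S under each of the two option sets from the
measurements above and diffing the output is one way to see what, if
anything, changes.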

-- 
H.J.
From 9ecbfa1fd04dc4370a9ec4f3d56189cc07aee668 Mon Sep 17 00:00:00 2001
From: "H.J. Lu" <hjl.tools@gmail.com>
Date: Thu, 17 May 2018 09:52:09 -0700
Subject: [PATCH] x86: Re-enable partial_reg_dependency and movx for Haswell

r254152 disabled partial_reg_dependency and movx for Haswell and newer
Intel processors.  r258972 restored them for skylake-avx512.  For Haswell,
movx improves performance.  But partial_reg_stall may be better than
partial_reg_dependency in theory.  We will investigate performance impact
of partial_reg_stall vs partial_reg_dependency on Haswell for GCC 9.  In
the meantime, this patch restores both partial_reg_dependency and movx for
Haswell in GCC 8.

On Haswell, improvements for EEMBC benchmarks with

-mtune-ctrl=movx,partial_reg_dependency -Ofast -march=haswell

vs

-Ofast -mtune=haswell

are

automotive
=========
  aifftr01 (default) - goodperf: Runtime improvement of   2.6% (time).
  aiifft01 (default) - goodperf: Runtime improvement of   2.2% (time).

networking
=========
  ip_pktcheckb1m (default) - goodperf: Runtime improvement of   3.8% (time).
  ip_pktcheckb2m (default) - goodperf: Runtime improvement of   5.2% (time).
  ip_pktcheckb4m (default) - goodperf: Runtime improvement of   4.4% (time).
  ip_pktcheckb512k (default) - goodperf: Runtime improvement of   4.2% (time).

telecom
=========
  fft00data_1 (default) - goodperf: Runtime improvement of   8.4% (time).
  fft00data_2 (default) - goodperf: Runtime improvement of   8.6% (time).
  fft00data_3 (default) - goodperf: Runtime improvement of   9.0% (time).

	PR target/85829
	* config/i386/x86-tune.def: Re-enable partial_reg_dependency
	and movx for Haswell.
---
 gcc/config/i386/x86-tune.def | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/config/i386/x86-tune.def b/gcc/config/i386/x86-tune.def
index 5649fdcf416..60625668236 100644
--- a/gcc/config/i386/x86-tune.def
+++ b/gcc/config/i386/x86-tune.def
@@ -48,7 +48,7 @@ DEF_TUNE (X86_TUNE_SCHEDULE, "schedule",
    over partial stores.  For example preffer MOVZBL or MOVQ to load 8bit
    value over movb.  */
 DEF_TUNE (X86_TUNE_PARTIAL_REG_DEPENDENCY, "partial_reg_dependency",
-          m_P4_NOCONA | m_CORE2 | m_NEHALEM  | m_SANDYBRIDGE
+          m_P4_NOCONA | m_CORE2 | m_NEHALEM  | m_SANDYBRIDGE | m_HASWELL
 	  | m_BONNELL | m_SILVERMONT | m_INTEL
 	  | m_KNL | m_KNM | m_AMD_MULTIPLE | m_SKYLAKE_AVX512 | m_GENERIC)
 
@@ -84,7 +84,7 @@ DEF_TUNE (X86_TUNE_PARTIAL_FLAG_REG_STALL, "partial_flag_reg_stall",
    partial dependencies.  */
 DEF_TUNE (X86_TUNE_MOVX, "movx",
           m_PPRO | m_P4_NOCONA | m_CORE2 | m_NEHALEM  | m_SANDYBRIDGE
-	  | m_BONNELL | m_SILVERMONT | m_KNL | m_KNM | m_INTEL
+	  | m_BONNELL | m_SILVERMONT | m_KNL | m_KNM | m_INTEL | m_HASWELL
 	  | m_GEODE | m_AMD_MULTIPLE | m_SKYLAKE_AVX512 | m_GENERIC)
 
 /* X86_TUNE_MEMORY_MISMATCH_STALL: Avoid partial stores that are followed by
-- 
2.17.0
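
For context on why adding m_HASWELL to a DEF_TUNE entry is enough, here is
a simplified, self-contained sketch of the X-macro pattern that a .def file
like x86-tune.def relies on.  It is an assumption-laden toy, not GCC's
code: the processor enum is trimmed to a few hypothetical entries, the
TUNE_* names and tune_enabled() are invented, and the way GCC itself
consumes the selectors differs in detail.

```c
#include <stdbool.h>

/* Hypothetical, trimmed-down processor list; the real one is longer.  */
enum processor
{
  PROC_PPRO,
  PROC_SANDYBRIDGE,
  PROC_HASWELL,
  PROC_SKYLAKE_AVX512,
  PROC_GENERIC
};

/* One bit per processor, mirroring the m_* masks used in the patch.  */
#define m_PPRO           (1u << PROC_PPRO)
#define m_SANDYBRIDGE    (1u << PROC_SANDYBRIDGE)
#define m_HASWELL        (1u << PROC_HASWELL)
#define m_SKYLAKE_AVX512 (1u << PROC_SKYLAKE_AVX512)
#define m_GENERIC        (1u << PROC_GENERIC)

/* Stand-in for the .def file: each entry is (index, name, selector).  */
#define TUNE_LIST                                                     \
  DEF_TUNE (TUNE_PARTIAL_REG_DEPENDENCY, "partial_reg_dependency",    \
            m_SANDYBRIDGE | m_HASWELL | m_SKYLAKE_AVX512 | m_GENERIC) \
  DEF_TUNE (TUNE_MOVX, "movx",                                        \
            m_PPRO | m_SANDYBRIDGE | m_HASWELL | m_SKYLAKE_AVX512     \
            | m_GENERIC)

/* First expansion: build the enum of tuning indices.  */
enum tune_index
{
#define DEF_TUNE(tune, name, selector) tune,
  TUNE_LIST
#undef DEF_TUNE
  TUNE_LAST
};

/* Second expansion: build the table of processor selectors.  */
static const unsigned int tune_selectors[TUNE_LAST] = {
#define DEF_TUNE(tune, name, selector) selector,
  TUNE_LIST
#undef DEF_TUNE
};

/* A tuning is on for a processor when its bit is set in the selector.  */
bool
tune_enabled (enum tune_index t, enum processor p)
{
  return (tune_selectors[t] & (1u << p)) != 0;
}
```

In this toy version, tune_enabled (TUNE_MOVX, PROC_HASWELL) is true only
because m_HASWELL appears in the selector, which is the same kind of
one-token change the patch makes.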

