PING^1: [PATCH GCC 8] x86: Re-enable partial_reg_dependency and movx for Haswell
H.J. Lu
hjl.tools@gmail.com
Wed May 30 12:44:00 GMT 2018
On Sun, May 20, 2018 at 11:51 AM, Jan Hubicka <hubicka@ucw.cz> wrote:
>> r254152 disabled partial_reg_dependency and movx for Haswell and newer
>> Intel processors. r258972 restored them for skylake-avx512. For Haswell,
>> movx improves performance. But partial_reg_stall may be better than
>> partial_reg_dependency in theory. We will investigate performance impact
>> of partial_reg_stall vs partial_reg_dependency on Haswell for GCC 9. In
>> the meantime, this patch restores both partial_reg_dependency and movx for
>> Haswell in GCC 8.
>>
>> OK for GCC 8?
>
> I would still like to know in what situations/benchmarks it improves the performance.
> The change was benchmarked on spec2000/2006 plus some additional benchmarks, so
> it would be nice to know where it hurts.
From
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85829#c5:
I have made measurements on HSW comparing
-mtune-ctrl=movx,partial_reg_dependency -Ofast -march=haswell to
-Ofast -mtune=haswell and I see improvements on EEMBC benchmarks.
automotive
=========
aifftr01 (default) - goodperf: Runtime improvement of 2.6% (time).
aiifft01 (default) - goodperf: Runtime improvement of 2.2% (time).
networking
=========
ip_pktcheckb1m (default) - goodperf: Runtime improvement of 3.8% (time).
ip_pktcheckb2m (default) - goodperf: Runtime improvement of 5.2% (time).
ip_pktcheckb4m (default) - goodperf: Runtime improvement of 4.4% (time).
ip_pktcheckb512k (default) - goodperf: Runtime improvement of 4.2% (time).
telecom
=========
fft00data_1 (default) - goodperf: Runtime improvement of 8.4% (time).
fft00data_2 (default) - goodperf: Runtime improvement of 8.6% (time).
fft00data_3 (default) - goodperf: Runtime improvement of 9.0% (time).
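
To illustrate the kind of code these two tunings are aimed at, here is a
minimal sketch. It is a hypothetical kernel, not taken from EEMBC, and the
function name chase is made up for illustration. Each iteration loads an
8-bit value that is then used as an index, so the compiler has to choose
between a zero-extending load (movzbl) and a byte move into a partial
register. The generated code was not checked for this example; comparing the
assembly from the two option sets above (add -S) should show whether the
choice differs.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical kernel, for illustration only.
   Follows a chain of byte-sized table lookups.  Every load result feeds
   the next address computation, so the 8-bit value must be widened on
   each step.  With the movx tuning enabled, GCC is expected to prefer a
   zero-extending load over a plain byte move, which avoids a dependency
   on the previous contents of the full register.  */
uint8_t
chase (const uint8_t table[256], uint8_t start, size_t steps)
{
  uint8_t idx = start;

  for (size_t i = 0; i < steps; i++)
    idx = table[idx];	/* 8-bit load, widened before indexing.  */

  return idx;
}
```

The loop carries a dependency from one load to the next, so it should not be
vectorized at -Ofast and the scalar load sequence stays visible in the
output.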
OK for GCC 8?
H.J.
> Honza
>>
>> H.J.
>> ---
>> PR target/85829
>> * config/i386/x86-tune.def: Re-enable partial_reg_dependency
>> and movx for Haswell.
>> ---
>> gcc/config/i386/x86-tune.def | 4 ++--
>> 1 file changed, 2 insertions(+), 2 deletions(-)
>>
>> diff --git a/gcc/config/i386/x86-tune.def b/gcc/config/i386/x86-tune.def
>> index 5649fdcf416..60625668236 100644
>> --- a/gcc/config/i386/x86-tune.def
>> +++ b/gcc/config/i386/x86-tune.def
>> @@ -48,7 +48,7 @@ DEF_TUNE (X86_TUNE_SCHEDULE, "schedule",
>> over partial stores. For example preffer MOVZBL or MOVQ to load 8bit
>> value over movb. */
>> DEF_TUNE (X86_TUNE_PARTIAL_REG_DEPENDENCY, "partial_reg_dependency",
>> - m_P4_NOCONA | m_CORE2 | m_NEHALEM | m_SANDYBRIDGE
>> + m_P4_NOCONA | m_CORE2 | m_NEHALEM | m_SANDYBRIDGE | m_HASWELL
>> | m_BONNELL | m_SILVERMONT | m_INTEL
>> | m_KNL | m_KNM | m_AMD_MULTIPLE | m_SKYLAKE_AVX512 | m_GENERIC)
>>
>> @@ -84,7 +84,7 @@ DEF_TUNE (X86_TUNE_PARTIAL_FLAG_REG_STALL, "partial_flag_reg_stall",
>> partial dependencies. */
>> DEF_TUNE (X86_TUNE_MOVX, "movx",
>> m_PPRO | m_P4_NOCONA | m_CORE2 | m_NEHALEM | m_SANDYBRIDGE
>> - | m_BONNELL | m_SILVERMONT | m_KNL | m_KNM | m_INTEL
>> + | m_BONNELL | m_SILVERMONT | m_KNL | m_KNM | m_INTEL | m_HASWELL
>> | m_GEODE | m_AMD_MULTIPLE | m_SKYLAKE_AVX512 | m_GENERIC)
>>
>> /* X86_TUNE_MEMORY_MISMATCH_STALL: Avoid partial stores that are followed by
>> --
>> 2.17.0
>>
--
H.J.