[Patch] [x86_64] znver1 enablement

Uros Bizjak ubizjak@gmail.com
Wed Sep 30 12:43:00 GMT 2015


On Wed, Sep 30, 2015 at 12:05 PM, Kumar, Venkataramanan
<Venkataramanan.Kumar@amd.com> wrote:
> Hi Maintainers,
>
> The attached patch enables -march=znver1 (AMD family 17h Zen processor).
>
> Costs and tunings are copied from bdver4,  but we will be adjusting them later for znver1.
> Also a basic scheduler description for znver1 is added and we will update this as we get more information.
>
> Testing :
> GCC bootstrap and gcc regression passes on x86_64-pc-linux-gnu.
> GCC bootstrap passed with  "make BOOT_CFLAGS= -O2 -g -march=znver1 -mno-adx -mno-mwaitx -mno-clzero -mno-sha -mno-clflushopt -mno-rdseed" on x86_64-pc-linux-gnu .
>
> Built SPEC2006 benchmarks with -march=znver1 and ran it on bdver4 machine.
>
> Wrf and Calculix failed to compile but looks like a general register allocation issue not restricted to -march=znver1.
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67717
>
> ChangeLog:
>                 * config.gcc (i[34567]86-*-linux* | ...): Add znver1.
>                 (case ${target}): Add znver1.
>                 * config/i386/cpuid.h(bit_CLZERO):  Define.
>                 * config/i386/driver-i386.c: (host_detect_local_cpu): Let
>                 -march=native recognize znver1 processors.
>                 * config/i386/i386-c.c (ix86_target_macros_internal): Add
>                 znver1, clzero def_and_undef.
>                 * config/i386/i386.c (struct processor_costs znver1_cost): New.
>                 (m_znver1): New definition.
>                 (m_AMD_MULTIPLE): Includes m_znver1.
>                 (processor_target_table): Add znver1 entry.
>                 (ix86_target_string) : Add clzero entry.
>                 (static const char *const cpu_names): Add znver1 entry.
>                 (ix86_option_override_internal): Add znver1 instruction sets.
>                 (PTA_CLZERO) :  New definition.
>                 (ix86_option_override_internal): Handle new clzerooption.
>                 (ix86_issue_rate): Add znver1.
>                 (ix86_adjust_cost): Add znver1.
>                 (get_builtin_code_for_version): Set priority for PROCESSOR_ZNVER1.
>                 (ia32_multipass_dfa_lookahead): Add znver1.
>                 (enum processor_model): Add M_AMDFAM17H_znver1.
>                 (struct _arch_names_table): Add M_AMDFAM17H_znver1.
>                 (has_dispatch): Add znver1.
>                 * config/i386/i386.h (TARGET_znver1): New definition.
>                 (TARGET_CLZERO): Define.
>                 (TARGET_CLZERO_P): Define.
>                 (struct ix86_size_cost): Add TARGET_ZNVER1.
>                 (enum processor_type): Add PROCESSOR_znver1.
>                 * config/i386/i386.md (define_attr "cpu"): Add znver1.
>                 (set_attr znver1_decode): New definitions for znver1.
>                 * config/i386/i386.opt (flag_dispatch_scheduler): Add znver1.
>                 (mclzero): New.
>                 * config/i386/mmx.md (set_attr znver1_decode): New definitions
>                 for znver1.
>                 * config/i386/sse.md (set_attr znver1_decode): Likewise.
>                 * config/i386/x86-tune.def:  Add znver1 tunings.
>                 * config/i386/znver1.md: Introduce znver1 cpu and include new md file.
>                 * gcc/doc/extend.texi: Add details about znver1.
>                 * gcc/doc/invoke.texi: Add details about znver1.
>
> Ok for trunk?

Please remove changes to get_builtin_code_for_version and
fold_builtin_cpu. They should be committed in a future patch, together
with relevant libgcc changes to libgcc/config/i386/cpuinfo.c.

+++ b/gcc/config.gcc
@@ -592,7 +592,7 @@ pentium4 pentium4m pentiumpro prescott iamcu"
 # 64-bit x86 processors supported by --with-arch=.  Each processor
 # MUST be separated by exactly one space.
 x86_64_archs="amdfam10 athlon64 athlon64-sse3 barcelona bdver1 bdver2 \
-bdver3 bdver4 btver1 btver2 k8 k8-sse3 opteron opteron-sse3 nocona \
+bdver3 bdver4 btver1 btver2 k8 k8-sse3 opteron opteron-sse3 nocona znver1 \
 core2 corei7 corei7-avx core-avx-i core-avx2 atom slm nehalem westmere \

Please group the new processor with other AMD processors.

+++ b/gcc/config/i386/i386-c.c

     | PTA_MOVBE | PTA_MWAITX},
-      {"btver1", PROCESSOR_BTVER1, CPU_GENERIC,
+     {"znver1", PROCESSOR_ZNVER1, CPU_ZNVER1,

wrong indetation.

+++ b/gcc/config/i386/driver-i386.c
@@ -414,6 +414,7 @@ const char *host_detect_local_cpu (int argc, const
char **argv)
   unsigned int has_avx512dq = 0, has_avx512bw = 0, has_avx512vl = 0;
   unsigned int has_avx512vbmi = 0, has_avx512ifma = 0, has_clwb = 0;
   unsigned int has_pcommit = 0, has_mwaitx = 0;
+  unsigned int has_clzero = 0;

This option should be added to the options string. Please see the end
of driver-i386.c on how other options are precessed.

+++ b/gcc/config/i386/i386.opt
@@ -569,8 +569,8 @@ the function.

 mdispatch-scheduler
 Target RejectNegative Var(flag_dispatch_scheduler)
-Do dispatch scheduling if processor is bdver1 or bdver2 or bdver3 or
bdver4 and Haifa scheduling
-is selected.
+Do dispatch scheduling if processor is bdver1 or bdver2 or bdver3 or bdver4
+or znver1 and Haifa scheduling is selected.

Please reword the above with commas: ... if processor is bdver1,
bdver2, bdver3, bdver4 or znver1 ...

+++ b/gcc/config/i386/x86-tune.def
@@ -59,7 +59,7 @@ DEF_TUNE (X86_TUNE_PARTIAL_REG_DEPENDENCY,
"partial_reg_dependency",
    that can be partly masked by careful scheduling of moves.  */
 DEF_TUNE (X86_TUNE_SSE_PARTIAL_REG_DEPENDENCY, "sse_partial_reg_dependency",
           m_PPRO | m_P4_NOCONA | m_CORE_ALL | m_BONNELL | m_AMDFAM10
-      | m_BDVER | m_GENERIC)
+      | m_BDVER | m_GENERIC | m_ZNVER1)

[...]

Please leave m_GENERIC as the last entry. Also, please group AMD masks
together in some consistent way throughout the x86-tune.def file.

+++ b/gcc/config/i386/znver1.md

+;; SSE converts
+(define_insn_reservation "znver1_ssecvtsf_si_load" 9

"SSE conversions" ?

+++ b/gcc/doc/extend.texi
@@ -16924,6 +16924,9 @@ AMD Family 15h Bulldozer version 3.
 @item bdver4
 AMD Family 15h Bulldozer version 4.

+@item znver1
+AMD Family 17h Zen version 1.
+
 @item btver2
 AMD Family 16h CPU.
 @end table

Please remove the not yet implemented change above.

Otherwise, the patch looks OK. Please repost it with the above changes
for the final approval.

Uros.



More information about the Gcc-patches mailing list