Created attachment 41125 [details]
self-contained scimark2 MC benchmark

I just got hold of an AMD Ryzen 7 1800X Eight-Core Processor and was surprised by the results running with -march=native. The point is that the results can be reproduced on a Haswell or Broadwell as well! I used the full scimark2; just the MC benchmark already shows at least one problem.

This is on Intel:

[innocent@vinavx3 fullMC]$ gcc -march=znver1 -O3 fullMC.c -g ; time ./a.out
1.245u 0.000s 0:01.24 100.0%	0+0k 0+0io 0pf+0w
[innocent@vinavx3 fullMC]$ gcc -O3 fullMC.c -g ; time ./a.out
0.327u 0.000s 0:00.32 100.0%	0+0k 0+0io 0pf+0w
[innocent@vinavx3 fullMC]$ gcc -march=broadwell -O3 fullMC.c -g ; time ./a.out
0.308u 0.000s 0:00.30 100.0%	0+0k 0+0io 0pf+0w

This is on Ryzen:

[innocent@vinzen0 fullMC]$ gcc -march=znver1 -O3 fullMC.c -g ; time ./a.out
1.354u 0.001s 0:01.35 100.0%	0+0k 0+0io 0pf+0w
[innocent@vinzen0 fullMC]$ gcc -O3 fullMC.c -g ; time ./a.out
0.315u 0.000s 0:00.31 100.0%	0+0k 0+0io 0pf+0w
[innocent@vinzen0 fullMC]$ gcc -march=broadwell -O3 fullMC.c -g ; time ./a.out
0.313u 0.001s 0:00.31 100.0%	0+0k 0+0io 0pf+0w
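For context, the scimark2 MC kernel estimates pi by counting random points that land inside the unit quarter circle. Below is a minimal sketch of that hot loop (my illustration, not the attached test case; a generic 64-bit LCG stands in for scimark2's own Random type). The data-dependent branch on x*x + y*y is the part whose generated code is sensitive to the compiler's tuning:

```c
#include <stdint.h>

/* Hypothetical stand-in for scimark2's Random: a plain 64-bit LCG. */
static uint64_t lcg_state = 42;

static double next_double(void)
{
    lcg_state = lcg_state * 6364136223846793005ULL + 1442695040888963407ULL;
    /* Use the top 53 bits to form a double in [0, 1). */
    return (double)(lcg_state >> 11) * (1.0 / 9007199254740992.0);
}

/* Estimate pi: the fraction of random points in the unit square that
   fall inside the quarter circle approaches pi/4.  The conditional
   increment is the branch the benchmark hammers on every iteration. */
double monte_carlo_pi(int num_samples)
{
    int under_curve = 0;
    for (int i = 0; i < num_samples; i++) {
        double x = next_double();
        double y = next_double();
        if (x * x + y * y <= 1.0)
            under_curve++;
    }
    return 4.0 * (double)under_curve / (double)num_samples;
}
```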
I think as of now znver1 isn't yet tuned (it is mostly a copy of bdverN, which apparently was a worse choice than copying haswell).
markus@x4 ~ % bloaty -d symbols /usr/libexec/gcc/x86_64-pc-linux-gnu/7.0.1/cc1
     VM SIZE                                              FILE SIZE
 --------------                                          --------------
  63.0%  14.8Mi [Other]                                   14.4Mi  57.3%
  22.6%  5.33Mi [Unmapped]                                8.28Mi  32.9%
   3.5%   836Ki znver1_fp_transitions                      836Ki   3.2%
   1.7%   418Ki znver1_fp_min_issue_delay                  418Ki   1.6%
   1.5%   350Ki operand_data                               350Ki   1.4%
   1.2%   280Ki insn_data                                  280Ki   1.1%
   1.0%   248Ki default_target_ira_int                         0   0.0%
   0.9%   224Ki default_target_recog                           0   0.0%
   0.6%   133Ki znver1_ieu_transitions                     133Ki   0.5%
   0.5%   121Ki cl_options                                 121Ki   0.5%
   0.4%   105Ki default_target_ira                             0   0.0%
   0.4%   104Ki default_target_expmed                          0   0.0%
   0.4%   100Ki ix86_builtins_isa                              0   0.0%
   0.4%  91.8Ki bdver3_fp_transitions                     91.8Ki   0.4%
   0.4%  88.4Ki c_common_nodes_and_builtins               88.4Ki   0.3%
   0.3%  71.9Ki bdesc_args                                71.9Ki   0.3%
   0.3%  69.4Ki default_target_reload                          0   0.0%
   0.3%  67.7Ki hard_reg_clobbers                              0   0.0%
   0.3%  66.6Ki znver1_ieu_min_issue_delay                66.6Ki   0.3%
   0.2%  59.2Ki insn_extract(rtx_insn*) [clone .cold.0]   59.2Ki   0.2%
   0.0%  8.16Ki [None]                                         0   0.0%
 100.0%  23.5Mi TOTAL                                     25.1Mi 100.0%
Thanks for pointing this out. It looks like we need to adjust our branch cost.

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 14ac189..8212c56 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -1405,7 +1405,7 @@ struct processor_costs znver1_cost = {
      to limit number of prefetches at all, as their execution
      also takes some time).  */
   100,					/* number of parallel prefetches.  */
-  2,					/* Branch cost.  */
+  3,					/* Branch cost.  */
   COSTS_N_INSNS (6),			/* cost of FADD and FSUB insns.  */
   COSTS_N_INSNS (6),			/* cost of FMUL instruction.  */
   COSTS_N_INSNS (42),			/* cost of FDIV instruction.  */
@@ -4155,6 +4155,312 @@ convert_scalars_to_vector ()
   return 0;
 }

With this adjustment the runtime for the attached test case improved. As Richard said, we have not yet completed the cost-model tuning for Zen. We will benchmark and come back on this.
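For readers less familiar with this knob (my illustration, not part of the patch): the branch-cost entry in processor_costs feeds GCC's BRANCH_COST, which the middle end weighs when deciding whether to if-convert a data-dependent branch into straight-line code. The two functions below compute the same thing; the second is roughly the branchless shape that if-conversion (a flag-setting compare plus cmov, or a SIMD compare) aims to produce from the first, and a higher branch cost makes the compiler more willing to make that transformation:

```c
#include <stddef.h>

/* Branchy form: a conditional jump in the loop body.  With a low
   branch cost the compiler is more inclined to keep the jump. */
int count_less_branchy(const double *a, size_t n, double t)
{
    int count = 0;
    for (size_t i = 0; i < n; i++) {
        if (a[i] < t)
            count++;
    }
    return count;
}

/* Branchless form: the comparison result (0 or 1) is added directly,
   so there is no control dependence inside the loop. */
int count_less_branchless(const double *a, size_t n, double t)
{
    int count = 0;
    for (size_t i = 0; i < n; i++)
        count += a[i] < t;
    return count;
}
```

Both compile to the same result; which machine code each gets at -O3 depends on the target's cost model, which is exactly what the one-line patch above changes for znver1.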
I have benchmarked the update of the branch cost from 2 to 3 and committed it today. So this should be fixed on mainline.
Fixed on trunk.