On an Altra AArch64 machine (-march=armv8.2-a+crypto+fp16+rcpc+dotprod+ssbs), built with -O2 -flto and generic march, there is a 4% execution-time regression between commits g:6cd85273071b5f13 (2023-08-23 00:17) and g:e1f096a3cc96c719 (2023-08-25 22:34). Here is a plot of recent runs: https://lnt.opensuse.org/db_default/v4/SPEC/graph?plot.0=581.467.0
OK, I can reproduce this with the generic cost model on Neoverse N1 hardware. The generic cost model is based on a CPU that is more than 10 years old and is no longer a good fit for modern CPUs. We are planning to replace it in this GCC release, so the regression should go away then. I've tested with -mcpu=neoverse-n1: the regression does go away and the score is much better.
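For anyone who wants to repeat the comparison outside the SPEC harness, the two builds differ only in the -mcpu option. A minimal sketch that builds both command lines follows; the compiler name `gcc`, the source file `bench.c`, and the helper `build_commands` are placeholders of mine, not something used in the runs above.

```python
# Sketch: build GCC command lines that differ only in the tuning option.
# "gcc" and "bench.c" are placeholders; -O2 -flto and the two -mcpu values
# are the ones mentioned in this report.

BASE_FLAGS = ["-O2", "-flto"]
TUNINGS = ["generic", "neoverse-n1"]


def build_commands(source, compiler="gcc", tunings=TUNINGS):
    """Return {tuning: argv list} for compiling `source` once per tuning."""
    commands = {}
    for tune in tunings:
        # Name each binary after its tuning so the results can be timed side by side.
        output = f"{source.rsplit('.', 1)[0]}-{tune}"
        commands[tune] = [compiler, *BASE_FLAGS, f"-mcpu={tune}", source, "-o", output]
    return commands


if __name__ == "__main__":
    for tune, argv in build_commands("bench.c").items():
        print(" ".join(argv))
```

Each argv list can be passed to `subprocess.run` on an AArch64 host, and the resulting binaries timed separately.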
The master branch has been updated by Tamar Christina <tnfchris@gcc.gnu.org>:

https://gcc.gnu.org/g:4b6da8e7bdb93d9bca6291157db1c936ac56e7af

commit r14-5671-g4b6da8e7bdb93d9bca6291157db1c936ac56e7af
Author: Tamar Christina <tamar.christina@arm.com>
Date: Tue Nov 21 13:19:36 2023 +0000

AArch64: Refactor costs models to different files.

This patch series attempts to move the generic cost model in AArch64 to a new and modern generic standard. The current standard is quite old and generates very suboptimal code out of the box for users of GCC.

The goal is for the new cost model to be beneficial on newer/current Arm Microarchitectures while not being too negative for older ones.

It does not change any core specific optimization. The final changes reflect both performance optimizations and size optimizations.

This first patch just re-organizes the cost structures to their own files. The AArch64.cc file has gotten very big and it's hard to follow.

No functional changes are expected from this change. Note that since all the structures have private visibility I've put them in header files instead.

gcc/ChangeLog:

PR target/111370
* config/aarch64/aarch64.cc (generic_addrcost_table, exynosm1_addrcost_table, xgene1_addrcost_table, thunderx2t99_addrcost_table, thunderx3t110_addrcost_table, tsv110_addrcost_table, qdf24xx_addrcost_table, a64fx_addrcost_table, neoversev1_addrcost_table, neoversen2_addrcost_table, neoversev2_addrcost_table, generic_regmove_cost, cortexa57_regmove_cost, cortexa53_regmove_cost, exynosm1_regmove_cost, thunderx_regmove_cost, xgene1_regmove_cost, qdf24xx_regmove_cost, thunderx2t99_regmove_cost, thunderx3t110_regmove_cost, tsv110_regmove_cost, a64fx_regmove_cost, neoversen2_regmove_cost, neoversev1_regmove_cost, neoversev2_regmove_cost, generic_vector_cost, a64fx_vector_cost, qdf24xx_vector_cost, thunderx_vector_cost, tsv110_vector_cost, cortexa57_vector_cost, exynosm1_vector_cost, xgene1_vector_cost, thunderx2t99_vector_cost, thunderx3t110_vector_cost, ampere1_vector_cost, generic_branch_cost, generic_tunings, cortexa35_tunings, cortexa53_tunings, cortexa57_tunings, cortexa72_tunings, cortexa73_tunings, exynosm1_tunings, thunderxt88_tunings, thunderx_tunings, tsv110_tunings, xgene1_tunings, emag_tunings, qdf24xx_tunings, saphira_tunings, thunderx2t99_tunings, thunderx3t110_tunings, neoversen1_tunings, ampere1_tunings, ampere1a_tunings, neoversev1_vector_cost, neoversev1_tunings, neoverse512tvb_vector_cost, neoverse512tvb_tunings, neoversen2_vector_cost, neoversen2_tunings, neoversev2_vector_cost, neoversev2_tunings, a64fx_tunings): Split into own files.
* config/aarch64/tuning_models/a64fx.h: New file.
* config/aarch64/tuning_models/ampere1.h: New file.
* config/aarch64/tuning_models/ampere1a.h: New file.
* config/aarch64/tuning_models/cortexa35.h: New file.
* config/aarch64/tuning_models/cortexa53.h: New file.
* config/aarch64/tuning_models/cortexa57.h: New file.
* config/aarch64/tuning_models/cortexa72.h: New file.
* config/aarch64/tuning_models/cortexa73.h: New file.
* config/aarch64/tuning_models/emag.h: New file.
* config/aarch64/tuning_models/exynosm1.h: New file.
* config/aarch64/tuning_models/generic.h: New file.
* config/aarch64/tuning_models/neoverse512tvb.h: New file.
* config/aarch64/tuning_models/neoversen1.h: New file.
* config/aarch64/tuning_models/neoversen2.h: New file.
* config/aarch64/tuning_models/neoversev1.h: New file.
* config/aarch64/tuning_models/neoversev2.h: New file.
* config/aarch64/tuning_models/qdf24xx.h: New file.
* config/aarch64/tuning_models/saphira.h: New file.
* config/aarch64/tuning_models/thunderx.h: New file.
* config/aarch64/tuning_models/thunderx2t99.h: New file.
* config/aarch64/tuning_models/thunderx3t110.h: New file.
* config/aarch64/tuning_models/thunderxt88.h: New file.
* config/aarch64/tuning_models/tsv110.h: New file.
* config/aarch64/tuning_models/xgene1.h: New file.
The master branch has been updated by Tamar Christina <tnfchris@gcc.gnu.org>:

https://gcc.gnu.org/g:e5678468e550e99944fca6bae364696714ffb445

commit r14-5672-ge5678468e550e99944fca6bae364696714ffb445
Author: Tamar Christina <tamar.christina@arm.com>
Date: Tue Nov 21 13:20:10 2023 +0000

AArch64: Remove special handling of generic cpu.

In anticipation of adding new generic tuning values this removes the hardcoding of the "generic" CPU and instead just specifies it as a normal CPU.

No change in behavior is expected.

gcc/ChangeLog:

PR target/111370
* config/aarch64/aarch64-cores.def: Add generic.
* config/aarch64/aarch64-opts.h (enum aarch64_proc): Remove generic.
* config/aarch64/aarch64-tune.md: Regenerate.
* config/aarch64/aarch64.cc (all_cores): Remove generic.
* config/aarch64/aarch64.h (enum target_cpus): Remove TARGET_CPU_generic.
The master branch has been updated by Tamar Christina <tnfchris@gcc.gnu.org>:

https://gcc.gnu.org/g:33c2b70dbabc02788caabcbc66b7baeafeb95bcf

commit r14-5673-g33c2b70dbabc02788caabcbc66b7baeafeb95bcf
Author: Tamar Christina <tamar.christina@arm.com>
Date: Tue Nov 21 13:20:39 2023 +0000

AArch64: Add new generic-armv8-a CPU and make it the default.

This patch adds a new generic scheduling model "generic-armv8-a" and makes it the default for all Armv8 architectures.

-mcpu=generic and -mtune=generic are kept around for those that really want the previous cost model.

On SPECCPU 2017 this shows the following:

generic: SPECINT 1.0% improvement in geomean, SPECFP -0.6%. The SPECFP result is due to fotonik3d_r, where we vectorize an FP calculation that only ever needs one lane of the result. This I believe is a generic costing bug, but at the moment we can't change costs of FP and INT independently, so I will defer updating that cost to stage3 after Richard's other costing updates land.

generic SVE: SPECINT 1.1% improvement in geomean, SPECFP 0.7% improvement.

gcc/ChangeLog:

PR target/111370
* config/aarch64/aarch64-arches.def (armv8-9, armv8-a, armv8.1-a, armv8.2-a, armv8.3-a, armv8.4-a, armv8.5-a, armv8.6-a, armv8.7-a, armv8.8-a): Update to generic_armv8_a.
* config/aarch64/aarch64-cores.def (generic-armv8-a): New.
* config/aarch64/aarch64-tune.md: Regenerate.
* config/aarch64/aarch64.cc: Include generic_armv8_a.h.
* config/aarch64/aarch64.h (TARGET_CPU_DEFAULT): Change to TARGET_CPU_generic_armv8_a.
* config/aarch64/tuning_models/generic_armv8_a.h: New file.

gcc/testsuite/ChangeLog:

PR target/111370
* gcc.target/aarch64/sve/cond_asrd_1.c: Updated.
* gcc.target/aarch64/sve/cond_cnot_4.c: Likewise.
* gcc.target/aarch64/sve/cond_unary_5.c: Likewise.
* gcc.target/aarch64/sve/cond_uxt_5.c: Likewise.
* gcc.target/aarch64/target_attr_13.c: Likewise.
* gcc.target/aarch64/target_attr_15.c: Likewise.
The master branch has been updated by Tamar Christina <tnfchris@gcc.gnu.org>:

https://gcc.gnu.org/g:c187fe4bceb90643b88a55a54c4040ab9e40e659

commit r14-5674-gc187fe4bceb90643b88a55a54c4040ab9e40e659
Author: Tamar Christina <tamar.christina@arm.com>
Date: Tue Nov 21 13:21:07 2023 +0000

AArch64: Add new generic-armv9-a CPU and make it the default for Armv9

This patch adds a new generic scheduling model "generic-armv9-a" and makes it the default for all Armv9 architectures.

-mcpu=generic and -mtune=generic are kept around for those that really want the previous cost model.

gcc/ChangeLog:

PR target/111370
* config/aarch64/aarch64-arches.def (armv9-a, armv9.1-a, armv9.2-a, armv9.3-a): Update to generic-armv9-a.
* config/aarch64/aarch64-cores.def (generic-armv9-a): New.
* config/aarch64/aarch64-tune.md: Regenerate.
* config/aarch64/aarch64.cc: Include generic_armv9_a.h.
* config/aarch64/tuning_models/generic_armv9_a.h: New file.
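As a reading aid, the net effect of the aarch64-arches.def entries in the last two ChangeLogs can be summarised as a lookup from architecture name to default tuning. This is only a sketch of that mapping: the names `DEFAULT_TUNE` and `default_tune` are mine, GCC keeps this information in aarch64-arches.def rather than in any such table, and the "armv8-9" entry is left out because its spelling in the ChangeLog is ambiguous.

```python
# Sketch of the arch -> default tuning mapping described by the two
# ChangeLogs above. Illustration only; the helper names are hypothetical.

ARMV8_ARCHES = [
    "armv8-a", "armv8.1-a", "armv8.2-a", "armv8.3-a", "armv8.4-a",
    "armv8.5-a", "armv8.6-a", "armv8.7-a", "armv8.8-a",
]
ARMV9_ARCHES = ["armv9-a", "armv9.1-a", "armv9.2-a", "armv9.3-a"]

DEFAULT_TUNE = {arch: "generic-armv8-a" for arch in ARMV8_ARCHES}
DEFAULT_TUNE.update({arch: "generic-armv9-a" for arch in ARMV9_ARCHES})


def default_tune(arch):
    """Return the default tuning name for `arch`, or None if it is not listed."""
    return DEFAULT_TUNE.get(arch)


if __name__ == "__main__":
    for arch in ("armv8.2-a", "armv9-a"):
        print(arch, "->", default_tune(arch))
```

The previous cost model is still selectable explicitly with -mcpu=generic or -mtune=generic, as the commit messages note.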
Fixed.