Bug 111370 - On Aarch64 4% 511.povray_r regression between g:6cd85273071b5f13 (2023-08-23 00:17) and g:e1f096a3cc96c719 (2023-08-25 22:34)
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: target
Version: 14.0
Importance: P3 normal
Target Milestone: ---
Assignee: Not yet assigned to anyone
URL:
Keywords: needs-bisection
Depends on:
Blocks: spec
Reported: 2023-09-11 13:51 UTC by Filip Kastl
Modified: 2023-11-21 13:29 UTC
CC List: 3 users

See Also:
Host: aarch64-gnu-linux
Target: aarch64-gnu-linux
Build:
Known to work:
Known to fail:
Last reconfirmed: 2023-09-12 00:00:00


Description Filip Kastl 2023-09-11 13:51:36 UTC
On an Altra AArch64 machine (-march=armv8.2-a+crypto+fp16+rcpc+dotprod+ssbs), 511.povray_r built with -O2 -flto and a generic -march setting shows a 4% execution-time regression between commits g:6cd85273071b5f13 (2023-08-23 00:17) and g:e1f096a3cc96c719 (2023-08-25 22:34).

Here is a plot of recent runs:
https://lnt.opensuse.org/db_default/v4/SPEC/graph?plot.0=581.467.0
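
For reference, a figure like the 4% above is normally the relative increase in run time between the two revisions. A minimal sketch of that arithmetic follows; the function name and the timings are hypothetical, and the real measurements are on the LNT plot rather than reproduced here.

```python
def regression_percent(before_s: float, after_s: float) -> float:
    """Relative execution-time change in percent; positive means slower."""
    if before_s <= 0:
        raise ValueError("baseline time must be positive")
    return (after_s - before_s) / before_s * 100.0


# Hypothetical run times in seconds, chosen only to illustrate a 4% slowdown.
print(f"{regression_percent(500.0, 520.0):.1f}%")
```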
Comment 1 Tamar Christina 2023-09-12 12:31:33 UTC
Ok, I can reproduce this with the generic cost model on Neoverse N1 hardware.

The generic cost model is based on a CPU that is more than 10 years old and is no longer a good fit for modern CPUs.

We are planning to replace it in this GCC release, so the regression should go away then.

I've tested with -mcpu=neoverse-n1: the regression goes away and the score is much better.
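
The comparison described above amounts to building the same benchmark twice, once with default (generic) tuning and once with -mcpu=neoverse-n1. A rough sketch of how the two command lines could be assembled is below. The flags are the ones quoted in this report; the compiler name, source file and output names are hypothetical placeholders, and the script only prints the commands instead of running them.

```python
import shlex

# Flags quoted in this report; the file names used below are hypothetical.
BASE_FLAGS = ["-O2", "-flto"]


def build_cmd(compiler, sources, output, tuning=None):
    """Return the argv list for one build; tuning is e.g. '-mcpu=neoverse-n1'."""
    cmd = [compiler, *BASE_FLAGS]
    if tuning:
        cmd.append(tuning)
    return [*cmd, *sources, "-o", output]


generic = build_cmd("g++", ["bench.cpp"], "bench.generic")
tuned = build_cmd("g++", ["bench.cpp"], "bench.n1", "-mcpu=neoverse-n1")
for cmd in (generic, tuned):
    print(shlex.join(cmd))
```

Timing both binaries on the same machine then isolates the effect of the tuning model from everything else in the build.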
Comment 2 GCC Commits 2023-11-21 13:25:41 UTC
The master branch has been updated by Tamar Christina <tnfchris@gcc.gnu.org>:

https://gcc.gnu.org/g:4b6da8e7bdb93d9bca6291157db1c936ac56e7af

commit r14-5671-g4b6da8e7bdb93d9bca6291157db1c936ac56e7af
Author: Tamar Christina <tamar.christina@arm.com>
Date:   Tue Nov 21 13:19:36 2023 +0000

    AArch64: Refactor costs models to different files.
    
    This patch series attempts to move the generic cost model in AArch64 to a new
    and modern generic standard.  The current standard is quite old and generates
    very suboptimal code out of the box for users of GCC.
    
    The goal is for the new cost model to be beneficial on newer/current Arm
    Microarchitectures while not being too negative for older ones.
    
    It does not change any core specific optimization.  The final changes reflect
    both performance optimizations and size optimizations.
    
    This first patch just re-organizes the cost structures to their own files.
    The aarch64.cc file has grown very large and is hard to follow.
    
    No functional changes are expected from this change.  Note that since all the
    structures have private visibility I've put them in header files instead.
    
    gcc/ChangeLog:
    
            PR target/111370
            * config/aarch64/aarch64.cc (generic_addrcost_table,
            exynosm1_addrcost_table,
            xgene1_addrcost_table,
            thunderx2t99_addrcost_table,
            thunderx3t110_addrcost_table,
            tsv110_addrcost_table,
            qdf24xx_addrcost_table,
            a64fx_addrcost_table,
            neoversev1_addrcost_table,
            neoversen2_addrcost_table,
            neoversev2_addrcost_table,
            generic_regmove_cost,
            cortexa57_regmove_cost,
            cortexa53_regmove_cost,
            exynosm1_regmove_cost,
            thunderx_regmove_cost,
            xgene1_regmove_cost,
            qdf24xx_regmove_cost,
            thunderx2t99_regmove_cost,
            thunderx3t110_regmove_cost,
            tsv110_regmove_cost,
            a64fx_regmove_cost,
            neoversen2_regmove_cost,
            neoversev1_regmove_cost,
            neoversev2_regmove_cost,
            generic_vector_cost,
            a64fx_vector_cost,
            qdf24xx_vector_cost,
            thunderx_vector_cost,
            tsv110_vector_cost,
            cortexa57_vector_cost,
            exynosm1_vector_cost,
            xgene1_vector_cost,
            thunderx2t99_vector_cost,
            thunderx3t110_vector_cost,
            ampere1_vector_cost,
            generic_branch_cost,
            generic_tunings,
            cortexa35_tunings,
            cortexa53_tunings,
            cortexa57_tunings,
            cortexa72_tunings,
            cortexa73_tunings,
            exynosm1_tunings,
            thunderxt88_tunings,
            thunderx_tunings,
            tsv110_tunings,
            xgene1_tunings,
            emag_tunings,
            qdf24xx_tunings,
            saphira_tunings,
            thunderx2t99_tunings,
            thunderx3t110_tunings,
            neoversen1_tunings,
            ampere1_tunings,
            ampere1a_tunings,
            neoversev1_vector_cost,
            neoversev1_tunings,
            neoverse512tvb_vector_cost,
            neoverse512tvb_tunings,
            neoversen2_vector_cost,
            neoversen2_tunings,
            neoversev2_vector_cost,
            neoversev2_tunings,
            a64fx_tunings): Split into own files.
            * config/aarch64/tuning_models/a64fx.h: New file.
            * config/aarch64/tuning_models/ampere1.h: New file.
            * config/aarch64/tuning_models/ampere1a.h: New file.
            * config/aarch64/tuning_models/cortexa35.h: New file.
            * config/aarch64/tuning_models/cortexa53.h: New file.
            * config/aarch64/tuning_models/cortexa57.h: New file.
            * config/aarch64/tuning_models/cortexa72.h: New file.
            * config/aarch64/tuning_models/cortexa73.h: New file.
            * config/aarch64/tuning_models/emag.h: New file.
            * config/aarch64/tuning_models/exynosm1.h: New file.
            * config/aarch64/tuning_models/generic.h: New file.
            * config/aarch64/tuning_models/neoverse512tvb.h: New file.
            * config/aarch64/tuning_models/neoversen1.h: New file.
            * config/aarch64/tuning_models/neoversen2.h: New file.
            * config/aarch64/tuning_models/neoversev1.h: New file.
            * config/aarch64/tuning_models/neoversev2.h: New file.
            * config/aarch64/tuning_models/qdf24xx.h: New file.
            * config/aarch64/tuning_models/saphira.h: New file.
            * config/aarch64/tuning_models/thunderx.h: New file.
            * config/aarch64/tuning_models/thunderx2t99.h: New file.
            * config/aarch64/tuning_models/thunderx3t110.h: New file.
            * config/aarch64/tuning_models/thunderxt88.h: New file.
            * config/aarch64/tuning_models/tsv110.h: New file.
            * config/aarch64/tuning_models/xgene1.h: New file.
Comment 3 GCC Commits 2023-11-21 13:25:45 UTC
The master branch has been updated by Tamar Christina <tnfchris@gcc.gnu.org>:

https://gcc.gnu.org/g:e5678468e550e99944fca6bae364696714ffb445

commit r14-5672-ge5678468e550e99944fca6bae364696714ffb445
Author: Tamar Christina <tamar.christina@arm.com>
Date:   Tue Nov 21 13:20:10 2023 +0000

    AArch64: Remove special handling of generic cpu.
    
    In anticipation of adding new generic tuning values this removes the hardcoding
    of the "generic" CPU and instead just specifies it as a normal CPU.
    
    No change in behavior is expected.
    
    gcc/ChangeLog:
    
            PR target/111370
            * config/aarch64/aarch64-cores.def: Add generic.
            * config/aarch64/aarch64-opts.h (enum aarch64_proc): Remove generic.
            * config/aarch64/aarch64-tune.md: Regenerate
            * config/aarch64/aarch64.cc (all_cores): Remove generic
            * config/aarch64/aarch64.h (enum target_cpus): Remove
            TARGET_CPU_generic.
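
As a conceptual illustration of the commit above: treating "generic" as a normal CPU means it becomes one more row in the core table instead of a branch in the lookup code. The sketch below is only a model of that idea. The dictionaries and function names are hypothetical and are not GCC's data structures; only the core names and the *_tunings identifiers come from GCC.

```python
# Conceptual illustration only; the tables and functions are hypothetical,
# not GCC's actual data structures.
CORES = {
    "cortex-a57": "cortexa57_tunings",
    "neoverse-n1": "neoversen1_tunings",
}


def lookup_special_cased(name):
    """Before: 'generic' is handled outside the table."""
    if name == "generic":
        return "generic_tunings"
    return CORES[name]


# After: 'generic' is just another row, so no special branch is needed.
CORES_WITH_GENERIC = {"generic": "generic_tunings", **CORES}


def lookup_uniform(name):
    """After: every CPU name, including 'generic', takes the same path."""
    return CORES_WITH_GENERIC[name]
```

Both lookups return the same result for every name, which matches the "no change in behavior" note in the commit message.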
Comment 4 GCC Commits 2023-11-21 13:25:51 UTC
The master branch has been updated by Tamar Christina <tnfchris@gcc.gnu.org>:

https://gcc.gnu.org/g:33c2b70dbabc02788caabcbc66b7baeafeb95bcf

commit r14-5673-g33c2b70dbabc02788caabcbc66b7baeafeb95bcf
Author: Tamar Christina <tamar.christina@arm.com>
Date:   Tue Nov 21 13:20:39 2023 +0000

    AArch64: Add new generic-armv8-a CPU and make it the default.
    
    This patch adds a new generic scheduling model "generic-armv8-a" and makes it
    the default for all Armv8 architectures.
    
    -mcpu=generic and -mtune=generic are kept around for those who really want the
    previous cost model.
    
    This shows on SPECCPU 2017 the following:
    
    generic:  SPECINT 1.0% improvement in geomean, SPECFP -0.6%.  The SPECFP
              regression is due to fotonik3d_r, where we vectorize an FP
              calculation that only ever needs one lane of the result.  I
              believe this is a generic costing bug, but at the moment we can't
              change the costs of FP and INT independently, so I will defer
              updating that cost to stage3, after Richard's other costing
              updates land.
    
    generic SVE: SPECINT 1.1% improvement in geomean, SPECFP 0.7% improvement.
    
    gcc/ChangeLog:
    
            PR target/111370
            * config/aarch64/aarch64-arches.def (armv8-9, armv8-a, armv8.1-a,
            armv8.2-a, armv8.3-a, armv8.4-a, armv8.5-a, armv8.6-a, armv8.7-a,
            armv8.8-a): Update to generic_armv8_a.
            * config/aarch64/aarch64-cores.def (generic-armv8-a): New.
            * config/aarch64/aarch64-tune.md: Regenerate.
            * config/aarch64/aarch64.cc: Include generic_armv8_a.h.
            * config/aarch64/aarch64.h (TARGET_CPU_DEFAULT): Change to
            TARGET_CPU_generic_armv8_a.
            * config/aarch64/tuning_models/generic_armv8_a.h: New file.
    
    gcc/testsuite/ChangeLog:
    
            PR target/111370
            * gcc.target/aarch64/sve/cond_asrd_1.c: Updated.
            * gcc.target/aarch64/sve/cond_cnot_4.c: Likewise.
            * gcc.target/aarch64/sve/cond_unary_5.c: Likewise.
            * gcc.target/aarch64/sve/cond_uxt_5.c: Likewise.
            * gcc.target/aarch64/target_attr_13.c: Likewise.
            * gcc.target/aarch64/target_attr_15.c: Likewise.
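
The SPECINT and SPECFP figures in the commit message above are geomean changes. For readers unfamiliar with the metric, here is a minimal sketch of how such a number is derived from per-benchmark ratios. The ratios are hypothetical and are not the measured SPEC results.

```python
import math


def geomean(ratios):
    """Geometric mean of positive per-benchmark ratios (new score / old score)."""
    if not ratios:
        raise ValueError("need at least one ratio")
    return math.exp(sum(math.log(r) for r in ratios) / len(ratios))


# Hypothetical per-benchmark ratios; values above 1.0 mean the new model is faster.
ratios = [1.03, 0.99, 1.01, 1.00]
print(f"geomean change: {(geomean(ratios) - 1.0) * 100:+.2f}%")
```

Because the geomean multiplies ratios, one benchmark with a large regression (such as the fotonik3d_r case mentioned above) can pull a suite-wide figure negative even when most benchmarks improve slightly.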
Comment 5 GCC Commits 2023-11-21 13:25:56 UTC
The master branch has been updated by Tamar Christina <tnfchris@gcc.gnu.org>:

https://gcc.gnu.org/g:c187fe4bceb90643b88a55a54c4040ab9e40e659

commit r14-5674-gc187fe4bceb90643b88a55a54c4040ab9e40e659
Author: Tamar Christina <tamar.christina@arm.com>
Date:   Tue Nov 21 13:21:07 2023 +0000

    AArch64: Add new generic-armv9-a CPU and make it the default for Armv9
    
    This patch adds a new generic scheduling model "generic-armv9-a" and makes it
    the default for all Armv9 architectures.
    
    -mcpu=generic and -mtune=generic are kept around for those who really want the
    previous cost model.
    
    gcc/ChangeLog:
    
            PR target/111370
            * config/aarch64/aarch64-arches.def (armv9-a, armv9.1-a, armv9.2-a,
            armv9.3-a): Update to generic-armv9-a.
            * config/aarch64/aarch64-cores.def (generic-armv9-a): New.
            * config/aarch64/aarch64-tune.md: Regenerate.
            * config/aarch64/aarch64.cc: Include generic_armv9_a.h.
            * config/aarch64/tuning_models/generic_armv9_a.h: New file.
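
Taken together, the two commits above change which tuning model is used when no CPU is named. The sketch below is a simplified model of that outcome, not GCC's option-handling code. It assumes that an explicit -mtune always wins, that -march alone selects the architecture's default model, and that the compiler was not configured with its own default CPU; the function and set names are hypothetical.

```python
# Simplified model of the defaults described in these commits; an
# assumption-laden sketch, not GCC's option-handling code.
ARMV8_ARCHES = {"armv8-a"} | {f"armv8.{n}-a" for n in range(1, 9)}
ARMV9_ARCHES = {"armv9-a", "armv9.1-a", "armv9.2-a", "armv9.3-a"}


def default_tune(march=None, mtune=None):
    """Return the tuning model name this simplified model would pick."""
    if mtune is not None:
        # An explicit -mtune (including -mtune=generic) is respected as given.
        return mtune
    if march in ARMV9_ARCHES:
        return "generic-armv9-a"
    if march is None or march in ARMV8_ARCHES:
        return "generic-armv8-a"
    raise ValueError(f"architecture not covered by this sketch: {march}")


print(default_tune(march="armv8.2-a"))                   # new Armv8 default
print(default_tune(march="armv9-a"))                     # new Armv9 default
print(default_tune(march="armv8.2-a", mtune="generic"))  # previous cost model
```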
Comment 6 Tamar Christina 2023-11-21 13:29:29 UTC
Fixed.