Bug 108629 - 549.fotonik3d_r regresses 15-24% at -O2 -flto -march=x86-64-v3 since r13-1203-g038b077689bb53
Status: UNCONFIRMED
Alias: None
Product: gcc
Classification: Unclassified
Component: tree-optimization
Version: 13.0
Importance: P3 normal
Target Milestone: ---
Assignee: Not yet assigned to anyone
URL:
Keywords:
Depends on:
Blocks: spec
Reported: 2023-02-01 13:01 UTC by Martin Jambor
Modified: 2023-02-02 12:12 UTC
CC List: 4 users

See Also:
Host: x86_64-linux
Target: x86_64-linux
Build:
Known to work:
Known to fail:
Last reconfirmed:


Description Martin Jambor 2023-02-01 13:01:52 UTC
When benchmarking trunk revision 99ea0d76116, I noticed a 24%
regression on Zen4 and Zen3 machines and a 16% regression on a Zen2
and an Intel Cascade Lake machine when running 549.fotonik3d_r from
the SPEC 2017 FPrate suite, built with options
-O2 -g -march=x86-64-v3 -flto=32, compared to the binary produced by
GCC 12.

On the Zen3 machine, the number of branches reported by perf stat is
90% higher for the aforementioned trunk revision than for GCC 12.

The symbol profile changed from:

  Overhead  Samples  Shared object           Name
  33.23%    40078    fotonik3d_r_peak.gcc12  __upml_mod_MOD_upml_updatee_simple.lto_priv.0
  27.74%    33471    fotonik3d_r_peak.gcc12  __upml_mod_MOD_upml_updateh
  17.50%    21114    fotonik3d_r_peak.gcc12  __material_mod_MOD_mat_updatee
  9.52%     11493    fotonik3d_r_peak.gcc12  __update_mod_MOD_updateh
  9.49%     11445    fotonik3d_r_peak.gcc12  __power_mod_MOD_power_dft

To:

  Overhead  Samples  Shared object           Name
  26.68%    39825    fotonik3d_r_peak.trunk  __upml_mod_MOD_upml_updatee_simple.lto_priv.0
  22.35%    33368    fotonik3d_r_peak.trunk  __upml_mod_MOD_upml_updateh
  13.99%    20892    fotonik3d_r_peak.trunk  __material_mod_MOD_mat_updatee
  13.96%    20816    fotonik3d_r_peak.trunk  __power_mod_MOD_power_dft
  11.51%    17164    libgcc_s.so.1           __muldc3
  8.60%     12840    fotonik3d_r_peak.trunk  __update_mod_MOD_updateh
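
Because the Overhead column is relative to each run's total, the two profiles are easier to compare by sample count. The sketch below sums the counts of the rows listed above and prints the per-symbol difference. It assumes both profiles were recorded at the same sampling rate and it covers only the rows shown here, so the totals are a lower bound on the real ones.

```python
# Sample counts copied from the two perf profiles above (listed rows only).
gcc12 = {
    "__upml_mod_MOD_upml_updatee_simple.lto_priv.0": 40078,
    "__upml_mod_MOD_upml_updateh": 33471,
    "__material_mod_MOD_mat_updatee": 21114,
    "__update_mod_MOD_updateh": 11493,
    "__power_mod_MOD_power_dft": 11445,
}
trunk = {
    "__upml_mod_MOD_upml_updatee_simple.lto_priv.0": 39825,
    "__upml_mod_MOD_upml_updateh": 33368,
    "__material_mod_MOD_mat_updatee": 20892,
    "__power_mod_MOD_power_dft": 20816,
    "__muldc3": 17164,
    "__update_mod_MOD_updateh": 12840,
}

total_gcc12 = sum(gcc12.values())
total_trunk = sum(trunk.values())
growth = total_trunk / total_gcc12 - 1.0

# A symbol missing from the GCC 12 list is treated as 0 samples there
# (an assumption: it may simply have been below the rows shown).
deltas = {name: trunk[name] - gcc12.get(name, 0) for name in trunk}

for name, delta in sorted(deltas.items(), key=lambda kv: -kv[1]):
    print(f"{name:48s} {delta:+7d}")
print(f"listed samples: {total_gcc12} -> {total_trunk} ({growth:+.1%})")
```

Under those assumptions the listed samples grow from 117601 to 144905, about 23%, and nearly all of the extra 27304 samples fall in __muldc3 (17164) and __power_mod_MOD_power_dft (9371). The other hot functions are within a few hundred samples of their GCC 12 counts.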


On the Zen3 machine at least, I have bisected this to:

  commit 038b077689bb5310386b04d40a2cea234f01e6aa
  Author: Richard Sandiford <richard.sandiford@arm.com>
  Date:   Wed Jun 22 11:27:15 2022 +0100

    data-ref: Improve non-loop disambiguation [PR106019]

    When dr_may_alias_p is called without a loop context, it tries
    to use the tree-affine interface to calculate the difference
    between the two addresses and use that difference to check whether
    the gap between the accesses is known at compile time.  However, as the
    example in the PR shows, this doesn't expand SSA_NAMEs and so can easily
    be defeated by things like reassociation.

    One fix would have been to use aff_combination_expand to expand the
    SSA_NAMEs, but we'd then need some way of maintaining the associated
    cache.  This patch instead reuses the innermost_loop_behavior fields
    (which exist even when no loop context is provided).

    It might still be useful to do the aff_combination_expand thing too,
    if an example turns out to need it.

    gcc/
            PR tree-optimization/106019
            * tree-data-ref.cc (dr_may_alias_p): Try using the
            innermost_loop_behavior to disambiguate non-loop queries.

    gcc/testsuite/
            PR tree-optimization/106019
            * gcc.dg/vect/bb-slp-pr106019.c: New test.
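
The limitation described in the commit message can be illustrated with a toy model. This is only a sketch: it is not GCC's tree-affine code, and the `expand` and `known_gap` helpers and the address encoding are hypothetical. It shows why the difference between two addresses is not a compile-time constant while an intermediate name is left unexpanded, and becomes one after expansion (the alternative that the commit message mentions but does not implement).

```python
# Toy model only: an address is (constant, {symbol: coefficient}).
# None of these names exist in GCC; they are made up for illustration.

def expand(addr, defs):
    """Replace symbols that have a known affine definition in `defs`."""
    const, terms = addr
    out = {}
    for name, coeff in terms.items():
        if name in defs:
            d_const, d_terms = expand(defs[name], defs)
            const += coeff * d_const
            for sym, c in d_terms.items():
                out[sym] = out.get(sym, 0) + coeff * c
        else:
            out[name] = out.get(name, 0) + coeff
    return const, {s: c for s, c in out.items() if c != 0}

def known_gap(a, b, defs=None):
    """Return a - b if it is a constant, else None."""
    if defs is not None:
        a, b = expand(a, defs), expand(b, defs)
    const = a[0] - b[0]
    terms = dict(a[1])
    for sym, c in b[1].items():
        terms[sym] = terms.get(sym, 0) - c
    if any(c != 0 for c in terms.values()):
        return None  # a symbolic term survives, so the gap is unknown
    return const

# _1 = p + 8 plays the role of an SSA_NAME-like temporary.
# Access A is at _1 + 8 and access B is at p.
defs = {"_1": (8, {"p": 1})}
access_a = (8, {"_1": 1})
access_b = (0, {"p": 1})

gap_unexpanded = known_gap(access_a, access_b)      # _1 is left opaque
gap_expanded = known_gap(access_a, access_b, defs)  # _1 replaced by p + 8
print(gap_unexpanded, gap_expanded)
```

In this model the unexpanded query cannot determine the gap, while the expanded one finds a gap of 16 bytes, which would be enough to prove that two 8-byte accesses do not overlap.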