When benchmarking trunk revision 99ea0d76116 I noticed a 24% regression
on Zen4 and Zen3 machines, and a 16% regression on a Zen2 and an Intel
CascadeLake machine, when running 549.fotonik3d_r from the SPEC 2017
FPrate suite built with options -O2 -g -march=x86-64-v3 -flto=32,
compared to the binary produced by GCC 12.

On the Zen3 machine, the number of branches reported by perf stat
jumped by 90% between GCC 12 and the aforementioned trunk revision.

The symbol profile changed from:

  Overhead  Samples  Shared object           Name
    33.23%    40078  fotonik3d_r_peak.gcc12  __upml_mod_MOD_upml_updatee_simple.lto_priv.0
    27.74%    33471  fotonik3d_r_peak.gcc12  __upml_mod_MOD_upml_updateh
    17.50%    21114  fotonik3d_r_peak.gcc12  __material_mod_MOD_mat_updatee
     9.52%    11493  fotonik3d_r_peak.gcc12  __update_mod_MOD_updateh
     9.49%    11445  fotonik3d_r_peak.gcc12  __power_mod_MOD_power_dft

To:

  Overhead  Samples  Shared object           Name
    26.68%    39825  fotonik3d_r_peak.trunk  __upml_mod_MOD_upml_updatee_simple.lto_priv.0
    22.35%    33368  fotonik3d_r_peak.trunk  __upml_mod_MOD_upml_updateh
    13.99%    20892  fotonik3d_r_peak.trunk  __material_mod_MOD_mat_updatee
    13.96%    20816  fotonik3d_r_peak.trunk  __power_mod_MOD_power_dft
    11.51%    17164  libgcc_s.so.1           __muldc3
     8.60%    12840  fotonik3d_r_peak.trunk  __update_mod_MOD_updateh

On the Zen3 machine at least, I have bisected this to:

commit 038b077689bb5310386b04d40a2cea234f01e6aa
Author: Richard Sandiford <richard.sandiford@arm.com>
Date:   Wed Jun 22 11:27:15 2022 +0100

    data-ref: Improve non-loop disambiguation [PR106019]

    When dr_may_alias_p is called without a loop context, it tries
    to use the tree-affine interface to calculate the difference
    between the two addresses and use that difference to check whether
    the gap between the accesses is known at compile time.  However, as the
    example in the PR shows, this doesn't expand SSA_NAMEs and so can
    easily be defeated by things like reassociation.

    One fix would have been to use aff_combination_expand to expand the
    SSA_NAMEs, but we'd then need some way of maintaining the associated
    cache.  This patch instead reuses the innermost_loop_behavior fields
    (which exist even when no loop context is provided).

    It might still be useful to do the aff_combination_expand thing too,
    if an example turns out to need it.

    gcc/
            PR tree-optimization/106019
            * tree-data-ref.cc (dr_may_alias_p): Try using the
            innermost_loop_behavior to disambiguate non-loop queries.

    gcc/testsuite/
            PR tree-optimization/106019
            * gcc.dg/vect/bb-slp-pr106019.c: New test.
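
For reference, the sample counts in the two perf profiles above can be compared per symbol with a few lines of Python. This is only a rough sketch: it assumes both runs were sampled on the same event at the same rate, so that raw sample counts are comparable, and it only covers the symbols listed in the profiles. The dictionary and variable names are mine, not part of any tool.

```python
# Sample counts copied from the two perf profiles in this report.
# Only the listed (top) symbols are included; the rest of the profile is unknown.
gcc12 = {
    "__upml_mod_MOD_upml_updatee_simple.lto_priv.0": 40078,
    "__upml_mod_MOD_upml_updateh": 33471,
    "__material_mod_MOD_mat_updatee": 21114,
    "__update_mod_MOD_updateh": 11493,
    "__power_mod_MOD_power_dft": 11445,
}
trunk = {
    "__upml_mod_MOD_upml_updatee_simple.lto_priv.0": 39825,
    "__upml_mod_MOD_upml_updateh": 33368,
    "__material_mod_MOD_mat_updatee": 20892,
    "__power_mod_MOD_power_dft": 20816,
    "__muldc3": 17164,
    "__update_mod_MOD_updateh": 12840,
}

total_gcc12 = sum(gcc12.values())
total_trunk = sum(trunk.values())
extra = total_trunk - total_gcc12
ratio = total_trunk / total_gcc12

# Per-symbol change in samples; a symbol absent from a profile counts as 0.
deltas = {
    name: trunk.get(name, 0) - gcc12.get(name, 0)
    for name in sorted(set(gcc12) | set(trunk))
}

print(f"listed samples: gcc12={total_gcc12} trunk={total_trunk} ratio={ratio:.3f}")
for name, delta in sorted(deltas.items(), key=lambda item: -item[1]):
    print(f"{delta:+7d}  {name}")
```

Under those assumptions, the listed symbols account for about 23% more samples on trunk, and nearly all of the increase falls on __power_mod_MOD_power_dft and the new __muldc3 entry from libgcc_s.so.1, while the other symbols stay within a few percent, apart from __update_mod_MOD_updateh, which gains about 12%.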