The revision r13-3875-g9e11ceef165bc0 was supposed to speed up the benchmark, but it makes it slower with -O2 -flto and PGO:
https://lnt.opensuse.org/db_default/v4/SPEC/graph?plot.0=694.437.0

A similar regression can be seen without LTO as well:
https://lnt.opensuse.org/db_default/v4/SPEC/graph?plot.0=463.437.0
Nope, it wasn't supposed to speed up the benchmark, but it does indeed (with -Ofast) cause the hot loop kernels to be unswitched. Btw, do we know whether the train and ref data line up in these loops? Also, with -Ofast on znver2 I didn't observe any change when benchmarking this. I'm trying to reproduce.

OK, so with -O2 -flto -march=znver2 and FDO I get a runtime of 173s, while adding -fno-unswitch-loops gets me 188s. There's currently no knob to specifically disable outer loop unswitching, so I have to patch that up instead. With -O2 -flto -funswitch-loops (w/o FDO) I get 178s.

I'm going to add a --param to allow easier reproduction.
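For readers not familiar with the transform, here is a minimal hand-written sketch of what unswitching an outer loop amounts to. It is not taken from the benchmark; the function names and the `flag` parameter are made up for illustration, and the real pass works on GIMPLE and decides based on its own cost model.

```c
#include <stddef.h>

/* Original form: `flag` is invariant in both loops, but the branch
   sits inside the innermost loop body.  */
long sum_nest(const int *a, size_t rows, size_t cols, int flag)
{
    long sum = 0;
    for (size_t i = 0; i < rows; i++)
        for (size_t j = 0; j < cols; j++) {
            if (flag)
                sum += a[i * cols + j];
            else
                sum -= a[i * cols + j];
        }
    return sum;
}

/* Hand-unswitched at the outer loop: the invariant test is hoisted
   out of the whole nest, and the nest is duplicated, one copy per
   branch outcome.  */
long sum_nest_outer_unswitched(const int *a, size_t rows, size_t cols,
                               int flag)
{
    long sum = 0;
    if (flag) {
        for (size_t i = 0; i < rows; i++)
            for (size_t j = 0; j < cols; j++)
                sum += a[i * cols + j];
    } else {
        for (size_t i = 0; i < rows; i++)
            for (size_t j = 0; j < cols; j++)
                sum -= a[i * cols + j];
    }
    return sum;
}
```

Both functions compute the same result; the second trades code size (two copies of the nest) for a branch-free loop body, which is why the effect on runtime can go either way.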
So with --param max-unswitch-depth=1 and -O2 -flto -march=znver2 + FDO I get 176s, which is slower than with unswitching of outer loops (173s). That means I cannot reproduce the regression (at least not with this specific feature, i.e. this revision).
The master branch has been updated by Richard Biener <rguenth@gcc.gnu.org>:

https://gcc.gnu.org/g:5b50850c3c6f2eceb8012dcc8d3cd5ddd94fac6c

commit r13-4458-g5b50850c3c6f2eceb8012dcc8d3cd5ddd94fac6c
Author: Richard Biener <rguenther@suse.de>
Date:   Thu Dec 1 16:14:14 2022 +0100

    Add --param max-unswitch-depth

    The following adds a --param to limit the depth of unswitched loop
    nests.  One can use --param max-unswitch-depth=1 to disable
    unswitching of outer loops (the innermost loop will then be
    unswitched).

            PR tree-optimization/107946
            * params.opt (-param=max-unswitch-depth=): New.
            * doc/invoke.texi (--param=max-unswitch-depth): Document.
            * tree-ssa-loop-unswitch.cc (init_loop_unswitch_info): Honor
            --param=max-unswitch-depth
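As a rough illustration of what limiting the depth to 1 means, the sketch below shows a nest where only the innermost loop has been unswitched by hand. The function name and the `flag` parameter are hypothetical, and this is only an approximation of the shape of code the pass would produce, not its actual output.

```c
#include <stddef.h>

/* Innermost-only unswitching, written by hand: the invariant test on
   `flag` is hoisted out of the j loop but stays inside the i loop, so
   it is evaluated once per outer iteration and only the inner loop is
   duplicated.  */
long sum_nest_inner_unswitched(const int *a, size_t rows, size_t cols,
                               int flag)
{
    long sum = 0;
    for (size_t i = 0; i < rows; i++) {
        if (flag) {
            for (size_t j = 0; j < cols; j++)
                sum += a[i * cols + j];
        } else {
            for (size_t j = 0; j < cols; j++)
                sum -= a[i * cols + j];
        }
    }
    return sum;
}
```

Compared with unswitching the whole nest, this keeps one copy of the outer loop, at the cost of one test per outer iteration.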
The data found for other machines/flags is also rather inconclusive with ups and downs.
GCC 13.1 is being released, retargeting bugs to GCC 13.2.
GCC 13.2 is being released, retargeting bugs to GCC 13.3.
This regression is still there (as the graphs linked in the summary show).
GCC 13.3 is being released, retargeting bugs to GCC 13.4.