During the development of GCC 13, the run time of 510.parest_r regressed on x86_64: when built with master using profile guided optimization at plain -O2, the benchmark is slower than when built with GCC 12. The difference is not big but fairly clear cut.

It is about 7.6% on Zen3:
https://lnt.opensuse.org/db_default/v4/SPEC/graph?plot.0=740.457.0&plot.1=892.457.0&plot.2=694.457.0&

and about 7.2% on Zen2:
https://lnt.opensuse.org/db_default/v4/SPEC/graph?plot.0=777.457.0&plot.1=932.457.0&plot.2=687.457.0&

The graphs above show runs using both LTO and PGO, but LTO is not necessary to reproduce the regression.

I was able to bisect the regression to commit r13-4272-g8caf155a3d6e23 (i386: Only enable small loop unrolling in backend [PR 107692]).

On Intel Cascade Lake, parest_r is also about 4% slower when compiled with this revision than with the previous one.
GCC 13.3 is being released; retargeting bugs to GCC 13.4.