400.perlbench compiled at -O2 (and generic march/mtune) with both PGO and LTO is slower when built with master (26b3e568a60) than when built with GCC 9: by 13% on Zen2 and by 7% on Zen1. Performance is comparable on an Intel Cascade Lake server CPU.

I attempted bisecting the problem on the Zen2 CPU but was only partially successful because much of the slowdown seems to have happened gradually. The first bigger slowdown - almost 4% - came with:

562d1e9556777988ae46c5d1357af2636bc272ea is the first bad commit
commit 562d1e9556777988ae46c5d1357af2636bc272ea
Author: Jan Hubicka <hubicka@gcc.gnu.org>
Date:   Wed Oct 2 16:01:47 2019 +0000

    cif-code.def (MAX_INLINE_INSNS_SINGLE_O2_LIMIT, [...]): New.

    * cif-code.def (MAX_INLINE_INSNS_SINGLE_O2_LIMIT,
    MAX_INLINE_INSNS_AUTO_O2_LIMIT): New.
    ...

    From-SVN: r276469

About the same performance loss was then introduced by:

commit 2925cad2151842daa387950e62d989090e47c91d
Author: Jan Hubicka <hubicka@ucw.cz>
Date:   Thu Oct 3 17:08:21 2019 +0200

    params.def (PARAM_INLINE_HEURISTICS_HINT_PERCENT, [...]): New.

    * params.def (PARAM_INLINE_HEURISTICS_HINT_PERCENT,
    PARAM_INLINE_HEURISTICS_HINT_PERCENT_O2): New.
    * doc/invoke.texi (inline-heuristics-hint-percent,
    inline-heuristics-hint-percent-O2): Document.
    * tree-inline.c (inline_insns_single, inline_insns_auto): Add new
    hint attribute.
    (can_inline_edge_by_limits_p): Use it.

And finally, throughout March the benchmark was quite jumpy but in the end it was again about 5% slower than at the beginning of the month.
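For reference, the build mode being measured above is roughly the following (a sketch only; file names, the training input, and the exact flags are placeholders - in practice the SPEC harness drives this through its config files):

```shell
# Hypothetical sketch of an -O2 PGO+LTO build as discussed above.
CFLAGS="-O2 -flto"

# 1) Instrumented build
gcc $CFLAGS -fprofile-generate -o perlbench.inst src/*.c

# 2) Training run producing .gcda profile data
./perlbench.inst train-input.pl

# 3) Optimized rebuild consuming the profile
gcc $CFLAGS -fprofile-use -o perlbench src/*.c
```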
This looks like an important issue to me. Maybe P2?
Martin, can you try to change the limits, maybe that is just a limit for inline expansions that is not right?
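Changing the limits for an experiment would be something along these lines (parameter names as in the GCC 10 development tree; the values here are arbitrary guesses for illustration, not recommendations):

```shell
# Hypothetical experiment: bump the -O2+FDO inline limits towards
# -O3-style values via --param; numbers chosen only as an example.
gcc -O2 -fprofile-use -flto \
    --param max-inline-insns-single=200 \
    --param max-inline-insns-auto=30 \
    --param inline-heuristics-hint-percent=1600 \
    -o perlbench src/*.c
```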
My benchmarking setup is currently gone so unfortunately no, not easily. I'll be re-measuring everything on a different computer with a slightly different CPU model soon, so after that I guess I could. But it is most likely the limits, yes.
(In reply to Martin Jambor from comment #3) > My benchmarking setup is currently gone so unfortunately no, not easily. > I'll be re-measuring everything on a different computer with a slightly > different CPU model soon, so after that I guess I could. But it is most > likely the limits, yes. Yeah, easy to fix, but it takes some time. But this is not more important than your life. Shall I raise this to P1 so it prevents gcc-10 release?
No, we can't block GCC 10 release indefinitely, we are already behind the usual schedule. We need to resolve the C++ ABI issues and get the release out.
(In reply to Jakub Jelinek from comment #5) > No, we can't block GCC 10 release indefinitely, we are already behind the > usual schedule. We need to resolve the C++ ABI issues and get the release > out. Sorry, have you heard of the corona pandemic out there? This is not unlike the 2020 Olympic Games, which have been cancelled. I am just saying I would delay GCC 10 right now, before it is too late; this performance regression will make the damage worse.
(In reply to Bernd Edlinger from comment #4) > (In reply to Martin Jambor from comment #3) > > My benchmarking setup is currently gone so unfortunately no, not easily. > > I'll be re-measuring everything on a different computer with a slightly > > different CPU model soon, so after that I guess I could. But it is most > > likely the limits, yes. > > Yeah, easy to fix, but it takes some time. > But this is not more important than your life. Note that tuning inliner parameters is hard and takes a lot of time. If we adjust things to make 400.perlbench happy, which is btw. from SPEC 2006(!), we're going to regress things elsewhere. It's going to be a whack-a-mole game and definitely not suitable at this stage (inliner re-tuning is also prone to trigger latent GCC issues in previously fine-compiling apps). > Shall I raise this to P1 so it prevents gcc-10 release? Definitely not. Setting the priority is the release managers' job, and btw. bug priority is meaningless for non-regression bug reports.
Oh, and bugfixing requires first understanding the bug. Especially for performance-related issues, understanding what goes wrong is important. I see no analysis performed to date.
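A typical starting point for such an analysis on Linux would be comparing cycle profiles of the two builds (binary and input names below are placeholders):

```shell
# Record cycle profiles of a GCC 9 build and a trunk build on the
# same input, then compare which symbols gained or lost time share.
perf record -o gcc9.data  ./perlbench.gcc9  ref-input.pl
perf record -o trunk.data ./perlbench.trunk ref-input.pl
perf diff gcc9.data trunk.data
```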
> --- Comment #8 from Richard Biener <rguenth at gcc dot gnu.org> --- > Oh, and bugfixing requires to first understand the bug. Especially for > performance related issues understanding what goes wrong is important. > I see no analysis being performed to date.

The problem here is that -O2 -fprofile-use now uses the -O2 inliner limits, while previously it used the -O3 inliner limits (because -fprofile-use enables -finline-functions). I can see this on SPEC GCC, perl, Firefox, real GCC and clang. We now have a performance difference between -O2+FDO and -O3+FDO.

It is something I kind of missed in my testing, because I was testing -O2 and -O3 + FDO but not -O2+FDO. I realize that -O2+FDO is kind of important because we use it in our bootstrap. So I was collecting data over the weekend for Clang, GCC and Firefox.

It is a question how aggressive we want to be at -O2+FDO, but the observation is that in all these programs the code size growth for -O3-style limits is quite small (below 2%), simply because training coverage is quite small in all those programs (sub 10%), and thus the code size growth from inlining hot calls is acceptable. So I think the current defaults are really suboptimal.

I think there are a few ways to proceed:
 1) make the inline limits with FDO the -O3 ones
 2) invent yet another set of parameters for FDO
 3) increase the importance of the known_hot hint, which is set on calls that are known to be hot (either by inlining or by the hot attribute).

1 is easiest but a bit non-systematic. I am not really keen on 2 because of parameter explosion. However, 3 looks like a good alternative, so I am running benchmarks with a few settings of it, but they take some time.

Honza
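A sweep for option 3 could look roughly like this (a hypothetical sketch; the parameter values and file names are made up, and runtime would be measured separately by the benchmark harness):

```shell
# Hypothetical sweep over the hint percentage at -O2+FDO, tracking
# code size alongside the separately-measured runtime.
for pct in 200 400 800 1600; do
  gcc -O2 -fprofile-use \
      --param inline-heuristics-hint-percent=$pct \
      -o perlbench.$pct src/*.c
  size perlbench.$pct
done
```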
(In reply to Richard Biener from comment #7) > > > Shall I raise this to P1 so it prevents gcc-10 release? > > Definitely not. Setting priority is the release managers job, and btw. > bug priority is meaningless for non-regression bugreports. Okay, Richard, is this P2 or P3 then? I just wanted you to think about it. ;-) Thanks, Bernd.