Around 22nd of February 2022, SPEC 2017 538.imagick_r regressed on all x86_64 systems used by our periodic benchmarker. I have bisected the zen3 -Ofast -march=native -flto case to revision r12-7319-g90d693bdc9d718, but I think most if not all of the regressions are caused by this commit:

commit 90d693bdc9d71841f51d68826ffa5bd685d7f0bc
Author: Richard Biener <rguenther@suse.de>
Date:   Fri Feb 18 14:32:14 2022 +0100

    target/99881 - x86 vector cost of CTOR from integer regs

    This uses the now passed SLP node to the vectorizer costing hook
    to adjust vector construction costs for the cost of moving an
    integer component from a GPR to a vector register when that's
    required for building a vector from components.  A crucial difference
    here is whether the component is loaded from memory or extracted
    from a vector register as in those cases no intermediate GPR is involved.

    The pr99881.c testcase can be Un-XFAILed with this patch, the
    pr91446.c testcase now produces scalar code which looks superior
    to me so I've adjusted it as well.
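To illustrate the distinction the commit message draws, here is a minimal sketch of two store groups that an SLP vectorizer could turn into a vector construction. The function names are made up for illustration, and whether GCC actually vectorizes either function depends on the version, target and flags; this is not taken from imagick or from the testcases named above.

```c
#include <stdint.h>

/* Hypothetical example: the four lanes arrive in integer registers
   (function arguments), so building a vector from them would need
   GPR -> vector register moves.  */
void
ctor_from_gprs (uint32_t *out, uint32_t a, uint32_t b, uint32_t c, uint32_t d)
{
  out[0] = a;
  out[1] = b;
  out[2] = c;
  out[3] = d;
}

/* Hypothetical example: the four lanes are loaded from non-adjacent
   memory locations, so they can be placed into a vector register
   without going through a GPR first.  */
void
ctor_from_memory (uint32_t *restrict out, const uint32_t *restrict in,
                  int stride)
{
  out[0] = in[0 * stride];
  out[1] = in[1 * stride];
  out[2] = in[2 * stride];
  out[3] = in[3 * stride];
}
```

Compiling both with and without the commit and comparing the assembly (for example with -O2 -S) is one way to see whether the costing change flips either function between vector and scalar code.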
List of (selected) regressions with links to LNT graphs:

zen2 -O2 regressed by 26%:
https://lnt.opensuse.org/db_default/v4/SPEC/graph?plot.0=297.507.0
zen2 -O2 -flto regressed by 26% too:
https://lnt.opensuse.org/db_default/v4/SPEC/graph?plot.0=296.507.0
zen2 -Ofast -march=native by 28%:
https://lnt.opensuse.org/db_default/v4/SPEC/graph?plot.0=295.507.0
zen2 -Ofast -march=native -flto by 23%:
https://lnt.opensuse.org/db_default/v4/SPEC/graph?plot.0=287.507.0

zen3 -O2 by 8%:
https://lnt.opensuse.org/db_default/v4/SPEC/graph?plot.0=470.507.0
zen3 -O2 -march=native by 18%:
https://lnt.opensuse.org/db_default/v4/SPEC/graph?plot.0=472.507.0
zen3 -Ofast -march=native by 17%:
https://lnt.opensuse.org/db_default/v4/SPEC/graph?plot.0=471.507.0
zen3 -Ofast -march=native -flto by 10%:
https://lnt.opensuse.org/db_default/v4/SPEC/graph?plot.0=475.507.0

kabylake -O2 by 9% (though this one looks suspiciously noisy):
https://lnt.opensuse.org/db_default/v4/SPEC/graph?plot.0=226.507.0
kabylake -O2 -march=native by 16%:
https://lnt.opensuse.org/db_default/v4/SPEC/graph?plot.0=28.507.0
kabylake -Ofast -march=native by 22%:
https://lnt.opensuse.org/db_default/v4/SPEC/graph?plot.0=32.507.0
kabylake -Ofast -march=native -flto by 15%:
https://lnt.opensuse.org/db_default/v4/SPEC/graph?plot.0=11.507.0
The same revision also regressed SPEC 2017 INTrate benchmark x264_r by about 10% at -Ofast -march=native with and without LTO on zen3:
https://lnt.opensuse.org/db_default/v4/SPEC/graph?plot.0=471.377.0
https://lnt.opensuse.org/db_default/v4/SPEC/graph?plot.0=475.377.0

and I suspect (unlike the above, I have not actually verified it) also on zen2:
https://lnt.opensuse.org/db_default/v4/SPEC/graph?plot.0=295.377.0
https://lnt.opensuse.org/db_default/v4/SPEC/graph?plot.0=287.377.0
For 525.x264_r, it's PR101929; for 538.imagick_r, it's an STF issue.
See PR101929 comment #7 for two things we could do at this stage that might "fix" the regression.
The master branch has been updated by Richard Biener <rguenth@gcc.gnu.org>:

https://gcc.gnu.org/g:69619acd8d9b5856f5af6e5323d9c7c4ec9ad08f

commit r12-7612-g69619acd8d9b5856f5af6e5323d9c7c4ec9ad08f
Author: Richard Biener <rguenther@suse.de>
Date:   Fri Mar 11 11:51:13 2022 +0100

    target/104762 - vectorization costs of CONSTRUCTORs

    After accounting for GPR -> XMM move cost for vec_construct the
    base cost needs adjustments to not double-cost those.  This also
    lowers the cost when such move is not necessary.

    2022-03-11  Richard Biener  <rguenther@suse.de>

            PR target/104762
            * config/i386/i386.cc (ix86_builtin_vectorization_cost): Do not
            cost the first lane of SSE pieces as inserts for vec_construct.
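The arithmetic behind "do not cost the first lane as an insert" can be sketched with a toy model. The unit costs and function names below are invented for illustration and assume one reading of the commit message; this is not the code in config/i386/i386.cc.

```c
/* Toy model with made-up unit costs; NOT the actual i386 cost tables.  */
enum { TOY_INSERT_COST = 1, TOY_GPR_TO_XMM_COST = 2 };

/* Assumed "before" behaviour: every lane of an n-lane SSE vector is
   charged as an insert, and lanes living in GPRs are additionally
   charged a GPR -> XMM move.  */
int
toy_ctor_cost_before (int nlanes, int lanes_in_gprs)
{
  int moves = lanes_in_gprs ? nlanes * TOY_GPR_TO_XMM_COST : 0;
  return nlanes * TOY_INSERT_COST + moves;
}

/* Assumed "after" behaviour: the first lane is not charged as an
   insert, so only nlanes - 1 inserts are counted.  */
int
toy_ctor_cost_after (int nlanes, int lanes_in_gprs)
{
  int moves = lanes_in_gprs ? nlanes * TOY_GPR_TO_XMM_COST : 0;
  return (nlanes - 1) * TOY_INSERT_COST + moves;
}
```

With these made-up numbers, a 4-lane vector built from GPRs goes from 12 to 11, and one built from memory goes from 4 to 3, which matches the commit's remark that the cost is also lowered when no GPR move is needed.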
Fixed.