This is the mail archive of the gcc-bugs@gcc.gnu.org mailing list for the GCC project.
[Bug tree-optimization/84037] [8 Regression] Speed regression of polyhedron benchmark since r256644
- From: "rguenth at gcc dot gnu.org" <gcc-bugzilla at gcc dot gnu dot org>
- To: gcc-bugs at gcc dot gnu dot org
- Date: Wed, 07 Feb 2018 15:46:38 +0000
- Subject: [Bug tree-optimization/84037] [8 Regression] Speed regression of polyhedron benchmark since r256644
- Auto-submitted: auto-generated
- References: <bug-84037-4@http.gcc.gnu.org/bugzilla/>
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84037
--- Comment #21 from Richard Biener <rguenth at gcc dot gnu.org> ---
So after r257453 we improve the situation pre-IVOPTs to just
6 IVs (duplicated but trivially equivalent) plus one counting IV. But then,
when SLP is enabled, IVOPTs comes along and adds another 4 IVs, which makes us
spill (this is for AVX256, so you need -march=core-avx2, for example).
Bin, any chance you can take a look? In the IVO dump I see
target_avail_regs 15
target_clobbered_regs 9
target_reg_cost 4
target_spill_cost 8
regs_used 3
^^^
and regs_used looks awfully low to me. The loop has even more IVs initially,
plus variable steps for those IVs, which means we need two regs per IV.
There doesn't seem to be a way to force IVOPTs to use the minimal set of IVs?
Or to just use the original set, removing the obvious redundancies? There is
a microarchitectural issue left with the vectorization, but the spilling
obscures the picture quite a bit :/
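
As a rough illustration of why regs_used 3 looks low, here is a back-of-envelope
sketch of the register count implied by the numbers above. It is not GCC's cost
model. The helper names are hypothetical. It assumes that all 6 IVs (and the 4
that IVOPTs adds) have variable steps and so need two registers each, and that
the counting IV needs one. Other live values are ignored.

```python
# Back-of-envelope register estimate for the IVs discussed in this comment.
# NOT GCC's actual IVOPTs cost model; helper names are hypothetical.
# Assumption: an IV with a variable (non-constant) step needs two registers,
# one for the IV itself and one for its step; any other IV needs one.

TARGET_AVAIL_REGS = 15  # target_avail_regs from the IVO dump
REGS_USED_DUMP = 3      # regs_used as reported in the IVO dump


def iv_regs(n_ivs, variable_step=True):
    """Registers needed for n_ivs induction variables."""
    return n_ivs * (2 if variable_step else 1)


def estimate(n_var_step_ivs, n_counting_ivs):
    """Total registers for variable-step IVs plus simple counting IVs."""
    return iv_regs(n_var_step_ivs) + iv_regs(n_counting_ivs, variable_step=False)


# Pre-IVOPTs: 6 IVs plus one counting IV.
before = estimate(6, 1)
# Post-IVOPTs: another 4 IVs (assumed here to have variable steps as well).
after = estimate(6 + 4, 1)

print("before IVOPTs:", before, "regs; after IVOPTs:", after, "regs")
print("available:", TARGET_AVAIL_REGS, "; regs_used in dump:", REGS_USED_DUMP)
```

Under these assumptions the pre-IVOPTs set is already close to the 15 available
registers, and the 4 extra IVs push the estimate past that limit even if each
of them were to need only a single register.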