This is the mail archive of the gcc-bugs@gcc.gnu.org mailing list for the GCC project.



[Bug tree-optimization/84037] [8 Regression] Speed regression of polyhedron benchmark since r256644


https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84037

--- Comment #21 from Richard Biener <rguenth at gcc dot gnu.org> ---
So after r257453 we improve the situation pre-IVOPTs to just
6 IVs (duplicated but trivially equivalent) plus one counting IV.  But then,
when SLP is enabled, IVOPTs comes along and adds another 4 IVs, which makes us
spill... (for AVX256, so you need -march=core-avx2 for example).

Bin, any chance you can take a look?  In the IVO dump I see

  target_avail_regs 15
  target_clobbered_regs 9
  target_reg_cost 4
  target_spill_cost 8
  regs_used 3
^^^

and regs_used looks awfully low to me.  The loop has even more IVs initially,
plus variable steps for those IVs, which means we need two regs per IV.
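
To make the "two regs per IV" point concrete, here is a minimal sketch in C.
It is not the polyhedron kernel and not IVOPTs' actual cost model: the
function names strided_dot3 and iv_regs_needed are hypothetical, and the
count is only a back-of-envelope estimate that assumes every variable-step
IV keeps both its current value and its step live across the loop.

```c
#include <stddef.h>

/* Hypothetical kernel: three pointer IVs whose steps are only known at
 * run time, plus one counting IV (i). */
double strided_dot3(const double *a, const double *b, const double *c,
                    ptrdiff_t sa, ptrdiff_t sb, ptrdiff_t sc, size_t n)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; i++) {   /* counting IV */
        sum += *a * *b * *c;
        a += sa;   /* variable step: both a and sa stay live */
        b += sb;   /* likewise b and sb */
        c += sc;   /* likewise c and sc */
    }
    return sum;
}

/* Rough estimate (an assumption, not GCC's formula): two registers per
 * variable-step IV (value + step), one per counting IV. */
unsigned iv_regs_needed(unsigned variable_step_ivs, unsigned counting_ivs)
{
    return 2u * variable_step_ivs + counting_ivs;
}
```

Under that assumption, iv_regs_needed(6, 1) gives 13 for the 6 IVs plus one
counting IV mentioned above (fewer if the trivially equivalent duplicates are
merged), which is still well above the regs_used 3 reported in the dump.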

Is there really no way to force IVOPTs to use the minimal set of IVs?
Or to just use the original set, removing the obvious redundancies?  There is
a microarchitectural issue left with the vectorization, but the spilling
obscures the picture quite a bit :/
