[Bug middle-end/37150] basic-block vectorization misses some unrolled loops

rguenth at gcc dot gnu.org gcc-bugzilla@gcc.gnu.org
Fri Nov 4 11:27:00 GMT 2016


https://gcc.gnu.org/bugzilla/show_bug.cgi?id=37150

--- Comment #21 from Richard Biener <rguenth at gcc dot gnu.org> ---
Ok, so fixing the accounting to disregard obviously dead loads gets us to

t.f90:158:0: note: Cost model analysis:
  Vector inside of basic block cost: 1224
  Vector prologue cost: 0
  Vector epilogue cost: 0
  Scalar cost of basic block: 616
t.f90:158:0: note: not vectorized: vectorization is not profitable.

that still doesn't account for the redundant ones... (we still emit those
so we conservatively assume no CSE here).  I suppose the "simple" way
of costing permutations might be the real issue here though.

Permutations like { 58, 58, 58, 58 } are also vectorized badly
(and costed accordingly).  Likewise { 4, 5, 4, 5 } is costed as a
permutation.
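
To make the distinction concrete, here is a rough sketch of how load
permutations like the two above could be told apart from a general
shuffle.  This is not the vectorizer's code: the function name and the
category names are made up for illustration, and which of these cases
should actually be costed as "no permutation" is an assumption.

```python
def classify_load_permutation(perm):
    """Return a rough category for a load permutation given as a list of
    element indices.  Hypothetical helper, not GCC's implementation."""
    n = len(perm)
    if n == 0:
        raise ValueError("empty permutation")
    # All lanes read the same element: a broadcast.
    if all(i == perm[0] for i in perm):
        return "splat"
    # Lanes read consecutive elements: a plain vector load.
    if all(b == a + 1 for a, b in zip(perm, perm[1:])):
        return "contiguous"
    # A short consecutive run repeated to fill the vector.
    for period in range(2, n // 2 + 1):
        block = perm[:period]
        run = all(b == a + 1 for a, b in zip(block, block[1:]))
        if n % period == 0 and run and perm == block * (n // period):
            return "repeated-run"
    # Anything else needs a real shuffle.
    return "general"


print(classify_load_permutation([58, 58, 58, 58]))  # splat
print(classify_load_permutation([4, 5, 4, 5]))      # repeated-run
```

Only the "general" case needs an arbitrary shuffle; the other three
could plausibly be costed as something cheaper.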

Not counting non-permutations improves things to

t.f90:158:0: note: Cost model analysis:
  Vector inside of basic block cost: 1080
  Vector prologue cost: 0
  Vector epilogue cost: 0
  Scalar cost of basic block: 616
t.f90:158:0: note: not vectorized: vectorization is not profitable.
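
For reference, the decision in both dumps can be modelled as a plain
comparison of the summed vector costs against the scalar cost.  The
sketch below assumes that simplified rule and uses hypothetical names;
the real check may weigh things differently.

```python
def bb_vectorization_profitable(vec_inside, vec_prologue, vec_epilogue,
                                scalar):
    """Simplified model (an assumption): vectorize only if the total
    vector cost is strictly below the scalar cost."""
    return vec_inside + vec_prologue + vec_epilogue < scalar


def cost_gap(vec_inside, vec_prologue, vec_epilogue, scalar):
    """How much vector cost must still go away to break even."""
    return vec_inside + vec_prologue + vec_epilogue - scalar


# Numbers taken from the two dumps in this comment.
print(bb_vectorization_profitable(1224, 0, 0, 616))  # False
print(bb_vectorization_profitable(1080, 0, 0, 616))  # False
print(cost_gap(1224, 0, 0, 616))  # 608
print(cost_gap(1080, 0, 0, 616))  # 464
```

Under this model the second change removes 144 units of vector cost,
and another 464 would have to go before the block counts as profitable.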

So there is room for improvement, but these were the "easy" parts (the
rest requires more analysis).  Likely there's some CSE in between
the SLP instances involved.

