[Bug target/83008] [performance] Is it better to avoid extra instructions in data passing between loops?

Thu Nov 16 08:13:00 GMT 2017

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83008

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|                            |missed-optimization
             Target|                            |x86_64-*-* i?86-*-*
                 CC|                            |hubicka at gcc dot gnu.org,
                   |                            |rguenth at gcc dot gnu.org

--- Comment #2 from Richard Biener <rguenth at gcc dot gnu.org> ---
The strange code is because we perform basic-block vectorization resulting in

  vect_cst__249 = {_251, _251, _251, _251, _334, _334, _334, _334,
                   _417, _417, _417, _417, _48, _48, _48, _48};
  MEM[(unsigned int *)&tmp] = vect_cst__249;
  _186 = tmp[0][0];
  _185 = tmp[1][0];
...

which for some reason is deemed profitable:

t.c:32:12: note: Cost model analysis:
  Vector inside of basic block cost: 24
  Vector prologue cost: 64
  Vector epilogue cost: 0
  Scalar cost of basic block: 192
t.c:32:12: note: Basic block will be vectorized using SLP

What is odd is that the single vector store is costed 24 while the 16 scalar
int stores are costed 192 (12 each).  The vector build from scalars costs 64.
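
To make the arithmetic explicit, here is a minimal sketch in C using only the
numbers from the dump.  It assumes (the dump itself does not say so) that the
vector side is the plain sum of the inside, prologue and epilogue costs and
that it is chosen when strictly cheaper than the scalar cost.  The names are
made up for illustration and are not GCC internals.

```c
/* Costs copied verbatim from the cost model dump. */
enum {
  VEC_INSIDE_COST   = 24,   /* the single vector store */
  VEC_PROLOGUE_COST = 64,   /* building the vector from scalars */
  VEC_EPILOGUE_COST = 0,
  SCALAR_COST       = 192,  /* the 16 scalar int stores */
  N_SCALAR_STORES   = 16
};

/* Assumption: the vector total is the sum of the three vector entries. */
static int vector_total_cost(void)
{
  return VEC_INSIDE_COST + VEC_PROLOGUE_COST + VEC_EPILOGUE_COST;
}

/* Cost attributed to each individual scalar store. */
static int scalar_cost_per_store(void)
{
  return SCALAR_COST / N_SCALAR_STORES;
}

/* Assumption: SLP wins when its total is strictly below the scalar cost. */
static int slp_deemed_profitable(void)
{
  return vector_total_cost() < SCALAR_COST;
}
```

With these inputs the vector total is 88 against 192, so the block is
vectorized; one vector store at 24 is costed the same as two scalar stores.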

I guess Honza's cost-model tweaks might have gone wrong here or we're hitting an
oddity in the SLP costing.

Even if it looks strange, maybe the sequence _is_ profitable?
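
For anyone who wants to measure the two variants, the following is a rough
sketch of the shapes being compared.  It is not the testcase from this bug:
the 4x4 layout of tmp is inferred from the 16-element constructor and the
tmp[0][0] / tmp[1][0] loads, and fill_scalar, fill_vector and v16su are
hypothetical names.  It relies on the GNU vector_size extension and assumes a
4-byte unsigned int.

```c
#include <string.h>

/* 16 x unsigned int in one generic vector (GNU extension). */
typedef unsigned int v16su __attribute__((vector_size(64)));

/* Scalar form: 16 individual int stores, one value per row. */
static void fill_scalar(unsigned int tmp[4][4], unsigned int a,
                        unsigned int b, unsigned int c, unsigned int d)
{
  const unsigned int row[4] = {a, b, c, d};
  for (int i = 0; i < 4; i++)
    for (int j = 0; j < 4; j++)
      tmp[i][j] = row[i];
}

/* SLP-like form: build one vector from the four scalars, then store it
   to tmp in a single copy.  */
static void fill_vector(unsigned int tmp[4][4], unsigned int a,
                        unsigned int b, unsigned int c, unsigned int d)
{
  v16su v = {a, a, a, a, b, b, b, b, c, c, c, c, d, d, d, d};
  memcpy(tmp, &v, sizeof v);
}
```

Both functions leave tmp in the same state, so subsequent scalar loads such as
tmp[0][0] and tmp[1][0] read the same values either way; only the way the
stores are issued differs.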

The second loop would be vectorized if 'sum' was unsigned.