[Bug target/83008] [performance] Is it better to avoid extra instructions in data passing between loops?
rguenth at gcc dot gnu.org
gcc-bugzilla@gcc.gnu.org
Thu Nov 16 08:13:00 GMT 2017
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83008
Richard Biener <rguenth at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Keywords| |missed-optimization
Target| |x86_64-*-* i?86-*-*
CC| |hubicka at gcc dot gnu.org,
| |rguenth at gcc dot gnu.org
--- Comment #2 from Richard Biener <rguenth at gcc dot gnu.org> ---
The strange code arises because we perform basic-block vectorization, resulting in
vect_cst__249 = {_251, _251, _251, _251, _334, _334, _334, _334, _417, _417,
_417, _417, _48, _48, _48, _48};
MEM[(unsigned int *)&tmp] = vect_cst__249;
_186 = tmp[0][0];
_185 = tmp[1][0];
...
which for some reason is deemed profitable:
t.c:32:12: note: Cost model analysis:
Vector inside of basic block cost: 24
Vector prologue cost: 64
Vector epilogue cost: 0
Scalar cost of basic block: 192
t.c:32:12: note: Basic block will be vectorized using SLP
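For illustration, here is a minimal C sketch of the two code shapes. It assumes `tmp` is an `unsigned int tmp[4][4]`. That fits the 16-lane store through `MEM[(unsigned int *)&tmp]` and the `tmp[1][0]` read, but the dump does not confirm it. `fill_scalar`, `fill_vector` and `check_equivalent` are hypothetical names, and `a`..`d` stand in for `_251`, `_334`, `_417` and `_48`. The vector form uses the GCC/Clang `vector_size` extension to mirror `vect_cst__249`.

```c
#include <assert.h>
#include <string.h>

/* 16 x 32-bit lanes = 64 bytes, like vect_cst__249 (GCC/Clang extension). */
typedef unsigned int v16u __attribute__((vector_size(64)));

/* Scalar shape: 16 separate unsigned int stores, one value per row. */
void fill_scalar(unsigned int tmp[4][4], unsigned int a, unsigned int b,
                 unsigned int c, unsigned int d)
{
    const unsigned int row_val[4] = {a, b, c, d};
    for (int i = 0; i < 4; i++)
        for (int j = 0; j < 4; j++)
            tmp[i][j] = row_val[i];
}

/* SLP shape: build one vector from the scalars (what the prologue cost
   covers), then copy it into tmp in one go, standing in for the single
   vector store (what the inside cost covers). */
void fill_vector(unsigned int tmp[4][4], unsigned int a, unsigned int b,
                 unsigned int c, unsigned int d)
{
    v16u v = {a, a, a, a, b, b, b, b, c, c, c, c, d, d, d, d};
    memcpy(tmp, &v, sizeof v);
}

/* Both shapes must leave tmp in the same state. */
void check_equivalent(unsigned int a, unsigned int b,
                      unsigned int c, unsigned int d)
{
    unsigned int s[4][4], v[4][4];
    fill_scalar(s, a, b, c, d);
    fill_vector(v, a, b, c, d);
    assert(memcmp(s, v, sizeof s) == 0);
}
```

Both shapes leave `tmp` identical. In the dump, the subsequent scalar loads (`_186 = tmp[0][0]`, `_185 = tmp[1][0]`, ...) then read individual lanes back out of memory.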
What is odd is that the single vector store is costed at 24, while the 16 scalar
int stores are costed at 192. Building the vector from the scalars costs 64.
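The cost comparison can be written out as a small model. This is a hypothetical simplification, not GCC's actual code. It assumes the block is vectorized when inside + prologue + epilogue is below the scalar cost, and it does not model GCC's exact tie handling. With the numbers from the dump, the vector side totals 24 + 64 + 0 = 88 against 192. On the scalar side that works out to 192 / 16 = 12 per int store, versus 24 for the one vector store.

```c
#include <assert.h>
#include <stdbool.h>

/* Costs as printed in the "Cost model analysis" note. */
typedef struct {
    int inside;   /* Vector inside of basic block cost (the vector store) */
    int prologue; /* Vector prologue cost (building the vector from scalars) */
    int epilogue; /* Vector epilogue cost */
    int scalar;   /* Scalar cost of basic block */
} slp_costs;

/* Total cost charged to the vectorized basic block. */
int vector_total(slp_costs c)
{
    return c.inside + c.prologue + c.epilogue;
}

/* Hypothetical simplified decision: vectorize if the vector side is cheaper.
   GCC's exact comparison (e.g. how ties are handled) is not modelled. */
bool slp_profitable(slp_costs c)
{
    return vector_total(c) < c.scalar;
}

/* Average scalar cost per store when the scalar cost covers nstores stores. */
int scalar_cost_per_store(slp_costs c, int nstores)
{
    return c.scalar / nstores;
}

/* The numbers from the t.c:32:12 dump. */
void check_pr83008_costs(void)
{
    const slp_costs c = {24, 64, 0, 192};
    assert(vector_total(c) == 88);
    assert(slp_profitable(c));
    assert(scalar_cost_per_store(c, 16) == 12);
}
```

The model makes the oddity visible: the decision hinges on the per-store scalar cost (12) being half the single vector store's cost (24). That ratio is what lets the 64 for the vector build fit under the scalar total.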
I guess Honza's cost-model tweaks might have gone wrong here, or we're hitting
an oddity in the SLP costing.
Even if it looks strange, maybe the sequence _is_ profitable?
The second loop would be vectorized if 'sum' were unsigned.