Created attachment 42299 [details]
The attached example is a simple matrix multiplication. With -O3, or with -O2 -ftree-slp-vectorize, the basic block is not vectorized.
Oddly, with -Os -ftree-slp-vectorize it is vectorized.
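For readers without access to the attachment, here is a minimal sketch of the kind of code being discussed. It is NOT the attached test case: the function name matmul4, the 4x4 size, the float element type and the row-major layout are all assumptions made for illustration. Whether GCC unrolls and SLP-vectorizes this particular function is likewise untested here.

```c
/* Hypothetical stand-in for the attached test case; the name, the size
   and the element type are assumptions, not taken from the attachment. */
#define N 4

/* Compute c = a * b for N x N row-major float matrices.
   The restrict qualifiers promise the compiler that the three arrays
   do not overlap, which removes one common obstacle to vectorization. */
void matmul4(const float *restrict a, const float *restrict b,
             float *restrict c)
{
    for (int i = 0; i < N; i++) {
        for (int j = 0; j < N; j++) {
            float sum = 0.0f;
            /* Dot product of row i of a with column j of b. */
            for (int k = 0; k < N; k++)
                sum += a[i * N + k] * b[k * N + j];
            c[i * N + j] = sum;
        }
    }
}
```

To compare the three option sets mentioned above, such a file could be compiled with each of them plus -fopt-info-vec (or -fopt-info-vec-missed), which makes GCC report which loops and basic blocks it did or did not vectorize.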
Created attachment 42300 [details]
Assembler output with -O3
Created attachment 42301 [details]
Assembler output with -Os -ftree-slp-vectorize
Note: the ability to vectorize this at all at -Os appears to be new in GCC 7.