[Bug target/81616] Update -mtune=generic for the current Intel and AMD processors

Tue Nov 28 18:14:00 GMT 2017

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81616

--- Comment #26 from Jan Hubicka <hubicka at ucw dot cz> ---
On your matrix benchmarks I get:

  Vector inside of loop cost: 44
  Vector prologue cost: 12
  Vector epilogue cost: 0
  Scalar iteration cost: 40
  Scalar outside cost: 0
  Vector outside cost: 12
  prologue iterations: 0
  epilogue iterations: 0
  Calculated minimum iters for profitability: 1
mult.c:15:7: note:   Runtime profitability threshold = 4
mult.c:15:7: note:   Static estimate profitability threshold = 4

  Vector inside of loop cost: 2428
  Vector prologue cost: 4
  Vector epilogue cost: 0
  Scalar iteration cost: 2428
  Scalar outside cost: 0
  Vector outside cost: 4
  prologue iterations: 0
  epilogue iterations: 0
  Calculated minimum iters for profitability: 1
mult.c:30:7: note:   Runtime profitability threshold = 4
mult.c:30:7: note:   Static estimate profitability threshold = 4

for 128-bit vectorization, and for 256-bit:

  Vector inside of loop cost: 88
  Vector prologue cost: 24
  Vector epilogue cost: 0
  Scalar iteration cost: 40
  Scalar outside cost: 0
  Vector outside cost: 24
  prologue iterations: 0
  epilogue iterations: 0
  Calculated minimum iters for profitability: 1
mult.c:15:7: note:   Runtime profitability threshold = 8
mult.c:15:7: note:   Static estimate profitability threshold = 8

  Vector inside of loop cost: 6472
  Vector prologue cost: 8
  Vector epilogue cost: 0
  Scalar iteration cost: 2428
  Scalar outside cost: 0
  Vector outside cost: 8
  prologue iterations: 0
  epilogue iterations: 0
  Calculated minimum iters for profitability: 1
mult.c:30:7: note:   Runtime profitability threshold = 8
mult.c:30:7: note:   Static estimate profitability threshold = 8

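Going from 128-bit to 256-bit, the inside-of-loop cost of the first loop
exactly doubles (44 -> 88), while that of the second loop grows by about
2.7x (2428 -> 6472). So if the vectorizer knew to prefer bigger vector sizes
when the cost is about double, it would vectorize the first loop to 256 bits
as expected.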
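
To make the suggested preference concrete, here is a minimal sketch of such a check. It is not GCC code: the function name prefer_wider and the slack_percent parameter are hypothetical, and it assumes the wider vector processes twice as many elements per iteration as the narrower one, so a cost of up to roughly double is break-even per element.

```c
#include <stdbool.h>

/* Hypothetical helper, not part of GCC.
 *
 * narrow_inside_cost: "Vector inside of loop cost" for the narrower size
 * wide_inside_cost:   the same figure for a vector size twice as wide
 * slack_percent:      assumed tolerance for "about double" (e.g. 10)
 *
 * Returns true when the wider body costs at most about twice the narrower
 * one, i.e. when the cost per element does not get noticeably worse.
 */
bool prefer_wider(unsigned long narrow_inside_cost,
                  unsigned long wide_inside_cost,
                  unsigned long slack_percent)
{
    /* Integer form of: wide <= 2 * narrow * (1 + slack_percent / 100) */
    return wide_inside_cost * 100UL
           <= narrow_inside_cost * 2UL * (100UL + slack_percent);
}
```

With the numbers from the dumps above and a 10% slack, prefer_wider(44, 88, 10) is true for the loop at mult.c:15, while prefer_wider(2428, 6472, 10) is false for the loop at mult.c:30.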