[Bug target/81616] Update -mtune=generic for the current Intel and AMD processors
hubicka at ucw dot cz
gcc-bugzilla@gcc.gnu.org
Tue Nov 28 18:14:00 GMT 2017
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81616
--- Comment #26 from Jan Hubicka <hubicka at ucw dot cz> ---
On your matrix benchmarks I get:
Vector inside of loop cost: 44
Vector prologue cost: 12
Vector epilogue cost: 0
Scalar iteration cost: 40
Scalar outside cost: 0
Vector outside cost: 12
prologue iterations: 0
epilogue iterations: 0
Calculated minimum iters for profitability: 1
mult.c:15:7: note: Runtime profitability threshold = 4
mult.c:15:7: note: Static estimate profitability threshold = 4
Vector inside of loop cost: 2428
Vector prologue cost: 4
Vector epilogue cost: 0
Scalar iteration cost: 2428
Scalar outside cost: 0
Vector outside cost: 4
prologue iterations: 0
epilogue iterations: 0
Calculated minimum iters for profitability: 1
mult.c:30:7: note: Runtime profitability threshold = 4
mult.c:30:7: note: Static estimate profitability threshold = 4
for 128-bit vectorization, and for 256-bit:
Vector inside of loop cost: 88
Vector prologue cost: 24
Vector epilogue cost: 0
Scalar iteration cost: 40
Scalar outside cost: 0
Vector outside cost: 24
prologue iterations: 0
epilogue iterations: 0
Calculated minimum iters for profitability: 1
mult.c:15:7: note: Runtime profitability threshold = 8
mult.c:15:7: note: Static estimate profitability threshold = 8
Vector inside of loop cost: 6472
Vector prologue cost: 8
Vector epilogue cost: 0
Scalar iteration cost: 2428
Scalar outside cost: 0
Vector outside cost: 8
prologue iterations: 0
epilogue iterations: 0
Calculated minimum iters for profitability: 1
mult.c:30:7: note: Runtime profitability threshold = 8
mult.c:30:7: note: Static estimate profitability threshold = 8
So if the vectorizer knew to prefer bigger vector sizes when the cost is about
double, it would vectorize the first loop with 256-bit vectors, as expected.
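
To make the comparison concrete, here is a minimal sketch of the rule suggested above. It is not GCC code: the function name prefer_wider and its parameters are hypothetical, and it assumes the only inputs are the "Vector inside of loop cost" values from the dumps and the ratio of vector widths (2 for 256-bit versus 128-bit). It also treats "about double" as "at most double", which is a simplification.

```c
#include <assert.h>

/* Hypothetical helper, not part of GCC.
   narrow_cost: "Vector inside of loop cost" for the narrower vector size.
   wide_cost:   the same cost for the wider vector size.
   width_ratio: how many times more elements one wide iteration handles
                (2 when comparing 256-bit against 128-bit).
   Returns 1 if the wider size should be preferred, 0 otherwise.  */
int prefer_wider(unsigned narrow_cost, unsigned wide_cost,
                 unsigned width_ratio)
{
    assert(width_ratio > 0);
    /* One wide iteration does width_ratio times the work of a narrow one,
       so it is at least as cheap per element when its cost does not
       exceed width_ratio times the narrow cost.  */
    return wide_cost <= narrow_cost * width_ratio;
}
```

With the numbers above, prefer_wider(44, 88, 2) returns 1 for the loop at mult.c:15, since 88 is exactly twice 44, while prefer_wider(2428, 6472, 2) returns 0 for the loop at mult.c:30, since 6472 is well over twice 2428.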