[Bug target/97127] FMA3 code transformation leads to slowdown on Skylake

Fri Sep 25 13:21:27 GMT 2020

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97127

--- Comment #15 from Michael_S <already5chosen at yahoo dot com> ---
(In reply to Hongtao.liu from comment #14)
> > Still I don't understand why compiler does not compare the cost of full loop
> > body after combining to the cost before combining and does not come to
> > conclusion that combining increased the cost.
> 
> As Richard says, GCC does not model CPU pipelines in such detail.

I don't understand what "details" you have in mind.
The costs of the instructions that you quoted above look fine.
But for a reason I don't understand, the compiler chose the more costly
"combined" code sequence over the "RISCy" sequence, which is less costly
according to its own cost model.