Hi,

Recently Richard fixed this bug: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49957

According to my measurements, the fix for that bug caused the following changes on SPEC CPU2006:

For SandyBridge CPU:
 * 410.bwaves degradation is -9.54% for peak32
 * 410.bwaves degradation is -6.91% for base32
 * 410.bwaves improvement is 1.00% for peak64
 * 410.bwaves improvement is 0.91% for base64

For Core i7 CPU:
 * 410.bwaves degradation is -3.91% for peak32
 * 410.bwaves degradation is -3.91% for base32
 * 410.bwaves improvement is 1.94% for peak64
 * 410.bwaves improvement is 3.23% for base64

For AMD (Phenom(tm) II X3 B75) CPU:
 * 410.bwaves degradation is -7.32% for peak32
 * 410.bwaves degradation is -6.56% for base32
 * 410.bwaves improvement is 2.01% for peak64
 * 410.bwaves degradation is -1.34% for base64
Checkin URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=177368
Here are the option-set details:

base = -static -O2 -ffast-math (plus "-m32 -msse2 -mfpmath=sse" in 32-bit mode)
peak = -static -O3 -funroll-loops -ffast-math (plus "-m32 -msse2 -mfpmath=sse" in 32-bit mode)

For SandyBridge: += "-mavx -march=corei7"
For Core i7: += "-march=corei7"
For AMD: += "-march=amdfam10" (not sure this is the best choice)
It seems to affect 32-bit only. Presumably a cost model issue: the register pressure will be higher, and we have only half the number of SSE registers.
(In reply to comment #3)
> For 32bit only it seems. Supposedly a cost model issue, the register pressure
> will be higher and we have only half the number of SSE regs.

Richard, what could be wrong with the cost model? If you're increasing live ranges and you don't have as many registers (the 32-bit case), register pressure will obviously increase and degrade performance. But again, how is that connected to the cost model?
On Tue, 27 Sep 2011, kirill.yukhin at intel dot com wrote:

> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50480
>
> --- Comment #4 from Yukhin Kirill <kirill.yukhin at intel dot com> 2011-09-27 08:31:35 UTC ---
> (In reply to comment #3)
> > For 32bit only it seems. Supposedly a cost model issue, the register pressure
> > will be higher and we have only half the number of SSE regs.
>
> Richard, what's wrong maybe with cost model? If you're increasing liverange and
> you have not as much registers (32-bit case), obviously register pressure will
> increase and degrade performance. But again, how it is connected with cost
> model?

It's connected to the cost model modeling only the vectorized statements, not the whole vectorized loop, so it can't possibly catch this case.

I thought of moving more of the cost model details to the target by allowing the target to track the complete loop, with hooks like

  void *targetm.vectorizer.cost_model_start_loop (struct loop *);
  void targetm.vectorizer.cost_model_stmt (void *, gimple);
  unsigned targetm.vectorizer.cost_model_finish_loop (void *);

where the last one would return a cost for the vectorized loop. We'd need that to model things like PPC having imbalanced resources for some kinds of vectorization as well (a shift takes up a lot of resources, so you need other stmts to compensate for it).

Richard.
Created attachment 27206
ivopts dump from Subversion revision 183934 (after regression)
(In reply to Michael Meissner from comment #6)
> Created attachment 27206
> ivopts dump from subversion id 183934 (after regression)

Where are we supposed to be looking in this?