Created attachment 23844
testcase

Output:

$ gcc -m32 -O -fschedule-insns2 -fsched2-use-superblocks -fno-tree-loop-ivcanon -ftree-vrp --param max-predicted-iterations=1 testcase.c
$ ./a.out
Aborted

The testcase aborts with a param value of 3 or less and works with 4 and above. Since 4 is the number of iterations the loop has, it might be possible to construct a testcase that fails with the default param value.

At the assembly level, the problem seems to be that x87 stack and MMX register usage are mixed:

.L4:
        fld     DWORD PTR [esp+64+eax*4]    # MEM[symbol: out, index: D.6050_143, step: 4, offset: 0B]
        add     eax, 1                      # i,
        mov     DWORD PTR [esp+60], eax     #, i
        fild    DWORD PTR [esp+60]          #
        fxch    st(1)                       #
        fucomip st, st(1)                   #,
        fstp    st(0)                       #
        jp      .L8                         #,
        je      .L12                        #,
.L8:
        call    abort                       #
.L12:
        pxor    mm0, mm0                    # tmp168
        movq    mm1, QWORD PTR [esp+48]     # tmp167, %sfp
        xorps   xmm0, xmm0                  # tmp175
        punpcklbw mm1, mm0                  # tmp167, tmp168
        pxor    mm0, mm0                    # tmp171
        cmp     eax, 4                      # i,
        jne     .L4                         #,

In this loop, both register classes are used without an intervening (f)emms instruction. With a param value of 4 or more, the "cmp eax, 4 ; jne .L4" check is moved to just after .L12, so the MMX instructions are not used inside the loop.
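For reference, the source-level rule that the generated loop above breaks is that MMX code must execute EMMS (the `_mm_empty()` intrinsic) before any x87 instruction runs. MMX registers alias the x87 register stack, and MMX instructions leave every stack slot tagged as in use. The following is a minimal sketch of that pattern, not the contents of attachment 23844, and it does not reproduce the reported abort. The helpers `sum_bytes_mmx` and `floats_match_index` are hypothetical. The sketch assumes an x86 or x86-64 target with GCC or Clang; a `-m32` build may additionally need `-mmmx` if the default `-march` lacks MMX.

```c
#include <assert.h>
#include <mmintrin.h>  /* MMX intrinsics; x86/x86-64 GCC or Clang only */
#include <string.h>

/* Hypothetical helper: zero-extend 8 bytes to 16-bit words with the same
   pxor/punpcklbw pattern as the listing, then return the sum of the bytes. */
static int sum_bytes_mmx(const unsigned char bytes[8])
{
    int lo32, hi32;
    short words[4];
    __m64 v, zero, lo, hi, sum;

    memcpy(&lo32, bytes, 4);          /* bytes 0..3 (little-endian x86) */
    memcpy(&hi32, bytes + 4, 4);      /* bytes 4..7 */
    v    = _mm_set_pi32(hi32, lo32);  /* all 8 bytes in one MMX register */
    zero = _mm_setzero_si64();        /* pxor mmN, mmN */
    lo   = _mm_unpacklo_pi8(v, zero); /* punpcklbw: bytes 0..3 -> words */
    hi   = _mm_unpackhi_pi8(v, zero); /* punpckhbw: bytes 4..7 -> words */
    sum  = _mm_add_pi16(lo, hi);      /* words[k] = bytes[k] + bytes[k + 4] */
    memcpy(words, &sum, sizeof words);

    /* EMMS: clear MMX state before any x87 instruction can run.  Without it
       the x87 stack stays tagged full, so a later fld overflows the stack and
       loads the NaN "indefinite" value, making fucomip compare unordered. */
    _mm_empty();

    return words[0] + words[1] + words[2] + words[3];
}

/* Hypothetical helper shaped like the listing's fld/fild/fucomip check:
   returns 1 if out[i] == (float)i for every i < n, else 0. */
static int floats_match_index(const float *out, int n)
{
    int i;

    assert(n >= 0);
    for (i = 0; i < n; i++)
        if (out[i] != (float)i)
            return 0;
    return 1;
}
```

With `-m32` and the default `-mfpmath=387`, the float comparison in `floats_match_index` is done on the x87 stack, as in the listing. On x86-64 the default is SSE arithmetic, so the aliasing hazard does not arise there.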
Another, different set of flags reproduces the failure:

$ gcc -O -fsched2-use-superblocks -fschedule-insns2 --param max-jump-thread-duplication-stmts=1 -m32 testcase.c
$ ./a.out
Aborted

In this case, x87, MMX and SSE instructions are all mixed in the code:

...
.L8:
        pxor    mm0, mm0                    # tmp148
        movq    mm1, QWORD PTR [esp+8]      # tmp147, %sfp
        xorps   xmm0, xmm0                  # tmp155
        punpcklbw mm1, mm0                  # tmp147, tmp148
        pxor    mm0, mm0                    # tmp151
        fld     DWORD PTR .LC1              #
        fld     DWORD PTR [esp+20]          # out
        fucomip st, st(1)                   #,
        fstp    st(0)                       #
        fld     DWORD PTR .LC2              #
        jp      .L10                        #,
        movq    mm2, mm1                    # tmp150, D.6008
        punpcklwd mm2, mm0                  # tmp150, tmp151
        fld     DWORD PTR [esp+24]          # out
        punpckhwd mm1, mm0                  # tmp152, tmp151
        jne     .L11                        #,
...