Bug 48397 - gcc.target/i386/pr46419.c FAILs with -fsched2-use-superblocks - mixing FPU and MMX code without emms
Alias: None
Product: gcc
Classification: Unclassified
Component: target
Version: 4.7.0
Importance: P3 normal
Target Milestone: ---
Assignee: Not yet assigned to anyone
Keywords: wrong-code
Depends on:
Reported: 2011-03-31 23:32 UTC by Zdenek Sojka
Modified: 2011-04-03 17:30 UTC
CC List: 0 users

See Also:
Host: x86_64-pc-linux-gnu
Target: i686-pc-linux-gnu
Known to work:
Known to fail: 4.4.6, 4.5.3, 4.6.1, 4.7.0
Last reconfirmed:

testcase (305 bytes, text/plain)
2011-03-31 23:32 UTC, Zdenek Sojka

Description Zdenek Sojka 2011-03-31 23:32:52 UTC
Created attachment 23844

$ gcc -m32 -O -fschedule-insns2 -fsched2-use-superblocks -fno-tree-loop-ivcanon -ftree-vrp --param max-predicted-iterations=1 testcase.c
$ ./a.out 

The program aborts when the param value is 3 or less and works with 4 or more. Since the loop runs exactly 4 iterations, it may be possible to construct a testcase that fails with the default param value.

At the assembly level, the problem appears to be that the x87 register stack and the MMX registers are used together:

	fld	DWORD PTR [esp+64+eax*4]	# MEM[symbol: out, index: D.6050_143, step: 4, offset: 0B]
	add	eax, 1	# i,
	mov	DWORD PTR [esp+60], eax	#, i
	fild	DWORD PTR [esp+60]	#
	fxch	st(1)	#
	fucomip	st, st(1)	#,
	fstp	st(0)	#
	jp	.L8	#,
	je	.L12	#,
	call	abort	#
	pxor	mm0, mm0	# tmp168
	movq	mm1, QWORD PTR [esp+48]	# tmp167, %sfp
	xorps	xmm0, xmm0	# tmp175
	punpcklbw	mm1, mm0	# tmp167, tmp168
	pxor	mm0, mm0	# tmp171
	cmp	eax, 4	# i,
	jne	.L4	#,

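In this loop, both register classes are used without an intervening EMMS (or FEMMS) instruction. Because the MMX registers alias the x87 data registers, x87 instructions executed after MMX code without EMMS operate on a stack whose registers are all marked as in use.
With a param value of 4 or more, the "cmp eax, 4 ; jne .L4" check is placed directly after .L12, so the MMX instructions are not executed inside the loop.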
Comment 1 Zdenek Sojka 2011-04-03 17:30:28 UTC
A different set of flags also reproduces the failure:
$ gcc -O -fsched2-use-superblocks -fschedule-insns2 --param max-jump-thread-duplication-stmts=1 -m32 testcase.c
$ ./a.out 

In this case, x87, MMX, and SSE instructions are interleaved in the code:

	pxor	mm0, mm0	# tmp148
	movq	mm1, QWORD PTR [esp+8]	# tmp147, %sfp
	xorps	xmm0, xmm0	# tmp155
	punpcklbw	mm1, mm0	# tmp147, tmp148
	pxor	mm0, mm0	# tmp151
	fld	DWORD PTR .LC1	#
	fld	DWORD PTR [esp+20]	# out
	fucomip	st, st(1)	#,
	fstp	st(0)	#
	fld	DWORD PTR .LC2	#
	jp	.L10	#,
	movq	mm2, mm1	# tmp150, D.6008
	punpcklwd	mm2, mm0	# tmp150, tmp151
	fld	DWORD PTR [esp+24]	# out
	punpckhwd	mm1, mm0	# tmp152, tmp151
	jne	.L11	#,