Created attachment 23355
The attached testcase fails when compiled with -O2 or -O3 optimizations, but works with -O1.
I'm not sure exactly how this code is expected to behave, because the intrinsics are specific to the x86 architecture and the C standard can't be used as a reference. But my guess is that if the optimizer were not allowed to move code arbitrarily across the _mm_empty() boundary, the problem would disappear.
Confirmed. The scheduler does not know that the MMX instructions touch the x87 registers.
Note that this is a target-specific issue, not a generic scheduler issue: the target does not communicate to the scheduler that the x87 registers and the MMX registers overlap.
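
For readers without the attachment, the pattern under discussion can be sketched roughly as follows. This is not the attached testcase; the helper names add4_u16 and mmx_then_fp are made up for illustration, and the scalar fallback is only there so the sketch builds on targets without MMX. The MMX registers alias the x87 register stack, so floating-point code that uses x87 must execute only after _mm_empty() (the EMMS instruction). If the scheduler hoists the x87 arithmetic above that call, the result is likely to be wrong.

```c
#include <stdint.h>
#include <string.h>

#if defined(__MMX__)
#include <mmintrin.h>
#endif

/* Hypothetical helper: add four 16-bit lanes, using MMX when available. */
static void add4_u16(uint16_t *dst, const uint16_t *a, const uint16_t *b)
{
#if defined(__MMX__)
    __m64 va, vb, vr;

    memcpy(&va, a, sizeof va);      /* load 4 x 16-bit values */
    memcpy(&vb, b, sizeof vb);
    vr = _mm_add_pi16(va, vb);      /* MMX packed add (wraps on overflow) */
    memcpy(dst, &vr, sizeof vr);

    /* EMMS: marks the x87 register stack as empty again. Any x87
       floating-point instruction must stay after this point. */
    _mm_empty();
#else
    /* Assumed fallback for non-MMX targets, same arithmetic. */
    for (int i = 0; i < 4; i++)
        dst[i] = (uint16_t)(a[i] + b[i]);
#endif
}

/* Hypothetical caller: MMX work followed by floating-point work. */
static double mmx_then_fp(const uint16_t *a, const uint16_t *b, double scale)
{
    uint16_t sum[4];

    add4_u16(sum, a, b);

    /* On a 32-bit x86 build using x87 math, this multiply is the kind of
       code that must not be moved above _mm_empty(). */
    return (sum[0] + sum[1] + sum[2] + sum[3]) * scale;
}
```

On x86-64 the floating-point arithmetic normally uses SSE rather than x87, so this sketch would most likely not reproduce the failure there; a 32-bit build with x87 math (for example -m32 -mfpmath=387, assuming such a toolchain is installed) is the configuration where the ordering matters.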