19161 – No emms or femms emitted between MMX and FP instructions

Status:	RESOLVED WONTFIX

Alias:	None

Product:	gcc
Classification:	Unclassified
Component:	target
Version:	4.0.0

Importance:	P2 normal
Target Milestone:	---
Assignee:	Not yet assigned to anyone

URL:	http://gcc.gnu.org/ml/gcc-patches/200...
Keywords:	patch, ssemmx, wrong-code

Duplicates (2):	14801 17415
Depends on:
Blocks:	14552 19530 22152 23376

Reported:	2004-12-26 20:16 UTC by Richard Henderson
Modified:	2008-03-19 10:38 UTC
CC List:	6 users

See Also:
Host:
Target:	i386-- x86_64--
Build:
Known to work:
Known to fail:
Last reconfirmed:	2005-08-22 20:32:55

Attachments
testcase for c#19 (18.71 KB, application/octet-stream) 2005-09-22 13:12 UTC, Pawel Sikora

Description Richard Henderson 2004-12-26 20:16:29 UTC

Curse Intel and their modal register sets.

Today I caught the x86-64 compiler using %mm0 just because the datatype it
wanted to move fit nicely in that register.  Except that after an MMX register
is touched, one must leave MMX mode before (1) the next FPU instruction or
(2) a call boundary, since the ABI requires that we be in FPU mode.
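
For illustration only, here is a minimal sketch of the rule using hand-written MMX intrinsics, where the programmer issues emms explicitly through _mm_empty() before any x87 arithmetic. This is not the code the compiler generated in the case above; the function name mmx_then_fp is hypothetical, and the non-MMX fallback branch exists only so the sketch builds on other targets.

```c
#if defined(__MMX__)
#include <mmintrin.h>
#endif

/* Hypothetical example: MMX work first, then long double arithmetic,
   which uses the x87 unit on typical x86 ABIs. */
long double mmx_then_fp(void)
{
    int lo;
#if defined(__MMX__)
    __m64 a = _mm_set_pi16(4, 3, 2, 1);   /* 16-bit lanes, low to high: 1 2 3 4 */
    __m64 b = _mm_add_pi16(a, a);         /* lanes become 2 4 6 8 */
    lo = _mm_cvtsi64_si32(b) & 0xffff;    /* extract the lowest lane */
    _mm_empty();                          /* emms: leave MMX mode */
#else
    lo = 2;                               /* fallback for targets without MMX */
#endif
    return (long double) lo * 0.5L;       /* FP arithmetic after leaving MMX mode */
}
```

The bug is about the compiler choosing an MMX register on its own, in which case nobody has written the _mm_empty() call.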

Not bothering to add a test case, since I'm planning to hack around this 
specific example with changed register preferences, but the point remains
that we have nothing in place to prevent the badness.

I suspect that what we'll need for a complete solution may include dynamic
register class letters.  At some point, perhaps during rtl expansion, we
record whether or not there are any *operations* that require either MMX
or FPU.  If we have MMX but not FPU operations, we set 'f' to NOREGS; if
we have FPU but not MMX, we set 'y' to NOREGS.  If we have both, then
we'll need an optimize_mode_switching pass to swap between modes.  The
exceedingly tricky bit there will be tricking reload into not making both
kinds of registers live behind our backs.
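
The three cases above can be sketched as a small decision function. This is only an illustration of the proposal, not code from any patch: the enum, its values, and choose_plan are hypothetical names, and the "neither kind of operation" case is an assumption, since the proposal does not mention it.

```c
#include <stdbool.h>

/* Hypothetical outcome of scanning a function for MMX and x87 operations. */
enum fp_plan {
    PLAN_LEAVE_ALONE,    /* assumption: neither kind of operation seen */
    PLAN_F_TO_NOREGS,    /* MMX only: constraint letter 'f' maps to NOREGS */
    PLAN_Y_TO_NOREGS,    /* x87 only: constraint letter 'y' maps to NOREGS */
    PLAN_MODE_SWITCHING  /* both: needs an optimize_mode_switching pass */
};

/* Pick a plan from two flags recorded earlier, e.g. during rtl expansion. */
enum fp_plan choose_plan(bool has_mmx_ops, bool has_fpu_ops)
{
    if (has_mmx_ops && has_fpu_ops)
        return PLAN_MODE_SWITCHING;
    if (has_mmx_ops)
        return PLAN_F_TO_NOREGS;
    if (has_fpu_ops)
        return PLAN_Y_TO_NOREGS;
    return PLAN_LEAVE_ALONE;
}
```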

Comment 1 Andrew Pinski 2004-12-26 21:20:02 UTC

Confirmed, there is a dup of this filed already.

Comment 2 Andrew Pinski 2004-12-27 00:24:02 UTC

*** Bug 14801 has been marked as a duplicate of this bug. ***

Comment 3 Andrew Pinski 2004-12-27 00:26:16 UTC

Found the bug finally, see PR 14801 for an example.

Also PR 16872 is another example.

Comment 4 Richard Henderson 2005-01-18 09:52:54 UTC

*** Bug 17415 has been marked as a duplicate of this bug. ***

Comment 5 Steven Bosscher 2005-06-21 11:20:24 UTC

Uros, also for you it seems... 
(http://gcc.gnu.org/ml/gcc-patches/2005-06/msg01724.html)

Comment 6 Pawel Sikora 2005-06-24 14:03:39 UTC

(In reply to comment #5) 
> Uros, also for you it seems...  
> (http://gcc.gnu.org/ml/gcc-patches/2005-06/msg01724.html)  
 
with this patch I get an ice on amd64 bootstrap: 
   
(..)   
-c ../../gcc/unwind-dw2.c -o libgcc/./unwind-dw2.o 
 
In file included from ../../gcc/unwind-dw2.c:256:   
../../gcc/config/i386/linux-unwind.h:   
In function 'x86_64_fallback_frame_state':  
../../gcc/config/i386/linux-unwind.h:55: warning: dereferencing type-punned  
                                   pointer will break strict-aliasing rules   
../../gcc/unwind.inc: In function '_Unwind_ForcedUnwind':  
../../gcc/unwind.inc:215: internal compiler error: in create_pre_exit,  
                                                   at mode-switching.c:350

Comment 7 Uroš Bizjak 2005-06-24 20:33:07 UTC

(in reply to comment #6)

> ../../gcc/unwind.inc: In function '_Unwind_ForcedUnwind':  
> ../../gcc/unwind.inc:215: internal compiler error: in create_pre_exit,  
>                                                    at mode-switching.c:350 

This is a known problem, with a hack to mode-switching.c at
http://gcc.gnu.org/ml/gcc-patches/2005-06/msg01434.html.

Although this will fix the ICE you got, the real problem is that this function
(like the __builtin_apply case) tries to handle a returned %mm register together
with %st, which confuses mode switching in the exit block.

Could you please try applying the mode-switching.c part of the patch to see
whether it fixes the ICE for you? However, I think that __builtin_apply should
process only an x87 output register, and should be limited to functions that
return in FPU_X87 mode.

Comment 8 Pawel Sikora 2005-06-25 06:26:38 UTC

(In reply to comment #7) 
> (in reply to comment #6) 
>  
> This is a known problem, with a hack to mode-switching.c at 
> http://gcc.gnu.org/ml/gcc-patches/2005-06/msg01434.html. 
>  
> Please, could you try to apply the mode-switching.c part of the patch 
> and see if it fix an ice for you. 
 
with this hack bootstrap still ices. 
 
../../gcc/unwind.inc: In function '_Unwind_ForcedUnwind': 
../../gcc/unwind.inc:215: internal compiler error: in create_pre_exit, 
                                   at mode-switching.c:362

Comment 9 Uroš Bizjak 2005-06-25 07:40:02 UTC

(In reply to comment #8)

> > This is a known problem, with a hack to mode-switching.c at 
> > http://gcc.gnu.org/ml/gcc-patches/2005-06/msg01434.html. 
> >  
> > Please, could you try to apply the mode-switching.c part of the patch 
> > and see if it fix an ice for you. 
>  
> with this hack bootstrap still ices. 
>  
> ../../gcc/unwind.inc: In function '_Unwind_ForcedUnwind': 
> ../../gcc/unwind.inc:215: internal compiler error: in create_pre_exit, 
>                                    at mode-switching.c:362 

It was a hack anyway :---(

Thanks for the report, I'll try to find a proper fix in the next week.

(BTW: It fails for x86-64, because this target enables mmx by default.)

Comment 10 Uroš Bizjak 2005-07-21 08:46:57 UTC

(In reply to comment #6)

> with this patch I get an ice on amd64 bootstrap: 
> In file included from ../../gcc/unwind-dw2.c:256:   
> ../../gcc/config/i386/linux-unwind.h:   
> In function 'x86_64_fallback_frame_state':  
> ../../gcc/config/i386/linux-unwind.h:55: warning: dereferencing type-punned  
>                                    pointer will break strict-aliasing rules   
> ../../gcc/unwind.inc: In function '_Unwind_ForcedUnwind':  
> ../../gcc/unwind.inc:215: internal compiler error: in create_pre_exit,  
>                                                    at mode-switching.c:350 
>  

Pawel, could you check whether the patch at
http://gcc.gnu.org/ml/gcc-patches/2005-07/msg01128.html fixes the bootstrap
problems on AMD64? The patch works for me with BOOT_CFLAGS="-O2 -msse2" on
pentium4, and this is as far as I can test...

Comment 11 Uroš Bizjak 2005-07-22 09:33:53 UTC

Whee, it looks like the x86_64 breakage is gone. I have successfully compiled
unwind-dw2.c with a patched x86_64 cross-compiler.

Comment 12 Pawel Sikora 2005-07-23 15:42:12 UTC

(In reply to comment #10)
> (In reply to comment #6)
>
> > with this patch I get an ice on amd64 bootstrap:
> > In file included from ../../gcc/unwind-dw2.c:256:
> > ../../gcc/config/i386/linux-unwind.h:
> > In function 'x86_64_fallback_frame_state':
> > ../../gcc/config/i386/linux-unwind.h:55: warning: dereferencing type-punned
> >                                    pointer will break strict-aliasing rules
> > ../../gcc/unwind.inc: In function '_Unwind_ForcedUnwind':
> > ../../gcc/unwind.inc:215: internal compiler error: in create_pre_exit,
> >                                                    at mode-switching.c:350
> >
>
> Pawel, could you check the patch at
> http://gcc.gnu.org/ml/gcc-patches/2005-07/msg01128.html if it fixes bootstrap
> problems on AMD64? Patch works for me with BOOT_CFLAGS="-O2 -msse2" on
> pentium4, and this is as far as I can test...
>
   
I'm checking this right now :)  I was busy with PR22584 earlier :|

Comment 13 Pawel Sikora 2005-07-23 17:07:20 UTC

current mainline bootstrap still fails. 
 
(...) 
./xgcc -B./ -B/usr/x86_64-pld-linux/bin/ 
-isystem /usr/x86_64-pld-linux/include 
-isystem /usr/x86_64-pld-linux/sys-include 
-L/home/users/pluto/rpm/BUILD/gcc-4.1-20050723T1611UTC/obj-x86_64-pld-linux/gcc/../ld 
-O2  -DIN_GCC    -W -Wall -Wwrite-strings -Wstrict-prototypes 
-Wmissing-prototypes -Wold-style-definition  -isystem ./include  -fPIC  
-DHAVE_GTHR_DEFAULT -DIN_LIBGCC2 -D__GCC_FLOAT_NOT_NEEDED  -I. -I. -I../../gcc 
-I../../gcc/. -I../../gcc/../include -I../../gcc/../libcpp/include  
-fvisibility=hidden -DHIDE_EXPORTS -fexceptions -c ../../gcc/unwind-dw2.c -o 
libgcc/./unwind-dw2.o 
../../gcc/unwind.inc: In function '_Unwind_ForcedUnwind': 
../../gcc/unwind.inc:215: internal compiler error: in create_pre_exit, at 
mode-switching.c:352 
(...) 
make[3]: *** [libgcc/./unwind-dw2.o] Error 1 
make[3]: Leaving directory 
`/home/users/pluto/rpm/BUILD/gcc-4.1-20050723T1611UTC/obj-x86_64-pld-linux/gcc' 
make[2]: *** [stmp-multilib] Error 2 
make[2]: Leaving directory 
`/home/users/pluto/rpm/BUILD/gcc-4.1-20050723T1611UTC/obj-x86_64-pld-linux/gcc' 
make[1]: *** [stage1_build] Error 2 
make[1]: Leaving directory 
`/home/users/pluto/rpm/BUILD/gcc-4.1-20050723T1611UTC/obj-x86_64-pld-linux/gcc' 
make: *** [bootstrap] Error 2

Comment 14 Uroš Bizjak 2005-08-19 10:19:32 UTC

(In reply to comment #13)
> current mainline bootstrap still fails. 

> ../../gcc/unwind.inc: In function '_Unwind_ForcedUnwind': 
> ../../gcc/unwind.inc:215: internal compiler error: in create_pre_exit, at 
> mode-switching.c:352 

The patch at http://gcc.gnu.org/ml/gcc-patches/2005-08/msg01142.html fixes this 
problem.

Comment 15 Richard Henderson 2005-08-22 20:32:55 UTC

Doing the code review.  I've got a local patch for the create_pre_exit ice.
I'm going to work to see this in 4.1.

Comment 16 Richard Henderson 2005-08-23 20:48:16 UTC

So, I fixed another case in which we could die in create_pre_exit having
to do with complex return values.  But past that, there are failures that
are completely within optimize_mode_switching, e.g. execute/20050604-1.c.

$ ./cc1 -m32 -march=pentium4 z.c
 foo
z.c: In function ‘foo’:
z.c:28: error: unable to find a register to spill in class ‘MMX_REGS’
z.c:28: error: this is the insn:
(insn 14 63 15 2 (set (reg:V4HI 61 [ D.1620 ])
        (mem/s/j:V4HI (symbol_ref:SI ("u") <var_decl 0x2aaaadaff160 u>)
                      [0 u.v+0 S8 A64])) 994 {*movv4hi_internal} (nil)
    (nil))

The problem is that we have a CFG like
     +--+
     v  |
  1->2->3->4
and we place the efpu insn in block 2, but the emms insn in block 4.

Aside from being Less Than Ideal, this results in BOTH mmx and fpu
registers live around the loop, which means we can't allocate anything.

Uros, you should bootstrap i386 with --with-arch=foo, where foo is 
whatever machine you have that supports at least mmx.  Otherwise, you're
not actually testing this new code on i386 except for the few test
cases that force an -march or -mmx option.

I'll keep looking at it for a bit to see if it's something simple, but
we're not going to overhaul optimize_mode_switching for 4.1 if it's
something complicated.

Comment 17 Richard Henderson 2005-08-23 21:30:31 UTC

Actually, I lied about the CFG.  It's actually 1->3 with 2-3 still forming
the loop.  So LCM did the right thing, technically: for the case in which
the loop trip count is zero, we avoid the efpu insn.  

The problem is, the model we have wrt efpu/emms requires that they be used
in balanced pairs.  And, really, we'd prefer that these insns be pushed out
of loops when possible.

But I'm not sure how to address this at the moment.
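
To make the "balanced pairs" requirement concrete, here is a rough sketch of a checker over the markers executed along one path through a function. It is not part of GCC: the function name, the string encoding ('f' for efpu, 'm' for emms), and the exact definition of balance (strict alternation with equal counts) are assumptions made for this illustration.

```c
#include <stdbool.h>
#include <stddef.h>

/* trace holds one character per marker executed along a single path:
   'f' for efpu, 'm' for emms (encoding assumed for this sketch). */
bool markers_balanced(const char *trace)
{
    size_t n_efpu = 0, n_emms = 0;
    char prev = '\0';

    for (const char *p = trace; *p; p++) {
        if (*p != 'f' && *p != 'm')
            return false;           /* unknown marker */
        if (*p == prev)
            return false;           /* same switch twice in a row */
        if (*p == 'f')
            n_efpu++;
        else
            n_emms++;
        prev = *p;
    }
    return n_efpu == n_emms;        /* every switch has a partner */
}
```

With efpu inside the loop and emms after it, as described in comment #16, a path that runs the loop body twice yields "ffm" and the zero-trip path yields "m"; both fail this check.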

Comment 18 Uroš Bizjak 2005-08-24 14:33:20 UTC

  There is another bug in ix86_mode_needed() that causes timeouts for
pr20314-1.c. The problem is that the asm operand parsing code gets into an
infinite loop. The correct code should increment the variable c instead of cc
when a comma is found:

config/i386/i386.c (ix86_mode_needed):

	      ...
	      for (i = 0; i < noperands; i++)
		{
		  const char *c = constraints[i];
		  enum reg_class class;

		  if (c[0] == '%')
		    c++;
		  if (ISDIGIT ((unsigned char) c[0]) && c[1] == '\0')
		    c = constraints[c[0] - '0'];

		  while (*c)
		    {
		      char cc = *c;
		      int len;
		      switch (cc)
			{
			case ',':
			  c++;            /* <<<<< here!! */
			  continue;
			case '=':
			case '+':
			case '*':
			case '%':
			case '!':
			case '#':
			case '&':
			case '?':
			  break;
	      ...
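
The same scanning pattern can be shown in a standalone form. The sketch below is not the GCC function: constraint_mentions is a hypothetical helper, it assumes every constraint is a single character (the real code advances by a per-constraint length), and it merely skips modifier characters instead of giving them their full meaning.

```c
#include <stdbool.h>

/* Hypothetical helper: report whether any alternative in a constraint
   string mentions the letter `want`. */
bool constraint_mentions(const char *c, char want)
{
    if (c[0] == '%')
        c++;

    while (*c) {
        switch (*c) {
        case ',':
            c++;            /* advance the pointer itself, then rescan */
            continue;
        case '=': case '+': case '*': case '%':
        case '!': case '#': case '&': case '?':
            break;          /* modifier: nothing to record in this sketch */
        default:
            if (*c == want)
                return true;
            break;
        }
        c++;                /* assumes one-character constraints */
    }
    return false;
}
```

If the comma case modified only a copy of the current character before `continue`, the pointer would never move past the comma and the loop would not terminate, which matches the timeout described above.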

Regarding emms/efpu instructions in loops: I have made some experiments by 
inserting a mode switching insn before NOTE_INSN_LOOP_BEGIN. The failure in 
20050604-1.c is fixed if this mode is set to FPU_MODE_MMX.

Comment 19 Pawel Sikora 2005-09-22 13:10:47 UTC

Uros,
The mode switching patch ICEs current mainline on ix86.
gcc fbmmx.i -msse fails at both -O0 and -O1, with different insn errors.
 
[ -msse -O0 ] 
 
fbmmx.c: In function ‘_cairo_pixman_composite_src_add_8000x8000mmx’:
fbmmx.c:2169: error: unable to find a register to spill in class ‘MMX_REGS’
fbmmx.c:2169: error: this is the insn: 
(insn 174 172 175 7 (set (reg:V8QI 59 [ D.8903 ]) 
        (mem/c/i:V8QI (plus:SI (reg/f:SI 20 frame) 
                (const_int -16 [0xfffffff0])) [0 __m2+0 S8 A32])) 776 
{*movv8qi_internal} (nil) 
    (nil)) 
fbmmx.c:2169: internal compiler error: in spill_failure, at reload1.c:1890 
 
 
[ -msse -O1 ] 
 
fbmmx.c: In function ‘_cairo_pixman_composite_src_add_8000x8000mmx’:
fbmmx.c:2169: error: unable to find a register to spill in class ‘MMX_REGS’
fbmmx.c:2169: error: this is the insn: 
(insn 166 165 169 9 (set (reg:V8QI 167) 
        (us_plus:V8QI (mem:V8QI (reg/v/f:SI 4 si [orig:120 src ] [120]) [0 S8 
A64]) 
            (mem:V8QI (reg/v/f:SI 2 cx [orig:122 dst ] [122]) [0 S8 A64]))) 
812 {mmx_usaddv8qi3} (nil) 
    (nil)) 
fbmmx.c:2169: internal compiler error: in spill_failure, at reload1.c:1890

Comment 20 Pawel Sikora 2005-09-22 13:12:04 UTC

Created attachment 9791
testcase for c#19

Comment 21 Richard Henderson 2005-11-02 06:39:03 UTC

I'm no longer actively working on this.

Comment 22 Richard Barnes 2008-01-04 20:05:52 UTC

I have seen this bug in the vector-2 testcases shipped in gcc/gcc/testsuite/gcc.dg/compat when compiling for i386 with -msse2.  In vector-2_y.c and vector-2_x.c we end up using both mmx and x87 registers in the same function without any intervening EMMS instruction.  This fails in 4.1.2, 3.4.6, and 3.3.1.

Comment 23 Uroš Bizjak 2008-03-19 10:38:27 UTC

As stated in comments #16 and #17, the LCM infrastructure doesn't support mode switching in a way that would be usable for emms. Additionally, MANY problems are expected when sharing x87 and MMX registers (e.g. handling of uninitialized x87 registers at the beginning of the function; this is the reason we don't implement the x87 register passing ABI).

Automatic MMX vectorization is not a very useful feature nowadays (we have SSE, which works quite well here). Thanks to recent changes in the MMX register allocation area, excellent code is produced when using MMX intrinsics. I'm closing this bug as WONTFIX.