Bug 22076 - Strange code for MMX register moves
Summary: Strange code for MMX register moves
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: target
Version: 4.1.0
Importance: P2 normal
Target Milestone: 4.4.0
Assignee: Not yet assigned to anyone
URL: http://gcc.gnu.org/ml/gcc-patches/200...
Keywords: missed-optimization, ssemmx
Duplicates: 23660 34256
Depends on:
Blocks: 22152 24073 25277
Reported: 2005-06-15 12:39 UTC by Uroš Bizjak
Modified: 2008-11-16 00:14 UTC
CC List: 4 users

See Also:
Host:
Target: i?86-*-*, x86_64-*-*
Build:
Known to work:
Known to fail:
Last reconfirmed: 2006-10-22 20:52:01


Attachments
assembly file generated for gcc.target/i386/pr22076.c at -m64 on i686-apple-darwin9 (461 bytes, text/plain)
2008-11-16 00:14 UTC, Jack Howarth

Description Uroš Bizjak 2005-06-15 12:39:54 UTC
This testcase:
#include <mmintrin.h>

__m64 test() {
   __m64 a;

   return a;
}

results in quite strange code when compiled with '-O2 -mmmx':

test:
	pushl	%ebp
	movl	%esp, %ebp
	subl	$8, %esp
	movl	%eax, -8(%ebp)   <<<<
	movl	%edx, -4(%ebp)   <<<<
	movq	-8(%ebp), %mm0
	leave
	ret

If xmm registers are used (change __m64 into __m128), asm code is OK.
It looks like there are some problems with register costs. From _.22.lreg:

Pass 0

  Register 58 costs: AD_REGS:13000 Q_REGS:13000 NON_Q_REGS:13000 
INDEX_REGS:13000 LEGACY_REGS:13000 GENERAL_REGS:13000 MMX_REGS:24000 
FLOAT_INT_REGS:30000 INT_SSE_REGS:30000 FLOAT_INT_SSE_REGS:30000 ALL_REGS:30000 
MEM:20000

  Register 58 pref GENERAL_REGS or none


Pass 1

  Register 58 costs: AD_REGS:13000 Q_REGS:13000 NON_Q_REGS:13000 
INDEX_REGS:13000 LEGACY_REGS:13000 GENERAL_REGS:13000 MMX_REGS:24000 
FLOAT_INT_REGS:30000 INT_SSE_REGS:30000 FLOAT_INT_SSE_REGS:30000 ALL_REGS:30000 
MEM:20000

...

(insn:HI 16 28 22 1 (set (reg/i:V2SI 29 mm0 [ <result> ])
        (reg/v:V2SI 58 [ a ])) 768 {*movv2si_internal} (nil)
    (expr_list:REG_DEAD (reg/v:V2SI 58 [ a ])
        (nil)))
...

The instructions marked with '<<<<' are then produced by reload.
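
To make the cost dump above easier to read, here is a minimal sketch of how a preferred class falls out of such a table. It is not GCC's actual allocator code: the struct, the table name and the helper cheapest_class() are hypothetical, and the tie-breaking rule (keep the later, larger class) is an assumption chosen to match the "pref GENERAL_REGS" line in the dump. The cost values are copied from the Pass 0 line for register 58.

```c
#include <assert.h>
#include <stddef.h>

/* One entry of the "Register 58 costs" line from the lreg dump. */
struct class_cost {
    const char *name;
    int cost;
};

/* Costs copied from the Pass 0 dump for pseudo register 58. */
const struct class_cost reg58_costs[] = {
    {"AD_REGS", 13000},        {"Q_REGS", 13000},
    {"NON_Q_REGS", 13000},     {"INDEX_REGS", 13000},
    {"LEGACY_REGS", 13000},    {"GENERAL_REGS", 13000},
    {"MMX_REGS", 24000},       {"FLOAT_INT_REGS", 30000},
    {"INT_SSE_REGS", 30000},   {"FLOAT_INT_SSE_REGS", 30000},
    {"ALL_REGS", 30000},       {"MEM", 20000},
};

const size_t reg58_ncosts = sizeof reg58_costs / sizeof reg58_costs[0];

/* Hypothetical helper: return the entry with the lowest cost.
   On a tie the later entry wins (assumption: the dump lists
   smaller classes before the classes that contain them). */
const struct class_cost *cheapest_class(const struct class_cost *table, size_t n)
{
    assert(n > 0);
    const struct class_cost *best = &table[0];
    for (size_t i = 1; i < n; i++)
        if (table[i].cost <= best->cost)
            best = &table[i];
    return best;
}
```

Calling cheapest_class(reg58_costs, reg58_ncosts) selects GENERAL_REGS at 13000, which is cheaper than both MEM (20000) and MMX_REGS (24000). That ordering is consistent with the value being placed in %eax/%edx and then spilled to memory before it reaches %mm0.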
Comment 1 Falk Hueffner 2005-06-15 12:56:54 UTC
(In reply to comment #0)
> This testcase:
> #include <mmintrin.h>
> 
> __m64 test() {
>    __m64 a;
> 
>    return a;
> }

Well, this is invalid code, so it's not really important what the code
looks like. If it also occurs in valid code, it might be related to PR 7061.
Comment 2 Andrew Pinski 2005-06-15 17:35:44 UTC
a is uninitialized so what do you expect?
Comment 3 Uroš Bizjak 2005-06-21 12:04:39 UTC
New testcase (everything is initialized this time):

--cut here--
#include <mmintrin.h>

__v8qi test ()
{
  __v8qi mm0 = {1,2,3,4,5,6,7,8};
  __v8qi mm1 = {11,22,33,44,55,66,77,88};
  volatile __m64 x;

  x = _mm_add_pi8 (mm0, mm1);

  return x;
}
--cut here--


Pass 0

  Register 67 costs: AD_REGS:4000 Q_REGS:4000 NON_Q_REGS:4000 INDEX_REGS:4000 
LEGACY_REGS:4000 GENERAL_REGS:4000 MMX_REGS:46000 INT_SSE_REGS:38000 MEM:16000

  Register 67 pref GENERAL_REGS or none


Pass 1

  Register 67 costs: AD_REGS:4000 Q_REGS:4000 NON_Q_REGS:4000 INDEX_REGS:4000 
LEGACY_REGS:4000 GENERAL_REGS:4000 MMX_REGS:46000 INT_SSE_REGS:38000 MEM:16000

69 registers.

...

(insn:HI 18 45 22 1 (set (reg:V8QI 67)
        (mem/u/i:V8QI (symbol_ref/u:SI ("*.LC2") [flags 0x2]) [0 S8 A64])) 766 
{*movv8qi_internal} (nil)
    (expr_list:REG_EQUIV (const_vector:V8QI [
                (const_int 12 [0xc])
                (const_int 24 [0x18])
                (const_int 36 [0x24])
                (const_int 48 [0x30])
                (const_int 60 [0x3c])
                (const_int 72 [0x48])
                (const_int 84 [0x54])
                (const_int 96 [0x60])
            ])
        (nil)))

...

test:
	pushl	%ebp
	movl	%esp, %ebp
	subl	$24, %esp
	movl	$807671820, %eax
	movl	$1616136252, %edx
	movl	%eax, -8(%ebp)
	movl	%edx, -4(%ebp)
	movl	-8(%ebp), %eax
	movl	-4(%ebp), %edx
	movl	%eax, -24(%ebp)
	movl	%edx, -20(%ebp)
	movq	-24(%ebp), %mm1
	leave
	movq	%mm1, %mm0
	ret
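
The two movl immediates in the listing above are the already folded result of the byte-wise add, which matches the REG_EQUIV const_vector in the RTL dump. The following sketch checks that arithmetic in portable C, without MMX intrinsics. The helper names add_bytes(), pack_le32() and check_folded_constant() are hypothetical and only serve this illustration; the input vectors and the expected immediates are taken from the testcase and the listing.

```c
#include <assert.h>
#include <stdint.h>

/* Add two 8-lane byte vectors lane by lane with wraparound,
   which is what _mm_add_pi8 (paddb) computes. */
void add_bytes(const uint8_t a[8], const uint8_t b[8], uint8_t out[8])
{
    for (int i = 0; i < 8; i++)
        out[i] = (uint8_t)(a[i] + b[i]);
}

/* Pack four consecutive bytes into a little-endian 32-bit word. */
uint32_t pack_le32(const uint8_t *p)
{
    return (uint32_t)p[0]
         | (uint32_t)p[1] << 8
         | (uint32_t)p[2] << 16
         | (uint32_t)p[3] << 24;
}

/* Compare the folded vector with the immediates loaded into
   %eax and %edx in the listing. */
void check_folded_constant(void)
{
    const uint8_t mm0[8] = {1, 2, 3, 4, 5, 6, 7, 8};
    const uint8_t mm1[8] = {11, 22, 33, 44, 55, 66, 77, 88};
    uint8_t sum[8];

    add_bytes(mm0, mm1, sum);              /* {12,24,36,48,60,72,84,96} */
    assert(pack_le32(sum) == 807671820u);      /* 0x3024180c, low half  */
    assert(pack_le32(sum + 4) == 1616136252u); /* 0x6054483c, high half */
}
```

So the constant itself is computed correctly at compile time. The problem shown in the listing is only the route it takes: general registers, two stack slots, and then an MMX register.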

Comment 4 Andrew Pinski 2005-06-21 13:06:50 UTC
I think this is more related to PR 14552, where I showed that we regressed: previously we did not 
output emms at all, and so we did not emit MMX instructions unless the functions in 
mmintrin.h were used.
Comment 5 Andrew Pinski 2005-06-22 20:27:37 UTC
Confirmed.
Comment 6 Andrew Pinski 2005-08-31 19:18:14 UTC
*** Bug 23660 has been marked as a duplicate of this bug. ***
Comment 7 Wolfgang Bangerth 2005-08-31 20:34:04 UTC
In the discussion on the duplicate PR 23660, rth explained part of this here: 
  http://gcc.gnu.org/ml/gcc/2005-08/msg00934.html 
 
W. 
Comment 8 Uroš Bizjak 2007-12-10 08:40:41 UTC
*** Bug 34256 has been marked as a duplicate of this bug. ***
Comment 9 uros 2008-02-23 15:24:41 UTC
Subject: Bug 22076

Author: uros
Date: Sat Feb 23 15:24:02 2008
New Revision: 132572

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=132572
Log:
        PR target/22076
        PR target/34256

        * config/i386/mmx.md (*mov<mode>_internal_rex64): Use "!y" to
        prevent reload from using MMX registers.
        (*mov<mode>_internal): Ditto.
        (*movv2sf_internal_rex64): Ditto.
        (*movv2sf_internal): Ditto.

testsuite/ChangeLog:

        PR target/22076
        PR target/34256
        * gcc.target/i386/pr22076.c: New test.
        * gcc.target/i386/pr34256.c: New test.
        * gcc.target/i386/vecinit-5.c: New test.
        * gcc.target/i386/vecinit-6.c: New test.
        * gcc.target/i386/vecinit-[1-4].c: Check that no MMX register is used.

        * g++.dg/compat/struct-layout-1.h: Do not include <mmintrin.h> and
        <xmmintrin.h>, define __m64 and __m128 directly.
        * g++.dg/compat/struct-layout-1_generate.c: Add -mno-mmx for x86.

Added:
    trunk/gcc/testsuite/gcc.target/i386/pr22076.c
    trunk/gcc/testsuite/gcc.target/i386/pr34256.c
    trunk/gcc/testsuite/gcc.target/i386/vecinit-5.c
    trunk/gcc/testsuite/gcc.target/i386/vecinit-6.c
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/config/i386/mmx.md
    trunk/gcc/testsuite/ChangeLog
    trunk/gcc/testsuite/g++.dg/compat/struct-layout-1.h
    trunk/gcc/testsuite/g++.dg/compat/struct-layout-1_generate.c
    trunk/gcc/testsuite/gcc.target/i386/vecinit-1.c
    trunk/gcc/testsuite/gcc.target/i386/vecinit-2.c
    trunk/gcc/testsuite/gcc.target/i386/vecinit-3.c
    trunk/gcc/testsuite/gcc.target/i386/vecinit-4.c

Comment 10 Uroš Bizjak 2008-02-23 15:33:28 UTC
Fixed.
Comment 11 Uroš Bizjak 2008-03-08 07:23:55 UTC
Fixed for real.
Comment 12 Jack Howarth 2008-11-16 00:13:19 UTC
The gcc.target/i386/pr22076.c test case fails for i686-apple-darwin9 at -m64 as follows...

Executing on host: /sw/src/fink.build/gcc44-4.3.999-20081115/darwin_objdir/gcc/xgcc -B/sw/src/fink.build/gcc44-4.3.999-20081115/darwin_objdir/gcc/ /sw/src/fink.build/gcc44-4.3.999-20081115/gcc-4.4-20081115/gcc/testsuite/gcc.target/i386/pr22076.c   -O2 -flax-vector-conversions -mmmx -S  -m64 -o pr22076.s    (timeout = 300)
PASS: gcc.target/i386/pr22076.c (test for excess errors)
FAIL: gcc.target/i386/pr22076.c scan-assembler-times movq 3
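
For readers unfamiliar with the failing check: scan-assembler-times passes only when the pattern is found in the generated assembly exactly the given number of times. A rough sketch of that counting follows, under the assumption that a literal substring search is close enough for a pattern like movq (the real directive matches a regular expression against the .s file). count_occurrences() is a hypothetical helper, not part of the GCC testsuite.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Count non-overlapping occurrences of a literal pattern in text.
   Simplification: the testsuite directive uses a regular expression. */
size_t count_occurrences(const char *text, const char *pattern)
{
    size_t len = strlen(pattern);
    size_t count = 0;

    assert(len > 0);
    for (const char *p = strstr(text, pattern); p != NULL;
         p = strstr(p + len, pattern))
        count++;
    return count;
}
```

With this model, the test expects count_occurrences(asm_text, "movq") to be 3 for the compiled pr22076.c, and the FAIL line reports that the -m64 darwin output contains a different number of matches.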

Comment 13 Jack Howarth 2008-11-16 00:14:17 UTC
Created attachment 16692
assembly file generated for gcc.target/i386/pr22076.c at -m64 on i686-apple-darwin9