This testcase:

#include <mmintrin.h>

__m64 test() {
  __m64 a;

  return a;
}

results in quite strange code when compiled with '-O2 -mmmx':

test:
        pushl   %ebp
        movl    %esp, %ebp
        subl    $8, %esp
        movl    %eax, -8(%ebp)      <<<<
        movl    %edx, -4(%ebp)      <<<<
        movq    -8(%ebp), %mm0
        leave
        ret

If xmm registers are used (change __m64 into __m128), the asm code is OK.

It looks like there are some problems with register costs. From _.22.lreg:

Pass 0

  Register 58 costs: AD_REGS:13000 Q_REGS:13000 NON_Q_REGS:13000 INDEX_REGS:13000 LEGACY_REGS:13000 GENERAL_REGS:13000 MMX_REGS:24000 FLOAT_INT_REGS:30000 INT_SSE_REGS:30000 FLOAT_INT_SSE_REGS:30000 ALL_REGS:30000 MEM:20000

  Register 58 pref GENERAL_REGS or none

Pass 1

  Register 58 costs: AD_REGS:13000 Q_REGS:13000 NON_Q_REGS:13000 INDEX_REGS:13000 LEGACY_REGS:13000 GENERAL_REGS:13000 MMX_REGS:24000 FLOAT_INT_REGS:30000 INT_SSE_REGS:30000 FLOAT_INT_SSE_REGS:30000 ALL_REGS:30000 MEM:20000

...

(insn:HI 16 28 22 1 (set (reg/i:V2SI 29 mm0 [ <result> ])
        (reg/v:V2SI 58 [ a ])) 768 {*movv2si_internal} (nil)
    (expr_list:REG_DEAD (reg/v:V2SI 58 [ a ])
        (nil)))

...

The instructions marked with '<<<<' are then produced by reload.
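One way to read the cost dump above: the preferred class is the one with the lowest cost, and here GENERAL_REGS (13000) is cheaper than both MEM (20000) and MMX_REGS (24000). The following is a minimal sketch of that comparison only, not GCC's actual regclass code. The names class_cost, cheapest_class and reg58_prefers_general_regs are hypothetical, and resolving ties in favour of the later entry is an assumption made so that the widest of the equally cheap classes in this particular dump wins.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical record for one "CLASS:cost" entry of the lreg dump. */
struct class_cost {
    const char *name;
    int cost;
};

/* Costs copied from the "Register 58 costs" line of the dump. */
static const struct class_cost reg58_costs[] = {
    {"AD_REGS", 13000},        {"Q_REGS", 13000},
    {"NON_Q_REGS", 13000},     {"INDEX_REGS", 13000},
    {"LEGACY_REGS", 13000},    {"GENERAL_REGS", 13000},
    {"MMX_REGS", 24000},       {"FLOAT_INT_REGS", 30000},
    {"INT_SSE_REGS", 30000},   {"FLOAT_INT_SSE_REGS", 30000},
    {"ALL_REGS", 30000},       {"MEM", 20000},
};

/* Return the name of the entry with the lowest cost.
   Assumption: on a tie the later entry wins (simplification). */
static const char *cheapest_class(const struct class_cost *c, size_t n)
{
    size_t best = 0;
    for (size_t i = 1; i < n; i++)
        if (c[i].cost <= c[best].cost)
            best = i;
    return c[best].name;
}

/* Returns 1 if the sketch agrees with "pref GENERAL_REGS" in the dump. */
static int reg58_prefers_general_regs(void)
{
    size_t n = sizeof reg58_costs / sizeof reg58_costs[0];
    return strcmp(cheapest_class(reg58_costs, n), "GENERAL_REGS") == 0;
}
```

With these numbers an MMX register is the most expensive of the three options, which is consistent with the value being kept in %eax/%edx and only moved to %mm0 through the stack by reload.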
(In reply to comment #0)
> This testcase:
> #include <mmintrin.h>
>
> __m64 test() {
>   __m64 a;
>
>   return a;
> }

Well, this is invalid code, so it's not really important what the code looks like. If it also occurs in valid code, it might be related to PR 7061.
'a' is uninitialized, so what do you expect?
New testcase (everything is initialized this time):

--cut here--
#include <mmintrin.h>

__v8qi test ()
{
  __v8qi mm0 = {1,2,3,4,5,6,7,8};
  __v8qi mm1 = {11,22,33,44,55,66,77,88};
  volatile __m64 x;

  x = _mm_add_pi8 (mm0, mm1);

  return x;
}
--cut here--

Pass 0

  Register 67 costs: AD_REGS:4000 Q_REGS:4000 NON_Q_REGS:4000 INDEX_REGS:4000 LEGACY_REGS:4000 GENERAL_REGS:4000 MMX_REGS:46000 INT_SSE_REGS:38000 MEM:16000

  Register 67 pref GENERAL_REGS or none

Pass 1

  Register 67 costs: AD_REGS:4000 Q_REGS:4000 NON_Q_REGS:4000 INDEX_REGS:4000 LEGACY_REGS:4000 GENERAL_REGS:4000 MMX_REGS:46000 INT_SSE_REGS:38000 MEM:16000

69 registers.

...

(insn:HI 18 45 22 1 (set (reg:V8QI 67)
        (mem/u/i:V8QI (symbol_ref/u:SI ("*.LC2") [flags 0x2]) [0 S8 A64])) 766 {*movv8qi_internal} (nil)
    (expr_list:REG_EQUIV (const_vector:V8QI [
                (const_int 12 [0xc])
                (const_int 24 [0x18])
                (const_int 36 [0x24])
                (const_int 48 [0x30])
                (const_int 60 [0x3c])
                (const_int 72 [0x48])
                (const_int 84 [0x54])
                (const_int 96 [0x60])
            ])
        (nil)))

...

test:
        pushl   %ebp
        movl    %esp, %ebp
        subl    $24, %esp
        movl    $807671820, %eax
        movl    $1616136252, %edx
        movl    %eax, -8(%ebp)
        movl    %edx, -4(%ebp)
        movl    -8(%ebp), %eax
        movl    -4(%ebp), %edx
        movl    %eax, -24(%ebp)
        movl    %edx, -20(%ebp)
        movq    -24(%ebp), %mm1
        leave
        movq    %mm1, %mm0
        ret
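As a cross-check of the dump above: the REG_EQUIV note shows that the addition was folded at compile time into the vector {12, 24, ..., 96}, and the two movl immediates are those eight bytes packed into two little-endian 32-bit words. A minimal sketch in portable C (no MMX needed) that redoes the arithmetic; the helper names pack_le32, add_bytes8 and folded_words are hypothetical and do not come from GCC or from the testcase.

```c
#include <stdint.h>

/* Pack four bytes into one 32-bit word, lowest byte first
   (x86 stores the low byte at the lowest address). */
static uint32_t pack_le32(const uint8_t *b)
{
    return (uint32_t)b[0]
         | ((uint32_t)b[1] << 8)
         | ((uint32_t)b[2] << 16)
         | ((uint32_t)b[3] << 24);
}

/* Lane-wise wrapping byte add, which is what _mm_add_pi8 (paddb)
   computes for each of the eight lanes. */
static void add_bytes8(const uint8_t *a, const uint8_t *b, uint8_t *out)
{
    for (int i = 0; i < 8; i++)
        out[i] = (uint8_t)(a[i] + b[i]);
}

/* Compute the low and high 32-bit halves of mm0 + mm1 from the testcase. */
static void folded_words(uint32_t *lo, uint32_t *hi)
{
    static const uint8_t mm0[8] = {1, 2, 3, 4, 5, 6, 7, 8};
    static const uint8_t mm1[8] = {11, 22, 33, 44, 55, 66, 77, 88};
    uint8_t sum[8];

    add_bytes8(mm0, mm1, sum);   /* {12, 24, 36, 48, 60, 72, 84, 96} */
    *lo = pack_le32(sum);        /* bytes 0..3 */
    *hi = pack_le32(sum + 4);    /* bytes 4..7 */
}
```

The low word comes out as 807671820 (0x3024180c) and the high word as 1616136252 (0x6054483c), matching the values loaded into %eax and %edx. So the constant itself is correct; the oddity is only that it travels through the general registers and two stack slots before reaching %mm1.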
I think this is more related to PR 14552, where I showed that we regressed: we did not output emms at all before, so we were not emitting MMX instructions without use of the functions in mmintrin.h.
Confirmed.
*** Bug 23660 has been marked as a duplicate of this bug. ***
In the discussion on the duplicate PR 23660, rth explained part of this here: http://gcc.gnu.org/ml/gcc/2005-08/msg00934.html W.
*** Bug 34256 has been marked as a duplicate of this bug. ***
Subject: Bug 22076

Author: uros
Date: Sat Feb 23 15:24:02 2008
New Revision: 132572

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=132572
Log:
        PR target/22076
        PR target/34256
        * config/i386/mmx.md (*mov<mode>_internal_rex64): Use "!y" to
        prevent reload from using MMX registers.
        (*mov<mode>_internal): Ditto.
        (*movv2sf_internal_rex64): Ditto.
        (*movv2sf_internal): Ditto.

testsuite/ChangeLog:

        PR target/22076
        PR target/34256
        * gcc.target/i386/pr22076.c: New test.
        * gcc.target/i386/pr34256.c: New test.
        * gcc.target/i386/vecinit-5.c: New test.
        * gcc.target/i386/vecinit-6.c: New test.
        * gcc.target/i386/vecinit-[1-4].c: Check that no MMX register is used.
        * g++.dg/compat/struct-layout-1.h: Do not include <mmintrin.h> and
        <xmmintrin.h>, define __m64 and __m128 directly.
        * g++.dg/compat/struct-layout-1_generate.c: Add -mno-mmx for x86.

Added:
    trunk/gcc/testsuite/gcc.target/i386/pr22076.c
    trunk/gcc/testsuite/gcc.target/i386/pr34256.c
    trunk/gcc/testsuite/gcc.target/i386/vecinit-5.c
    trunk/gcc/testsuite/gcc.target/i386/vecinit-6.c
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/config/i386/mmx.md
    trunk/gcc/testsuite/ChangeLog
    trunk/gcc/testsuite/g++.dg/compat/struct-layout-1.h
    trunk/gcc/testsuite/g++.dg/compat/struct-layout-1_generate.c
    trunk/gcc/testsuite/gcc.target/i386/vecinit-1.c
    trunk/gcc/testsuite/gcc.target/i386/vecinit-2.c
    trunk/gcc/testsuite/gcc.target/i386/vecinit-3.c
    trunk/gcc/testsuite/gcc.target/i386/vecinit-4.c
Fixed.
Fixed for real.
The gcc.target/i386/pr22076.c test case fails for i686-apple-darwin9 at -m64 as follows...

Executing on host: /sw/src/fink.build/gcc44-4.3.999-20081115/darwin_objdir/gcc/xgcc -B/sw/src/fink.build/gcc44-4.3.999-20081115/darwin_objdir/gcc/ /sw/src/fink.build/gcc44-4.3.999-20081115/gcc-4.4-20081115/gcc/testsuite/gcc.target/i386/pr22076.c -O2 -flax-vector-conversions -mmmx -S -m64 -o pr22076.s (timeout = 300)
PASS: gcc.target/i386/pr22076.c (test for excess errors)
FAIL: gcc.target/i386/pr22076.c scan-assembler-times movq 3
Created attachment 16692
assembly file generated for gcc.target/i386/pr22076.c at -m64 on i686-apple-darwin9