Bug 34256 - mmx and movd/movq on x86_64
Status: RESOLVED DUPLICATE of bug 22076
Alias: None
Product: gcc
Classification: Unclassified
Component: target
Version: 4.3.0
Importance: P3 normal
Target Milestone: ---
Assignee: Not yet assigned to anyone
URL:
Keywords: patch
Depends on:
Blocks:
 
Reported: 2007-11-28 01:14 UTC by dean
Modified: 2008-12-13 19:29 UTC

See Also:
Host: x86_64-unknown-linux-gnu
Target: x86_64-unknown-linux-gnu
Build: x86_64-unknown-linux-gnu
Known to work:
Known to fail:
Last reconfirmed: 2007-11-28 12:46:37


Attachments
Patch to adjust mmx move instructions (974 bytes, patch)
2007-11-28 12:45 UTC, Uroš Bizjak
assembly file generated for gcc.target/i386/pr34256.c at -m64 on i686-apple-darwin9 (441 bytes, text/plain)
2008-11-16 00:09 UTC, Jack Howarth
assembly file for gcc.target/i386/pr34256.c at -m64 on i686-apple-darwin9 with -fomit-frame-pointer (374 bytes, text/plain)
2008-11-17 00:37 UTC, Jack Howarth

Description dean 2007-11-28 01:14:57 UTC
gcc seems allergic to movq in the context of mmx:

% cat movq.c
#include <inttypes.h>
#include <mmintrin.h>

__m64 x;
__m64 y;

uint64_t foo(__m64 m) {
  return _mm_cvtm64_si64(_mm_add_pi32(x, y));
}
% gcc -g -O3 -Wall -std=gnu99   -c -o movq.o movq.c
% objdump -dr movq.o

movq.o:     file format elf64-x86-64

Disassembly of section .text:

0000000000000000 <foo>:
   0:   48 8b 05 00 00 00 00    mov    0(%rip),%rax        # 7 <foo+0x7>
                        3: R_X86_64_PC32        x+0xfffffffffffffffc
   7:   48 89 44 24 f8          mov    %rax,0xfffffffffffffff8(%rsp)
   c:   0f 6f 44 24 f8          movq   0xfffffffffffffff8(%rsp),%mm0
  11:   0f fe 05 00 00 00 00    paddd  0(%rip),%mm0        # 18 <foo+0x18>
                        14: R_X86_64_PC32       y+0xfffffffffffffffc
  18:   0f 7f 44 24 f8          movq   %mm0,0xfffffffffffffff8(%rsp)
  1d:   48 8b 44 24 f8          mov    0xfffffffffffffff8(%rsp),%rax
  22:   c3                      retq

the load of x should use "movq m64,mm".  this is true in i386 targets as well.

the transfer of %mm0 to %rax has the option of "movq %mm0,%rax" on x86_64, but should possibly be passed through memory depending on -mtune= settings:

for intel core2, always use movq directly between the registers, in either direction.

for AMD k8 (family 15), always pass through memory.

for AMD family 16 and later, pass gpr -> xmm/mmx through memory, and for xmm/mmx -> gpr always use movd/movq directly between the registers.
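
The three tuning rules above can be written down as a small decision function. This is only an illustrative sketch in C: the enum values and the function name are made up for this example and are neither GCC internals nor real -mtune= values.

```c
#include <stdbool.h>

/* Hypothetical names, for illustration only (not GCC internals). */
enum tune { TUNE_CORE2, TUNE_AMD_FAM15, TUNE_AMD_FAM16_PLUS };

/* VEC stands for an xmm or mmx register, GPR for a general-purpose one. */
enum direction { GPR_TO_VEC, VEC_TO_GPR };

/* Returns true when a direct register-to-register movd/movq is preferred,
   false when the value should be passed through memory instead. */
bool prefer_direct_move(enum tune t, enum direction d)
{
    switch (t) {
    case TUNE_CORE2:
        return true;              /* direct in both directions */
    case TUNE_AMD_FAM15:
        return false;             /* always through memory */
    case TUNE_AMD_FAM16_PLUS:
        return d == VEC_TO_GPR;   /* direct only for xmm/mmx -> gpr */
    }
    return false;                 /* unknown tuning: be conservative */
}
```

With this sketch, the %mm0 -> %rax transfer in foo would be a direct move for core2 and for family 16+, and a store/load pair for family 15.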

-dean

p.s. gcc -v
Using built-in specs.
Target: x86_64-unknown-linux-gnu
Configured with: ../gcc/configure --prefix=/home/odo/gcc --disable-multilib --disable-biarch x86_64-unknown-linux-gnu --enable-languages=c
Thread model: posix
gcc version 4.3.0 20071128 (experimental) (GCC)
Comment 1 dean 2007-11-28 01:43:55 UTC
this appears to be a regression between gcc 4.1.x and 4.2.x.  for the older compilers i had to switch the intrinsic to _mm_cvtsi64_si64x, but otherwise 4.2.1 generates the same code as 4.3.x...

ubuntu 4.1.2:

% objdump -dr movq.o

movq.o:     file format elf64-x86-64

Disassembly of section .text:

0000000000000000 <foo>:
   0:   0f 6f 05 00 00 00 00    movq   0(%rip),%mm0        # 7 <foo+0x7>
                        3: R_X86_64_PC32        x+0xfffffffffffffffc
   7:   0f fe 05 00 00 00 00    paddd  0(%rip),%mm0        # e <foo+0xe>
                        a: R_X86_64_PC32        y+0xfffffffffffffffc
   e:   48 0f 7e c0             movd   %mm0,%rax
  12:   c3                      retq

and 4.2.1:

movq.o:     file format elf64-x86-64

Disassembly of section .text:

0000000000000000 <foo>:
   0:   48 8b 05 00 00 00 00    mov    0x0(%rip),%rax        # 7 <foo+0x7>
                        3: R_X86_64_PC32        x+0xfffffffffffffffc
   7:   48 89 44 24 f8          mov    %rax,-0x8(%rsp)
   c:   0f 6f 44 24 f8          movq   -0x8(%rsp),%mm0
  11:   0f fe 05 00 00 00 00    paddd  0x0(%rip),%mm0        # 18 <foo+0x18>
                        14: R_X86_64_PC32       y+0xfffffffffffffffc
  18:   0f 7f 44 24 f8          movq   %mm0,-0x8(%rsp)
  1d:   48 8b 44 24 f8          mov    -0x8(%rsp),%rax
  22:   c3                      retq


Comment 2 Uroš Bizjak 2007-11-28 12:45:34 UTC
Created attachment 14653
Patch to adjust mmx move instructions

It looks like the mmx move instructions need some tuning.  The attached patch fixes your problems and generates (-march=core2):

foo:
.LFB4:
        movq    x(%rip), %mm0
        paddd   y(%rip), %mm0
        movd    %mm0, %rax
        ret

Since these register allocator (RA) adjustments are very fragile, this patch is not appropriate for stage3.
Comment 3 Uroš Bizjak 2007-11-28 12:46:37 UTC
Confirmed.
Comment 4 Uroš Bizjak 2007-12-10 08:40:41 UTC

*** This bug has been marked as a duplicate of 22076 ***
Comment 5 uros 2008-02-23 15:24:41 UTC
Subject: Bug 34256

Author: uros
Date: Sat Feb 23 15:24:02 2008
New Revision: 132572

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=132572
Log:
        PR target/22076
        PR target/34256

        * config/i386/mmx.md (*mov<mode>_internal_rex64): Use "!y" to
        prevent reload from using MMX registers.
        (*mov<mode>_internal): Ditto.
        (*movv2sf_internal_rex64): Ditto.
        (*movv2sf_internal): Ditto.

testsuite/ChangeLog:

        PR target/22076
        PR target/34256
        * gcc.target/i386/pr22076.c: New test.
        * gcc.target/i386/pr34256.c: New test.
        * gcc.target/i386/vecinit-5.c: New test.
        * gcc.target/i386/vecinit-6.c: New test.
        * gcc.target/i386/vecinit-[1-4].c: Check that no MMX register is used.

        * g++.dg/compat/struct-layout-1.h: Do not include <mmintrin.h> and
        <xmmintrin.h>, define __m64 and __m128 directly.
        * g++.dg/compat/struct-layout-1_generate.c: Add -mno-mmx for x86.

Added:
    trunk/gcc/testsuite/gcc.target/i386/pr22076.c
    trunk/gcc/testsuite/gcc.target/i386/pr34256.c
    trunk/gcc/testsuite/gcc.target/i386/vecinit-5.c
    trunk/gcc/testsuite/gcc.target/i386/vecinit-6.c
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/config/i386/mmx.md
    trunk/gcc/testsuite/ChangeLog
    trunk/gcc/testsuite/g++.dg/compat/struct-layout-1.h
    trunk/gcc/testsuite/g++.dg/compat/struct-layout-1_generate.c
    trunk/gcc/testsuite/gcc.target/i386/vecinit-1.c
    trunk/gcc/testsuite/gcc.target/i386/vecinit-2.c
    trunk/gcc/testsuite/gcc.target/i386/vecinit-3.c
    trunk/gcc/testsuite/gcc.target/i386/vecinit-4.c

Comment 6 Jack Howarth 2008-11-16 00:07:53 UTC
The gcc.target/i386/pr34256.c test case fails on i686-apple-darwin9 at -m64 as follows...

Executing on host: /sw/src/fink.build/gcc44-4.3.999-20081115/darwin_objdir/gcc/xgcc -B/sw/src/fink.build/gcc44-4.3.999-20081115/darwin_objdir/gcc/ /sw/src/fink.build/gcc44-4.3.999-20081115/gcc-4.4-20081115/gcc/testsuite/gcc.target/i386/pr34256.c   -O2 -march=core2 -S  -m64 -o pr34256.s    (timeout = 300)
PASS: gcc.target/i386/pr34256.c (test for excess errors)
FAIL: gcc.target/i386/pr34256.c scan-assembler-times mov 4
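
For context, scan-assembler-times counts how many times a pattern matches in the generated assembly, so "mov 4" expects exactly four matches of "mov", and any mnemonic containing that text (mov, movq, movd, ...) counts toward the total. The following is a rough C sketch of that kind of counting. It uses plain substring search rather than a regular expression, the helper name is made up, and the embedded strings are copied from the listings in comments 1 and 2, not from the pr34256.s output attached to this bug.

```c
#include <stddef.h>
#include <string.h>

/* Count non-overlapping occurrences of the literal string `needle` in `text`.
   A real scan-assembler-times pattern is a regular expression; a literal
   substring is enough for a pattern like "mov". */
size_t count_occurrences(const char *text, const char *needle)
{
    size_t len = strlen(needle);
    size_t n = 0;

    if (len == 0)
        return 0;
    for (const char *p = strstr(text, needle); p != NULL;
         p = strstr(p + len, needle))
        n++;
    return n;
}

/* Assembly quoted in comment 2 (patched compiler, -march=core2). */
const char patched_asm[] =
    "foo:\n"
    ".LFB4:\n"
    "        movq    x(%rip), %mm0\n"
    "        paddd   y(%rip), %mm0\n"
    "        movd    %mm0, %rax\n"
    "        ret\n";

/* Instructions from the 4.2.1 disassembly quoted in comment 1. */
const char unpatched_asm[] =
    "        mov    0x0(%rip),%rax\n"
    "        mov    %rax,-0x8(%rsp)\n"
    "        movq   -0x8(%rsp),%mm0\n"
    "        paddd  0x0(%rip),%mm0\n"
    "        movq   %mm0,-0x8(%rsp)\n"
    "        mov    -0x8(%rsp),%rax\n"
    "        retq\n";
```

Calling count_occurrences(patched_asm, "mov") gives 2 and count_occurrences(unpatched_asm, "mov") gives 5, which shows how an extra load, store or register copy changes the number the test compares against.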

Comment 7 Jack Howarth 2008-11-16 00:09:34 UTC
Created attachment 16691
assembly file generated for gcc.target/i386/pr34256.c at -m64 on i686-apple-darwin9
Comment 8 Jack Howarth 2008-11-17 00:34:53 UTC
The gcc.target/i386/pr34256.c test case is still failing as...

Executing on host: /sw/src/fink.build/gcc44-4.3.999-20081116/darwin_objdir/gcc/xgcc -B/sw/src/fink.build/gcc44-4.3.999-20081116/darwin_objdir/gcc/ /sw/src/fink.build/gcc44-4.3.999-20081116/gcc-4.4-20081116/gcc/testsuite/gcc.target/i386/pr34256.c   -O2 -fomit-frame-pointer -march=core2 -S  -m64 -o pr34256.s    (timeout = 300)
PASS: gcc.target/i386/pr34256.c (test for excess errors)
FAIL: gcc.target/i386/pr34256.c scan-assembler-times mov 4
UNSUPPORTED: gcc.target/i386/pr34312.c
UNSUPPORTED: gcc.target/i386/pr34522.c
UNSUPPORTED: gcc.target/i386/pr35083.c
Comment 9 Jack Howarth 2008-11-17 00:37:48 UTC
Created attachment 16704
assembly file for gcc.target/i386/pr34256.c at -m64 on i686-apple-darwin9 with -fomit-frame-pointer
Comment 10 Jack Howarth 2008-12-13 18:38:10 UTC
On i686-apple-darwin9, I have been using...

Using built-in specs.
Target: i686-apple-darwin9
Configured with: ../gcc-4.4-20081213/configure --prefix=/sw --prefix=/sw/lib/gcc4.4 --mandir=/sw/share/man --infodir=/sw/share/info --enable-languages=c,c++,fortran,objc,java --with-gmp=/sw --with-libiconv-prefix=/sw --with-ppl=/sw --with-cloog=/sw --with-system-zlib --x-includes=/usr/X11R6/include --x-libraries=/usr/X11R6/lib --with-arch=nocona --with-tune=generic --build=i686-apple-darwin9 --host=i686-apple-darwin9 --target=i686-apple-darwin9
Thread model: posix
gcc version 4.4.0 20081213 (experimental) (GCC) 

when the testsuite produces the failure...

FAIL: gcc.target/i386/pr34256.c scan-assembler-times mov 4


Comment 11 Uroš Bizjak 2008-12-13 19:29:31 UTC
(In reply to comment #10)

> FAIL: gcc.target/i386/pr34256.c scan-assembler-times mov 4

PR 37364