This is the mail archive of the gcc-bugs@gcc.gnu.org mailing list for the GCC project.
[Bug target/19530] MMX load intrinsic produces SSE superfluous instructions (movlps)
- From: "guardia at sympatico dot ca" <gcc-bugzilla at gcc dot gnu dot org>
- To: gcc-bugs at gcc dot gnu dot org
- Date: 29 Jan 2005 04:47:24 -0000
- Subject: [Bug target/19530] MMX load intrinsic produces SSE superfluous instructions (movlps)
- References: <20050119145614.19530.guardia@sympatico.ca>
- Reply-to: gcc-bugzilla at gcc dot gnu dot org
------- Additional Comments From guardia at sympatico dot ca 2005-01-29 04:47 -------
Hmm, there apparently is a problem in the optimization stages. I cooked up
another snippet:
#include <mmintrin.h>

void moo(__m64 i, unsigned int *r)
{
    unsigned int tmp = __builtin_ia32_vec_ext_v2si (i, 0);
    *r = tmp;
}
With -O0 -mmmx we get:
movd %mm0, -4(%ebp)
movl 8(%ebp), %edx
movl -4(%ebp), %eax
movl %eax, (%edx)
Which with -O3 gets reduced to:
movl 8(%ebp), %eax
movd %mm0, (%eax)
Now, clearly GCC understands that "movd" does the same thing as "movl",
except that it operates on MMX registers rather than general-purpose
registers, which matters on an MMX-only machine. It should be able to do the
same with "movlps" and "movq", I think? If the optimization stages can work
this out, maybe we wouldn't need to rewrite the MMX/SSE1 support...
(BTW, a correction: when I said 200+ instructions to schedule, I meant per
function. I have a dozen such functions with 200+ instructions each, and that
isn't going to get any smaller.)
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19530