[Bug target/19530] MMX load intrinsic produces SSE superfluous instructions (movlps)

Thu Jan 27 03:53:00 GMT 2005

------- Additional Comments From rth at gcc dot gnu dot org  2005-01-27 03:52 -------
> So, is there some sort of "pragma" that could be used to disable SSE
> registers(force -mmmx sort of) for only part of some code? 

No.

> __m64 should always be on mmx registers, and __m128 should always be on
> xmm registers.

Well, yes and no.  Given SSE2, one *can* implement *everything* in 
<mmintrin.h> with SSE registers.
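
As a rough illustration only, here is the byte interleave done by
_mm_unpacklo_pi8 written with SSE2 intrinsics from <emmintrin.h>, so that no
MMX register is involved.  The helper names unpacklo_bytes and
unpacklo_bytes_check are made up for this sketch, and the scalar fallback is
there only so the file still builds where SSE2 is not enabled.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Hypothetical helper: the interleave that _mm_unpacklo_pi8 performs,
   dst = a0 b0 a1 b1 a2 b2 a3 b3, without touching an MMX register. */
static void unpacklo_bytes(uint8_t dst[8], const uint8_t a[8],
                           const uint8_t b[8])
{
#if defined(__SSE2__)
    /* Load 8 bytes into the low half of an XMM register; upper half is 0. */
    __m128i va = _mm_loadl_epi64((const __m128i *)a);
    __m128i vb = _mm_loadl_epi64((const __m128i *)b);
    /* Interleaves the low 8 bytes of each operand into 16 bytes. */
    __m128i r = _mm_unpacklo_epi8(va, vb);
    /* The low 8 bytes of r are exactly the MMX unpacklo result. */
    _mm_storel_epi64((__m128i *)dst, r);
#else
    /* Scalar fallback with the same result, for targets without SSE2. */
    uint8_t tmp[8];
    for (int i = 0; i < 4; i++) {
        tmp[2 * i] = a[i];
        tmp[2 * i + 1] = b[i];
    }
    memcpy(dst, tmp, sizeof tmp);
#endif
}

/* Small self-check of the interleave order. */
static void unpacklo_bytes_check(void)
{
    const uint8_t a[8] = {0, 1, 2, 3, 4, 5, 6, 7};
    const uint8_t b[8] = {10, 11, 12, 13, 14, 15, 16, 17};
    const uint8_t want[8] = {0, 10, 1, 11, 2, 12, 3, 13};
    uint8_t got[8];

    unpacklo_bytes(got, a, b);
    assert(memcmp(got, want, sizeof want) == 0);
}
```

Whether this is a win over the MMX form depends on the surrounding code; the
sketch only shows that the operation itself does not need an MMX register.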

> I can also prevent it from using an xmm register by [...]

... doing something complicated enough that, given the existing patterns
defined by the x86 backend, it much more strongly prefers the mmx
registers.  Your problems with preferencing will come only when the
register in question is used only for data movement.

Which, as can be seen in your _mm_unpacklo_pi8 test case, can happen
at surprising times.  There are *two* registers to be register allocated
there.  The one that does the actual unpack operation *is* forced to be
in an MMX register.  The other moves the result to the return register,
and that's the one that gets mis-allocated to the SSE register set.

> If one wants to move one 32 bit integer to a mmx register, that should be the
> job of a specialized intrinsics (_mm_cvtsi32_si64) which maps to a MOVD
> instruction.

With gcc, NONE of the intrinsics is a strict 1-1 mapping to ANY instruction.

> Does it make sense? Is this what you mean by a complete rewrite or were you
> thinking of something else?

Gcc has some facilities for generic vector operations, ones that don't use
any of the foointrin.h header files.  When those are used, the compiler
starts trying to use the MMX registers.  But it doesn't know how to place
the necessary emms instruction, which is Bad.
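
For reference, a minimal sketch of what such generic vector code looks like,
using the vector_size attribute and no intrinsic header at all.  The type name
v8u8 and the helpers add_bytes8 and add_bytes8_check are invented for the
example.  Which registers and instructions the compiler picks for the 64-bit
vector is entirely up to it, which is exactly where the emms question comes
from.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Eight 8-bit lanes in 64 bits, declared with the GCC vector extension
   instead of the __m64 type from <mmintrin.h>. */
typedef uint8_t v8u8 __attribute__((vector_size(8)));

/* Hypothetical helper: lane-wise add of two 8-byte arrays.  Unsigned
   lanes wrap around modulo 256. */
static void add_bytes8(uint8_t dst[8], const uint8_t x[8],
                       const uint8_t y[8])
{
    v8u8 a, b, r;

    memcpy(&a, x, sizeof a);
    memcpy(&b, y, sizeof b);
    r = a + b;              /* the compiler chooses the register class */
    memcpy(dst, &r, sizeof r);
}

/* Small self-check, including one lane that wraps. */
static void add_bytes8_check(void)
{
    const uint8_t x[8] = {1, 2, 3, 4, 5, 6, 7, 250};
    const uint8_t y[8] = {10, 20, 30, 40, 50, 60, 70, 10};
    const uint8_t want[8] = {11, 22, 33, 44, 55, 66, 77, 4};
    uint8_t got[8];

    add_bytes8(got, x, y);
    assert(memcmp(got, want, sizeof want) == 0);
}
```

Note that nothing in this source says where an emms would have to go if the
add ended up in an MMX register.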

At the moment, the only way to prevent this from happening is to strongly
discourage gcc from using the MMX registers to move data around.  This is
done in such a way that the only time it will use an MMX register is when
we have no other choice.  Which is why you see the compiler starting to 
use SSE registers when they are available.

You might think that we could easily use some pragma or something when
<mmintrin.h> is in use, since it's the user's responsibility to call 
_mm_empty when necessary.  Except that due to how gcc is structured
internally, you'd not be able to be 100% certain that all of the mmx
data movement remained where you expected.  Indeed, we have open PRs
where this kind of movement is in fact shown to happen.

Thus the ONLY solution that is sure to be correct is to teach the 
compiler what using MMX registers means, and where to place emms
instructions.  Which is the subject of the PR against which this PR
is marked as being blocked.

This cannot be addressed in any satisfactory way for 4.0.

Frankly, I wonder if it's worth addressing at all.  To my mind it's 
just as easy to write pure assembly for MMX.  And pretty soon the
vast majority of ia32 machines will have SSE2, and that is what the
autovectorizer will be targeting anyway.

-- 

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19530