This is the mail archive of the mailing list for the GCC project.


Re: x86 patch: SSE-based FP<=>int conversions, round 2

On Dec 13, 2006, at 11:29 AM, Michael Matz wrote:


On Wed, 13 Dec 2006, Stuart Hastings wrote:

+;; Move a DI from a 32-bit register pair (e.g. %edx:%eax) to an xmm.
+;; We'd rather avoid this entirely; if the 32-bit reg pair was loaded
+;; from memory, we'd prefer to load the memory directly into the %xmm
+;; register. To facilitate this happy circumstance, this pattern won't
+;; split until after register allocation. If the 64-bit value didn't
+;; come from memory, this is the best we can do. This is much better
+;; than storing %edx:%eax into a stack temporary and loading an %xmm
+;; from there.

AMD chips probably would need extra care here, since AFAIK it is
preferable there to offload the operand into memory.

Wow; I am astonished. O.K., happy to do it; shall I make this contingent on Intel targets?

(Are you sure? I'm very surprised that AMD prefers these values to go
through memory.)

Yes, this really is so. For moving from GPRs to SSE (and MMX) registers
you should go through memory (taking the latencies into account when
scheduling). The reason is that MOVD with a GPR source and an SSE
destination has very long latency. For the other direction one should
use MOVD.

In general if one makes changes especially to the "strange looking"
aspects of x86-64 which implement things in seemingly suboptimal ways
(going over memory, or going different ways depending on data flow
direction, or doing stuff separately which can also be done in one insn)
one should be extremely cautious and measure all these changes on AMD64
carefully. Much of the funny stuff has a reason.
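
As an illustration of the two routes discussed above, here is a minimal C sketch using SSE2 intrinsics. It is not taken from the patch: the function names (`di_to_xmm_direct`, `di_to_xmm_via_memory`, `xmm_low64`, `check_paths_agree`) are hypothetical, and the compiler is free to choose different instructions than the comments suggest. It only shows that both routes yield the same 64-bit value in the low half of the %xmm register. Which one is faster depends on the target, as described above. It assumes an x86 target with SSE2 enabled.

```c
#include <assert.h>
#include <emmintrin.h>  /* SSE2 intrinsics; x86/x86-64 only */
#include <stdint.h>

/* Direct route: move each 32-bit half from a GPR into an XMM register
   (typically MOVD), then interleave the two halves (typically PUNPCKLDQ). */
static __m128i di_to_xmm_direct(uint32_t lo, uint32_t hi)
{
    __m128i vlo = _mm_cvtsi32_si128((int)lo);  /* lanes: lo, 0, 0, 0 */
    __m128i vhi = _mm_cvtsi32_si128((int)hi);  /* lanes: hi, 0, 0, 0 */
    return _mm_unpacklo_epi32(vlo, vhi);       /* lanes: lo, hi, 0, 0 */
}

/* Memory route: place both halves in a stack slot, then do a single
   64-bit load into the low half of an XMM register (typically MOVQ).
   Note this is two 32-bit stores followed by one 64-bit load. */
static __m128i di_to_xmm_via_memory(uint32_t lo, uint32_t hi)
{
    uint32_t slot[2] = { lo, hi };
    return _mm_loadl_epi64((const __m128i *)slot);
}

/* Helper: read back the low 64 bits of an XMM value. */
static uint64_t xmm_low64(__m128i v)
{
    uint64_t out;
    _mm_storel_epi64((__m128i *)&out, v);
    return out;
}

/* Sanity check: both routes must produce the same 64-bit value. */
static void check_paths_agree(uint32_t lo, uint32_t hi)
{
    uint64_t expected = ((uint64_t)hi << 32) | lo;
    assert(xmm_low64(di_to_xmm_direct(lo, hi)) == expected);
    assert(xmm_low64(di_to_xmm_via_memory(lo, hi)) == expected);
}
```

Calling `check_paths_agree(0x89abcdefu, 0x01234567u)` exercises both routes. Comparing their cost would require inspecting the generated assembly and timing it on the hardware in question.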

O.K., if I add


to the body of this pattern, would that satisfy? This leaves the delayed split intact, so that the combiner can still discard useless loads of DImode values into GPRs. The pattern will expand normally for Intel, and it won't for AMD.

I can target for AMD and test this, but all my hardware is PPC or Intel-based;

+(define_insn_and_split "movdi_to_sse"
+  [(parallel
+    [(set (match_operand:V4SI 0 "register_operand" "=x")
+	  (subreg:V4SI (match_operand:DI 1 "register_operand"  "r") 0))

We don't want to use SUBREGs to access the scalars within vectors. We need to use the vec_merge stuff instead. See how loadld is implemented.

If your splitter trick is basically needed to deal with memory operands,
why don't you allow "m" and have the easy path splitter here?

I'm sorry, I don't understand what you're suggesting. :-(

This splitter trick is to deal with DImode pseudos, and avoid copying
these values into the stack on their way to an %xmm register.

This for instance is something you don't want in general :) You want to
go over memory here (for AMD64 at least, haven't checked with Intel).

I've been told that Intel hardware dislikes differently-sized load/store operations on the same location.

