This is the mail archive of the
mailing list for the GCC project.
Re: x86 patch: SSE-based FP<=>int conversions, round 2
On Dec 13, 2006, at 11:29 AM, Michael Matz wrote:
On Wed, 13 Dec 2006, Stuart Hastings wrote:
+;; Move a DI from a 32-bit register pair (e.g. %edx:%eax) to an
+;; We'd rather avoid this entirely; if the 32-bit reg pair was
+;; from memory, we'd prefer to load the memory directly into the
+;; register. To facilitate this happy circumstance, this
+;; split until after register allocation. If the 64-bit value
+;; come from memory, this is the best we can do. This is much
+;; than storing %edx:%eax into a stack temporary and loading an
+;; from there.
AMD chips probably would need extra care here, since it is
AFAIK there to offload the operand into memory.
Wow; I am astonished. O.K., happy to do it; shall I make this
contingent on Intel targets?
(Are you sure? I'm very surprised that AMD prefers these values to
Yes, this really is so. For moving from GPRs to SSE (and MMX)
you should go through memory (taking the latencies into account when
scheduling). The reason is that MOVD with a GPR source and an SSE destination
has long latency. For the other direction one should use MOVD.
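
To make the two data paths concrete, here is a minimal C sketch using SSE2 intrinsics. It is not taken from the patch under review. The helper names (di_to_xmm_via_memory, di_to_xmm_direct, xmm_low64) are hypothetical and exist only for illustration. It assumes an x86 target with SSE2 enabled, and an optimizing compiler is free to pick different instructions than the comments suggest, so treat it as a picture of the data flow rather than of the generated code.

```c
#include <stdint.h>
#include <emmintrin.h>  /* SSE2 intrinsics; assumes an x86 target with SSE2 */

/* Hypothetical helper: go through memory. The 32-bit pair is written to a
   stack slot and then picked up with a single 64-bit load into the low
   half of an XMM register (the upper half is zeroed). */
static __m128i di_to_xmm_via_memory(uint32_t lo, uint32_t hi)
{
    uint32_t slot[2] = { lo, hi };  /* little-endian: low word first */
    return _mm_loadl_epi64((const __m128i *)slot);
}

/* Hypothetical helper: stay in registers. Each 32-bit half is moved from a
   GPR into its own XMM register (MOVD-style), and the two are interleaved
   so the lanes read lo, hi, 0, 0. */
static __m128i di_to_xmm_direct(uint32_t lo, uint32_t hi)
{
    __m128i l = _mm_cvtsi32_si128((int)lo);
    __m128i h = _mm_cvtsi32_si128((int)hi);
    return _mm_unpacklo_epi32(l, h);
}

/* Hypothetical helper: read back the low 64 bits so both paths can be
   compared. */
static uint64_t xmm_low64(__m128i v)
{
    uint64_t out;
    _mm_storel_epi64((__m128i *)&out, v);
    return out;
}
```

Both helpers produce the same value in the low 64 bits of the XMM register. Which one is cheaper is exactly the per-target question raised above and has to be measured on the hardware in question.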
In general if one makes changes especially to the "strange looking"
aspects of x86-64 which implement things in seemingly suboptimal ways
(going over memory, or going different ways depending on data flow
direction, or doing stuff separately which can also be done in one
one should be extremely cautious and measure all these changes on
carefully. Much of the funny stuff has a reason.
O.K., if I add
to the body of this pattern, would that satisfy? This leaves the
delayed split intact, so that the combiner can still discard useless
loads of DImode values into GPRs. The pattern will expand normally
for Intel, and it won't for AMD.
I can target for AMD and test this, but all my hardware is PPC or
+ [(set (match_operand:V4SI 0 "register_operand" "=x")
+ (subreg:V4SI (match_operand:DI 1 "register_operand" "r") 0))
We don't want to use SUBREGs to access the scalars within vectors.
We need to use instead the vec_merge stuff. See how loadld is
If your splitter trick is basically needed to deal with memory
why don't you allow "m" and have the easy path splitter here?
I'm sorry, I don't understand what you're suggesting. :-(
This splitter trick is to deal with DImode pseudos, and avoid copying
these values into the stack on their way to an %xmm register.
This for instance is something you don't want in general :) You
go over memory here (for AMD64 at least, haven't checked with Intel).
I have been told that Intel hardware dislikes differently-sized load/store
operations on the same location.