This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.
Re: x86 patch: SSE-based FP<=>int conversions, round 2
- From: Michael Matz <matz at suse dot de>
- To: Stuart Hastings <stuart at apple dot com>
- Cc: Jan Hubicka <hubicka at ucw dot cz>, gcc-patches at gcc dot gnu dot org
- Date: Wed, 13 Dec 2006 20:29:01 +0100 (CET)
- Subject: Re: x86 patch: SSE-based FP<=>int conversions, round 2
- References: <7EA9B724-78B4-4A1D-9AA7-622B2AE06A64@apple.com>
On Wed, 13 Dec 2006, Stuart Hastings wrote:
> > > +;; Move a DI from a 32-bit register pair (e.g. %edx:%eax) to an xmm.
> > > +;; We'd rather avoid this entirely; if the 32-bit reg pair was loaded
> > > +;; from memory, we'd prefer to load the memory directly into the %xmm
> > > +;; register. To facilitate this happy circumstance, this pattern won't
> > > +;; split until after register allocation. If the 64-bit value didn't
> > > +;; come from memory, this is the best we can do. This is much better
> > > +;; than storing %edx:%eax into a stack temporary and loading an %xmm
> > > +;; from there.
> > AMD chips probably would need extra care here, since AFAIK it is
> > preferable there to offload the operand into memory.
> Wow; I am astonished. O.K., happy to do it; shall I make this
> contingent on Intel targets?
> (Are you sure? I'm very surprised that AMD prefers these values to go
> through memory.)
Yes, this really is so. For moving from GPRs to SSE (and MMX) registers
you should go through memory (taking the latencies into account when
scheduling). The reason is that MOVD with a GPR source and an SSE
destination has a very long latency. For the other direction one should
use MOVD.
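To make the two paths concrete, here is a rough sketch of the instruction sequences in question (AT&T syntax; register choices are arbitrary, and the stack slot is assumed to have been properly allocated beforehand):

```
        # Direct GPR->SSE path: movd from a GPR has a long latency on AMD K8
        movd    %eax, %xmm0
        movd    %edx, %xmm1
        punpckldq %xmm1, %xmm0          # xmm0 low qword = %edx:%eax

        # Preferred on AMD: store the pair, then load the qword directly
        movl    %eax, (%esp)            # assumes an allocated stack temporary
        movl    %edx, 4(%esp)
        movq    (%esp), %xmm0

        # The other direction (SSE->GPR) is fine with movd
        movd    %xmm0, %eax
```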
In general, when changing the "strange looking" aspects of x86-64 which
implement things in seemingly suboptimal ways (going through memory,
taking different paths depending on data-flow direction, or doing
separately what could also be done in one insn), one should be extremely
cautious and measure all such changes carefully on AMD64. Much of the
funny stuff has a reason.
> > > +(define_insn_and_split "movdi_to_sse"
> > > + [(parallel
> > > + [(set (match_operand:V4SI 0 "register_operand" "=x")
> > > + (subreg:V4SI (match_operand:DI 1 "register_operand" "r") 0))
> > We don't want to use SUBREGs to access the scalars within vectors.
> > We need to use instead the vec_merge stuff. See how loadld is
> > implemented.
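For reference, the vec_merge idiom Jan points at looks roughly like this (a sketch from memory of a loadld-style pattern, not the exact i386.md text; the pattern name, constraints, and condition here are illustrative only):

```
(define_insn "*movsi_to_sse_sketch"
  [(set (match_operand:V4SI 0 "register_operand" "=x")
        (vec_merge:V4SI
          (vec_duplicate:V4SI
            (match_operand:SI 2 "nonimmediate_operand" "mr"))
          (match_operand:V4SI 1 "reg_or_0_operand" "C")
          (const_int 1)))]
  "TARGET_SSE2"
  "movd\t{%2, %0|%0, %2}")
```

The point is that element zero of the vector is described explicitly via vec_merge/vec_duplicate, rather than by taking a SUBREG of a scalar pseudo.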
> > If your splitter trick is basically needed to deal with memory
> > operands, why don't you allow "m" and have the easy-path splitter here?
> I'm sorry, I don't understand what you're suggesting. :-(
> This splitter trick is to deal with DImode pseudos, and avoid copying
> these values into the stack on their way to an %xmm register.
This, for instance, is something you don't want in general :) You want to
go through memory here (at least for AMD64; I haven't checked with Intel).