This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.

Re: x86 patch: SSE-based FP<=>int conversions, round 2

From: Michael Matz <matz at suse dot de>
To: Stuart Hastings <stuart at apple dot com>
Cc: Jan Hubicka <hubicka at ucw dot cz>, gcc-patches at gcc dot gnu dot org
Date: Wed, 13 Dec 2006 20:29:01 +0100 (CET)
Subject: Re: x86 patch: SSE-based FP<=>int conversions, round 2
References: <7EA9B724-78B4-4A1D-9AA7-622B2AE06A64@apple.com>

Hi,

On Wed, 13 Dec 2006, Stuart Hastings wrote:

> > > +;; Move a DI from a 32-bit register pair (e.g. %edx:%eax) to an xmm.
> > > +;; We'd rather avoid this entirely; if the 32-bit reg pair was loaded
> > > +;; from memory, we'd prefer to load the memory directly into the %xmm
> > > +;; register.  To facilitate this happy circumstance, this pattern won't
> > > +;; split until after register allocation.  If the 64-bit value didn't
> > > +;; come from memory, this is the best we can do.  This is much better
> > > +;; than storing %edx:%eax into a stack temporary and loading an %xmm
> > > +;; from there.
> >
> > AMD chips would probably need extra care here, since AFAIK it is
> > preferable there to offload the operand into memory.
> 
> Wow; I am astonished.  O.K., happy to do it; shall I make this
> contingent on Intel targets?
> 
> (Are you sure?  I'm very surprised that AMD prefers these values to go
> through memory.)

Yes, this really is so.  For moving from GPRs to SSE (and MMX) registers
you should go through memory (taking the latencies into account when
scheduling).  The reason is that MOVD with a GPR source and an SSE
destination has very long latency.  For the other direction one should
use MOVD.
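
To make the two routes concrete, here is a minimal C sketch using SSE2
intrinsics.  It is only an illustration, not code from the patch: the
function names are made up, the instruction named in each comment is what
the intrinsic usually maps to, and the compiler is free to pick something
else, so check the generated assembly before drawing conclusions.

```c
#include <assert.h>
#include <stdint.h>

#if defined(__SSE2__)
#include <emmintrin.h>

/* Read back the low 64 bits of an XMM value so the two routes can be compared. */
static uint64_t low64(__m128i x)
{
    uint64_t out;
    _mm_storel_epi64((__m128i *)&out, x);
    return out;
}

/* Through memory: put the %edx:%eax-style pair into a stack slot, then do
   one 64-bit load (typically MOVQ xmm, m64) into the low half of the XMM. */
static uint64_t gpr_pair_to_xmm_via_memory(uint32_t hi, uint32_t lo)
{
    uint64_t slot = ((uint64_t)hi << 32) | lo;
    return low64(_mm_loadl_epi64((const __m128i *)&slot));
}

/* Direct: one MOVD per 32-bit half (GPR source, SSE destination), then
   PUNPCKLDQ to interleave the halves into a single 64-bit lane. */
static uint64_t gpr_pair_to_xmm_direct(uint32_t hi, uint32_t lo)
{
    __m128i xlo = _mm_cvtsi32_si128((int)lo);
    __m128i xhi = _mm_cvtsi32_si128((int)hi);
    return low64(_mm_unpacklo_epi32(xlo, xhi));
}
#else
/* Hypothetical fallback so the sketch still compiles without SSE2;
   it only reproduces the result, not the instruction sequence. */
static uint64_t gpr_pair_to_xmm_via_memory(uint32_t hi, uint32_t lo)
{
    return ((uint64_t)hi << 32) | lo;
}

static uint64_t gpr_pair_to_xmm_direct(uint32_t hi, uint32_t lo)
{
    return ((uint64_t)hi << 32) | lo;
}
#endif

/* Both routes must produce the same 64-bit value; only the cost differs. */
static void self_check(void)
{
    assert(gpr_pair_to_xmm_via_memory(0x01234567u, 0x89abcdefu)
           == 0x0123456789abcdefULL);
    assert(gpr_pair_to_xmm_direct(0x01234567u, 0x89abcdefu)
           == 0x0123456789abcdefULL);
}
```

Both functions return the same value, so the choice between them is purely
a question of latency on the target chip, which has to be measured.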

In general, if one changes the "strange looking" aspects of the x86-64
port, the ones that implement things in seemingly suboptimal ways (going
over memory, going different ways depending on data flow direction, or
doing separately what could also be done in one insn), one should be
extremely cautious and measure all these changes carefully on AMD64.
Much of the funny stuff has a reason.

> > > +(define_insn_and_split "movdi_to_sse"
> > > +  [(parallel
> > > +    [(set (match_operand:V4SI 0 "register_operand" "=x")
> > > +	  (subreg:V4SI (match_operand:DI 1 "register_operand"  "r") 0))
> >
> > We don't want to use SUBREGs to access the scalars within vectors.
> > We need to use instead the vec_merge stuff.  See how loadld is
> > implemented.
> >
> > If your splitter trick is basically needed to deal with a memory
> > operand, why don't you allow "m" and have the easy path splitter here?
> 
> I'm sorry, I don't understand what you're suggesting.  :-(
> 
> This splitter trick is to deal with DImode pseudos, and avoid copying
> these values into the stack on their way to an %xmm register.

This, for instance, is something you don't want in general :)  You want
to go over memory here (for AMD64 at least; I haven't checked with Intel).

Ciao,
Michael.
