This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.



Re: PATCH: Add SSE4.1 support


> > On Fri, Apr 20, 2007 at 01:18:09PM -0700, H. J. Lu wrote:
> > > But "mov xmm, gr" is always a win, 
> > 
> > Um, this assertion is FALSE for AMD.
> 
> Indeed.  AMD has different lengths of reg and XMM queues (the XMM one is
> longer).  Instructions affecting both units need to synchronize the
> two, which is a bit expensive.
> 
> It seems to me that for !INTER_UNIT_MOVES targets, the intrinsics
> representing xmm->gr or gr->xmm moves of some form should be
> automatically optimized into xmm->mem->gr or gr->mem->xmm form, so we
> won't run into this problem at all.

Just for the record, the other sane behaviour I can think of (and what I
think H. J. is shooting for) is to make GCC closely follow what the user
wrote, on the assumption that the user knows why he writes an XMM->gr
move (for example, having verified that the code is not running on an AMD
chip or that the particular code path is cold).  That would need some
wrapping in unspecs to keep the generic simplifiers from optimizing the
intrinsics.
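
To make the two forms concrete, here is a minimal C sketch of an xmm->gr
move written directly and via memory.  The function names are
hypothetical, and the #if fallback exists only so the file builds without
SSE2.  Which instructions GCC actually emits for either function depends
on the -march/-mtune settings, since the optimizer currently remains free
to pick the form it prefers.

```c
#include <stdint.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Hypothetical helper: the "direct" form.  At the source level this
   reads like a movd from an XMM register to a general register. */
static int32_t low_lane_direct(const int32_t v[4])
{
#if defined(__SSE2__)
    __m128i x = _mm_loadu_si128((const __m128i *)v);
    return _mm_cvtsi128_si32(x);        /* xmm -> gr */
#else
    return v[0];                        /* fallback when SSE2 is unavailable */
#endif
}

/* Hypothetical helper: the "via memory" form, xmm -> mem -> gr,
   spelled out by hand with a store followed by a scalar load. */
static int32_t low_lane_via_memory(const int32_t v[4])
{
#if defined(__SSE2__)
    __m128i x = _mm_loadu_si128((const __m128i *)v);
    int32_t spill[4];
    _mm_storeu_si128((__m128i *)spill, x);  /* xmm -> mem */
    return spill[0];                        /* mem -> gr  */
#else
    return v[0];                        /* fallback when SSE2 is unavailable */
#endif
}
```

Both functions return the same value; the only question in this thread
is whether the compiler should preserve the form the user wrote or choose
one based on the target.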

It would make sense because XMM code is most likely CPU specific and the
intrinsics look like a C encoding of assembly language.  Our blended
model defaults to !INTER_UNIT_MOVES, so to use xmm->gr or gr->xmm
builtins effectively the user would need to separate the code into units
compiled with the appropriate -march flag, which is a bit difficult.

However, this would require quite a large reorganization of the SSE
builtin patterns, and I think it is better to give the optimizer freedom
to optimize the user's SSE code, as we do now.

We probably should not end up somewhere in between those two approaches.

Honza
> 
> Honza
> > 
> > 
> > r~

