PATCH: Add XOP 128-bit and 256-bit support for upcoming AMD Orochi processor.

Tue Oct 20 21:34:00 GMT 2009

Honza,

> Well, if they are always shadowed by AVX or SSE4.1 equivalents then yes.
> Are there really any advantages in this mulv4si3 pattern over the other two
> cases?
> >
Since XOP includes SSE4.1, and the SSE4.1 equivalent of mulv4si3 is faster and shorter than the XOP-based mulv4si3, I'll remove this pattern from the patch.

> > > Hmm, there is no unspec or something that would make it clear that we
> > > cannot ever somehow simplify into this form with operand 2 being
> > > something different than parallel with const_ints.  I think this needs
> > > a new predicate.
> >
> > I can define a new predicate for it in predicates.md, but I am not sure
> > how exactly to represent the "parallel with const ints" part.
> 
> You can use simple C code there, just see how e.g.
> x86_64_immediate_operand is defined.
> 
The above comment applies here as well. Since we have not yet done any performance analysis on Orochi comparing pperm-based pack/unpack against the SSE4.1-based versions, it is better to remove this from the patch for now.
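
For reference, if the pperm-based patterns are revisited later, a predicate of the kind suggested above could be sketched roughly as follows. This is only an illustration and is not part of the patch: the name const_int_parallel_operand is hypothetical, and the check assumes that "parallel with const_ints" simply means every element of the PARALLEL is a CONST_INT.

```
;; Hypothetical predicate, not in the patch: accept only a PARALLEL
;; whose elements are all CONST_INTs.
(define_predicate "const_int_parallel_operand"
  (match_code "parallel")
{
  int i;

  /* Reject the operand as soon as one element is not a CONST_INT.  */
  for (i = 0; i < XVECLEN (op, 0); i++)
    if (!CONST_INT_P (XVECEXP (op, 0, i)))
      return 0;

  return 1;
})
```

A pattern would then use (match_operand 2 "const_int_parallel_operand" "") for operand 2, so that nothing else can be simplified into that form.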

Tested and bootstrapped on x86_64.

Is this OK?

Thanks,
Dwarak
-------------- next part --------------
A non-text attachment was scrubbed...
Name: xop-gcc.patch
Type: application/octet-stream
Size: 158739 bytes
Desc: xop-gcc.patch
URL: <http://gcc.gnu.org/pipermail/gcc-patches/attachments/20091020/b840bb7d/attachment.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: xop-ChangeLog
Type: application/octet-stream
Size: 8271 bytes
Desc: xop-ChangeLog
URL: <http://gcc.gnu.org/pipermail/gcc-patches/attachments/20091020/b840bb7d/attachment-0001.obj>