[Bug target/87455] sse_packed_single_insn_optimal is suboptimal on Zen

Fri Sep 28 13:10:00 GMT 2018

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87455

--- Comment #3 from Fanael <fanael4 at gmail dot com> ---
> May be we should remove xorps generation part.

If it were up to me, I'd keep it for BDVER[1234] only, because xorps is still
one byte shorter than either xorpd or pxor and is just as fast there. I'd also
introduce a separate tune option specifically for untyped vector *moves*. That
option would apply to BD, but also to Zen, Pentium M, Core and Skylake (but not
to anything in between, i.e. Nehalem to Broadwell, though my data on Ivy
Bridge, Haswell and Broadwell is not conclusive). It would also cover other
µarches where register-to-register vector moves are renamed (as in Zen),
untyped (as in Skylake) or always of the same type (as in Core).
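The size difference comes from the mandatory 0x66 prefix. The legacy SSE encodings of xorpd, pxor, movapd and movdqa carry that prefix, while xorps and movaps do not.

The Python snippet below is an illustration added for reference and is not part of the original report. It hard-codes the legacy (non-VEX, no REX) register-to-register encodings with xmm0 as the destination and prints their lengths. It needs Python 3.8+ for `bytes.hex(sep)`.

With VEX encodings (e.g. vxorps vs. vxorpd), the prefix is folded into the VEX bytes and the one-byte difference disappears.

```python
# Legacy SSE encodings (no VEX, no REX), destination xmm0.
# The 0x66 prefix selects the "pd"/integer form and costs one extra byte;
# the trailing ModRM byte (0xC0 = xmm0,xmm0; 0xC1 = xmm0,xmm1) picks registers.
ENCODINGS = {
    "xorps xmm0, xmm0":  bytes([0x0F, 0x57, 0xC0]),
    "xorpd xmm0, xmm0":  bytes([0x66, 0x0F, 0x57, 0xC0]),
    "pxor xmm0, xmm0":   bytes([0x66, 0x0F, 0xEF, 0xC0]),
    "movaps xmm0, xmm1": bytes([0x0F, 0x28, 0xC1]),
    "movapd xmm0, xmm1": bytes([0x66, 0x0F, 0x28, 0xC1]),
    "movdqa xmm0, xmm1": bytes([0x66, 0x0F, 0x6F, 0xC1]),
}

for insn, enc in ENCODINGS.items():
    # e.g. "xorps xmm0, xmm0     0f 57 c0  (3 bytes)"
    print(f"{insn:20} {enc.hex(' ')}  ({len(enc)} bytes)")
```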