[Bug tree-optimization/92819] [10 Regression] Worse code generated on avx2 due to simplify_vector_constructor
rguenth at gcc dot gnu.org
gcc-bugzilla@gcc.gnu.org
Thu Jan 30 13:04:00 GMT 2020
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92819
--- Comment #10 from Richard Biener <rguenth at gcc dot gnu.org> ---
So when we hoist the narrowing across the permute and, for corge, instead of
_1 = __MEM <double> (p_4(D));
_5 = {_1, _1, _1, _1};
_6 = __VEC_PERM (x_2(D), _5, { 3ul, 5ul, 6ul, 7ul });
_7 = __BIT_FIELD_REF <v2df> (_6, 128u, 0u);
do
_1 = __BIT_FIELD_REF <v2df> (x_3(D), 128u, 128u);
_2 = __MEM <double> (p_5(D));
_7 = _Literal (v2df) {_2, _2};
_8 = __VEC_PERM (_1, _7, _Literal (v2di) { 1ul, 3ul });
then we get
vextractf128 $0x1, %ymm0, %xmm0
vmovddup (%rdi), %xmm1
vunpckhpd %xmm1, %xmm0, %xmm0
which would be OK, comparable to
vextractf128 $0x1, %ymm0, %xmm0
vunpckhpd %xmm0, %xmm0, %xmm0
vmovhpd (%rdi), %xmm0, %xmm0
Doing the same for foo() gets us
vmovddup (%rdi), %xmm1
vunpckhpd %xmm1, %xmm0, %xmm0
which looks OK to me as well.
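
As a sanity check on the corge rewrite above, here is a minimal scalar sketch of why the two GIMPLE sequences compute the same v2df value. It is not GCC code: vec_perm, corge_wide, corge_narrow and check_hoisting are hypothetical names, vectors are modelled as plain arrays, and lane 0 is assumed to sit at bit offset 0.

```c
#include <assert.h>
#include <stddef.h>

/* Scalar model of __VEC_PERM (a, b, sel) on n-element vectors:
   selector value s picks a[s] when s < n, otherwise b[s - n].  */
void vec_perm(const double *a, const double *b,
              const size_t *sel, size_t n, double *out)
{
    for (size_t i = 0; i < n; i++) {
        size_t s = sel[i] % (2 * n);
        out[i] = s < n ? a[s] : b[s - n];
    }
}

/* Original form: 4-lane permute, then keep the low 128 bits.  */
void corge_wide(const double x[4], double p, double out[2])
{
    const double splat[4] = { p, p, p, p };
    const size_t sel[4] = { 3, 5, 6, 7 };
    double perm[4];

    vec_perm(x, splat, sel, 4, perm);
    out[0] = perm[0];   /* __BIT_FIELD_REF <v2df> (_6, 128u, 0u) */
    out[1] = perm[1];
}

/* Hoisted form: narrow both inputs first, then do a 2-lane permute.  */
void corge_narrow(const double x[4], double p, double out[2])
{
    const double hi[2] = { x[2], x[3] };   /* upper 128 bits of x */
    const double splat[2] = { p, p };
    const size_t sel[2] = { 1, 3 };

    vec_perm(hi, splat, sel, 2, out);
}

/* Both forms should yield { x[3], p }.  */
void check_hoisting(void)
{
    const double x[4] = { 1.0, 2.0, 3.0, 4.0 };
    double wide[2], narrow[2];

    corge_wide(x, 9.0, wide);
    corge_narrow(x, 9.0, narrow);
    assert(wide[0] == narrow[0] && wide[1] == narrow[1]);
    assert(wide[0] == 4.0 && wide[1] == 9.0);
}
```

In this model the wide selector { 3, 5, 6, 7 } only contributes its first two lanes to the result, and those lanes read x[3] and the splat, which is what the narrow selector { 1, 3 } reads from the upper half of x and the two-lane splat.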