[Bug tree-optimization/92819] [10 Regression] Worse code generated on avx2 due to simplify_vector_constructor

rguenth at gcc dot gnu.org gcc-bugzilla@gcc.gnu.org
Thu Jan 30 13:04:00 GMT 2020


https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92819

--- Comment #10 from Richard Biener <rguenth at gcc dot gnu.org> ---
So when we hoist the narrowing across the permute so that, for corge, instead of

  _1 = __MEM <double> (p_4(D));
  _5 = {_1, _1, _1, _1};
  _6 = __VEC_PERM (x_2(D), _5, { 3ul, 5ul, 6ul, 7ul });
  _7 = __BIT_FIELD_REF <v2df> (_6, 128u, 0u);

we do

  _1 = __BIT_FIELD_REF <v2df> (x_3(D), 128u, 128u);
  _2 = __MEM <double> (p_5(D));
  _7 = _Literal (v2df) {_2, _2};
  _8 = __VEC_PERM (_1, _7, _Literal (v2di) { 1ul, 3ul });

then we get

        vextractf128    $0x1, %ymm0, %xmm0
        vmovddup        (%rdi), %xmm1
        vunpckhpd       %xmm1, %xmm0, %xmm0

which would be OK, comparable to

        vextractf128    $0x1, %ymm0, %xmm0
        vunpckhpd       %xmm0, %xmm0, %xmm0
        vmovhpd (%rdi), %xmm0, %xmm0

Doing the same for foo() gets us

        vmovddup        (%rdi), %xmm1
        vunpckhpd       %xmm1, %xmm0, %xmm0

which looks OK to me as well.
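
For illustration, the two GIMPLE forms for corge above can be sketched in C
with GCC's generic vector extensions. This is not the testcase from the bug:
the function names wide_then_narrow and narrow_then_permute are hypothetical,
x is passed by pointer only to avoid ABI warnings when AVX is not enabled,
and __builtin_shuffle is assumed to be available (it is GCC-specific). Both
functions should return { x[3], *p }.

```c
typedef double    v4df __attribute__((vector_size(32)));
typedef double    v2df __attribute__((vector_size(16)));
typedef long long v4di __attribute__((vector_size(32)));
typedef long long v2di __attribute__((vector_size(16)));

/* First form: permute the full 256-bit vectors, then take the low 128 bits. */
static v2df wide_then_narrow(const v4df *xp, const double *p)
{
    double d = *p;
    v4df splat = { d, d, d, d };
    /* Indices 0..3 select from *xp, 4..7 select from splat. */
    v4df perm = __builtin_shuffle(*xp, splat, (v4di){ 3, 5, 6, 7 });
    /* Corresponds to BIT_FIELD_REF <v2df> (perm, 128, 0): lanes 0 and 1. */
    v2df lo = { perm[0], perm[1] };
    return lo;
}

/* Second form: narrow first (high half of x), then do a 128-bit permute. */
static v2df narrow_then_permute(const v4df *xp, const double *p)
{
    /* Corresponds to BIT_FIELD_REF <v2df> (x, 128, 128): lanes 2 and 3. */
    v2df hi = { (*xp)[2], (*xp)[3] };
    double d = *p;
    v2df splat = { d, d };
    /* Indices 0..1 select from hi, 2..3 select from splat. */
    return __builtin_shuffle(hi, splat, (v2di){ 1, 3 });
}
```

The second form only touches 128-bit values after the extract, which is what
lets the backend pick vextractf128 followed by xmm-only instructions.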

