[Bug tree-optimization/101668] BB vectorizer doesn't handle lowpart of existing vector

Thu Jul 29 06:55:56 GMT 2021

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101668

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|                            |missed-optimization
   Last reconfirmed|                            |2021-07-29
             Status|UNCONFIRMED                 |NEW
     Ever confirmed|0                           |1
                 CC|                            |rguenth at gcc dot gnu.org
            Summary|vectorizer doesn't          |BB vectorizer doesn't
                   |categorize vector construct |handle lowpart of existing
                   |cost right.                 |vector
             Blocks|                            |53947

--- Comment #1 from Richard Biener <rguenth at gcc dot gnu.org> ---
The basic-block vectorizer is currently limited as to what "existing" vectors
it recognizes.  In this testcase we're accessing only the lowpart of 'src',
something we cannot yet model in vectorizable_slp_permutation.  The specific
case isn't hard to fix, we'd get

  <bb 2> [local count: 1073741824]:
  _31 = VIEW_CONVERT_EXPR<vector(8) int>(src_18(D));
  vect__2.4_33 = [vec_unpack_lo_expr] _31;
  vect__2.4_34 = [vec_unpack_hi_expr] _31;
  MEM <vector(4) long long int> [(long long int *)&tem] = vect__2.4_33;
  MEM <vector(4) long long int> [(long long int *)&tem + 32B] = vect__2.4_34;
  _17 = MEM[(v8di *)&tem];
  *dst_28(D) = _17;
  tem ={v} {CLOBBER};
  return;

so we then fail to elide the temporary, producing

bar_s32_s64:
.LFB0:
        .cfi_startproc
        pushq   %rbp
        .cfi_def_cfa_offset 16
        .cfi_offset 6, -16
        vpmovsxdq       %xmm0, %ymm1
        vextracti128    $0x1, %ymm0, %xmm0
        movq    %rsp, %rbp
        .cfi_def_cfa_register 6
        andq    $-64, %rsp
        subq    $8, %rsp
        vpmovsxdq       %xmm0, %ymm0
        vmovdqa %ymm1, -56(%rsp)
        vmovdqa %ymm0, -24(%rsp)
        vmovdqa64       -56(%rsp), %zmm2
        vmovdqa64       %zmm2, (%rdi)
        leave
        .cfi_def_cfa 7, 8
        ret

it looks like there's no V8SI->V8DI conversion optab or we choose V4DI
for some other reason as prefered vector mode.

Referenced Bugs:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53947
[Bug 53947] [meta-bug] vectorizer missed-optimizations