Bug 97352 - gcc.dg/vect/bb-slp-pr78205.c fails to vectorize all opportunities with AVX
Status: UNCONFIRMED
Alias: None
Product: gcc
Classification: Unclassified
Component: tree-optimization
Version: 11.0
Importance: P3 normal
Target Milestone: ---
Assignee: Not yet assigned to anyone
URL:
Keywords: missed-optimization
Depends on:
Blocks: vectorizer
Reported: 2020-10-09 11:05 UTC by Richard Biener
Modified: 2020-10-13 11:47 UTC (History)

See Also:
Host:
Target: x86_64-*-* i?86-*-*
Build:
Known to work:
Known to fail:
Last reconfirmed:


Description Richard Biener 2020-10-09 11:05:34 UTC
When using AVX vectors, compiling

double x[2], a[4], b[4], c[5];

void foo ()
{
  a[0] = c[0];
  a[1] = c[1];
  a[2] = c[0];
  a[3] = c[1];
  b[0] = c[2];
  b[1] = c[3];
  b[2] = c[2];
  b[3] = c[3];
  x[0] = c[4];
  x[1] = c[4];
}

only vectorizes the x[] stores: the overall SLP analysis succeeds with
V4DFmode, but only some of the opportunities are actually vectorized, and so
SLP vectorization with V2DFmode is never even tried.

This is the issue of vector mode iteration working at the wrong granularity.
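
For illustration, the result that each store group asks for can be written by
hand with GNU C vector extensions. This is only a sketch: foo_by_hand is a
hypothetical name, the v4df/v2df typedefs are not part of the testcase, and
whether the vectorizer should pick exactly this mix of widths is an assumption.

```c
#include <assert.h>
#include <string.h>

/* GNU C vector extension types (GCC/Clang only), not part of the testcase.  */
typedef double v4df __attribute__ ((vector_size (32)));
typedef double v2df __attribute__ ((vector_size (16)));

double x[2], a[4], b[4], c[5];

/* Hypothetical hand-written variant of foo: a[] and b[] are each filled
   with one 4-lane vector, x[] with one 2-lane vector.  */
void
foo_by_hand (void)
{
  /* Each initializer repeats the same access pattern as the scalar code.  */
  v4df va = { c[0], c[1], c[0], c[1] };
  v4df vb = { c[2], c[3], c[2], c[3] };
  v2df vx = { c[4], c[4] };

  /* Sanity check: every vector covers its destination array exactly.  */
  assert (sizeof va == sizeof a);
  assert (sizeof vb == sizeof b);
  assert (sizeof vx == sizeof x);

  memcpy (a, &va, sizeof va);
  memcpy (b, &vb, sizeof vb);
  memcpy (x, &vx, sizeof vx);
}
```

Without AVX enabled the compiler may lower the 32-byte vectors to narrower
operations, so the sketch shows the intended grouping rather than a
guaranteed instruction sequence.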

/home/rguenther/src/gcc3/gcc/testsuite/gcc.dg/vect/bb-slp-pr78205.c:17:8: note:   === vect_slp_analyze_instance_dependence ===
/home/rguenther/src/gcc3/gcc/testsuite/gcc.dg/vect/bb-slp-pr78205.c:9:8: note:   === vect_slp_analyze_instance_alignment ===
/home/rguenther/src/gcc3/gcc/testsuite/gcc.dg/vect/bb-slp-pr78205.c:9:8: note:  removing SLP instance operations starting from: a[0] = _1;
/home/rguenther/src/gcc3/gcc/testsuite/gcc.dg/vect/bb-slp-pr78205.c:13:8: note:   === vect_slp_analyze_instance_alignment ===
/home/rguenther/src/gcc3/gcc/testsuite/gcc.dg/vect/bb-slp-pr78205.c:13:8: note:  removing SLP instance operations starting from: b[0] = _3;
/home/rguenther/src/gcc3/gcc/testsuite/gcc.dg/vect/bb-slp-pr78205.c:13:8: note:   === vect_slp_analyze_operations ===

and the failure is caused by the check, now performed late, in

/* Analyze alignment of DRs of stmts in NODE.  */

static bool
vect_slp_analyze_node_alignment (vec_info *vinfo, slp_tree node)
{
...
  /* We need to commit to a vector type for the group now.  */
  if (is_a <bb_vec_info> (vinfo)
      && !vect_update_shared_vectype (first_stmt_info, SLP_TREE_VECTYPE (node)))
    return false;

because the other SLP instance (the one with the x[] stores) already set the
vector type to V2DF, while this one wants V4DF.
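
The interaction described above can be mimicked with a small toy model. It is
not GCC code: struct instance, commit_shared_lanes and the two driver
functions are hypothetical, the "first committer wins" rule only approximates
vect_update_shared_vectype, and the order in which instances commit is an
assumption chosen to match the report (the x[] instance goes first).

```c
#include <assert.h>

/* Toy model only: none of these names are GCC internals.  */
enum { NINST = 3, NMODES = 2 };

struct instance
{
  const char *name;  /* array written by the store group */
  int group_size;    /* number of scalar stores in the group */
  int done;          /* nonzero once the instance is vectorized */
};

/* Lane count the shared c[] load group is committed to; 0 = undecided.  */
static int shared_lanes;

/* An instance asks for the mode's lane count, capped by its group size.  */
static int
wanted_lanes (const struct instance *in, int mode_lanes)
{
  return in->group_size < mode_lanes ? in->group_size : mode_lanes;
}

/* First committer wins; a later request for another lane count fails.  */
static int
commit_shared_lanes (int lanes)
{
  assert (lanes > 0);
  if (shared_lanes == 0)
    shared_lanes = lanes;
  return shared_lanes == lanes;
}

/* Try all pending instances with MODE_LANES and return the number of
   scalar stores covered.  Walks backwards so that x[] commits first.  */
static int
try_mode (struct instance *insts, int mode_lanes)
{
  int stores = 0;
  for (int i = NINST - 1; i >= 0; i--)
    if (!insts[i].done
        && commit_shared_lanes (wanted_lanes (&insts[i], mode_lanes)))
      {
        insts[i].done = 1;
        stores += insts[i].group_size;
      }
  return stores;
}

static const int modes[NMODES] = { 4, 2 };  /* V4DF first, then V2DF */

/* Region granularity: stop at the first mode that vectorizes anything.  */
int
stores_region_granularity (void)
{
  struct instance insts[NINST] = { { "a", 4, 0 }, { "b", 4, 0 }, { "x", 2, 0 } };
  shared_lanes = 0;
  for (int m = 0; m < NMODES; m++)
    {
      int stores = try_mode (insts, modes[m]);
      if (stores > 0)
        return stores;
    }
  return 0;
}

/* Instance granularity: instances left over are retried with the next mode.  */
int
stores_instance_granularity (void)
{
  struct instance insts[NINST] = { { "a", 4, 0 }, { "b", 4, 0 }, { "x", 2, 0 } };
  int stores = 0;
  shared_lanes = 0;
  for (int m = 0; m < NMODES; m++)
    stores += try_mode (insts, modes[m]);
  return stores;
}
```

In this model stores_region_granularity () returns 2 (only the x[] stores),
while stores_instance_granularity () returns 10, because the a[] and b[]
groups are retried with two lanes and then agree with the committed type.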
Comment 1 Richard Biener 2020-10-12 07:29:52 UTC
A similar case is gcc.dg/vect/bb-slp-pr65935.c where, depending on luck, we
vectorize either a large leading AVX chain or a single SSE chain.
Comment 2 Richard Biener 2020-10-13 11:47:24 UTC
A simpler testcase is the following (though it hints at a solution, splitting
the load group, that is possibly not generic enough):

double a[6], b[6];
void foo()
{
  a[0] = b[0];
  a[1] = b[1];
  a[2] = b[2];
  a[3] = b[3];
  a[4] = b[4];
  a[5] = b[5];
}

produces with SSE:

        movapd  b(%rip), %xmm0
        movapd  b+16(%rip), %xmm1
        movapd  b+32(%rip), %xmm2
        movaps  %xmm0, a(%rip)
        movaps  %xmm1, a+16(%rip)
        movaps  %xmm2, a+32(%rip)

and with AVX:

        vmovsd  b+32(%rip), %xmm0
        vmovapd b(%rip), %ymm1
        vmovsd  %xmm0, a+32(%rip)
        vmovsd  b+40(%rip), %xmm0
        vmovapd %ymm1, a(%rip)
        vmovsd  %xmm0, a+40(%rip)

while we'd like to see something like

        vmovapd b(%rip), %ymm1
        vmovapd %ymm1, a(%rip)
        movapd  b+32(%rip), %xmm2
        movaps  %xmm2, a+32(%rip)
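
The desired split can also be expressed at the source level with GNU C vector
extensions, as a rough sketch of the 4+2 grouping. foo_split and the
v4df/v2df typedefs are hypothetical names introduced here, and the code the
compiler emits for it may differ from the listing above.

```c
#include <assert.h>
#include <string.h>

/* GNU C vector extension types (GCC/Clang only), not part of the testcase.  */
typedef double v4df __attribute__ ((vector_size (32)));
typedef double v2df __attribute__ ((vector_size (16)));

double a[6], b[6];

/* Hypothetical hand-written variant of foo: one 4-lane copy for elements
   0..3 followed by one 2-lane copy for elements 4..5.  */
void
foo_split (void)
{
  v4df lo;
  v2df hi;

  /* Sanity check: the two vectors together cover all six doubles.  */
  assert (sizeof lo + sizeof hi == sizeof a);

  memcpy (&lo, &b[0], sizeof lo);  /* load b[0..3] */
  memcpy (&hi, &b[4], sizeof hi);  /* load b[4..5] */
  memcpy (&a[0], &lo, sizeof lo);  /* store a[0..3] */
  memcpy (&a[4], &hi, sizeof hi);  /* store a[4..5] */
}
```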