97352 – gcc.dg/vect/bb-slp-pr78205.c fails to vectorize all opportunities with AVX

Bug 97352 - gcc.dg/vect/bb-slp-pr78205.c fails to vectorize all opportunities with AVX

Summary: gcc.dg/vect/bb-slp-pr78205.c fails to vectorize all opportunities with AVX

Status:	RESOLVED FIXED

Alias:	None

Product:	gcc
Classification:	Unclassified
Component:	tree-optimization (show other bugs)
Version:	11.0

Importance:	P3 normal
Target Milestone:	12.0
Assignee:	Richard Biener

URL:
Keywords:	missed-optimization

Depends on:
Blocks:	vectorizer
	Show dependency tree / graph

Reported:	2020-10-09 11:05 UTC by Richard Biener
Modified:	2021-09-27 08:25 UTC (History)
CC List:	0 users

See Also:	97351
Host:
Target:	x86_64-- i?86--
Build:
Known to work:
Known to fail:
Last reconfirmed:	2021-09-02 00:00:00

Attachments
Add an attachment (proposed patch, testcase, etc.)

Note You need to log in before you can comment on or make changes to this bug.

Description Richard Biener 2020-10-09 11:05:34 UTC

When using AVX vectors

double x[2], a[4], b[4], c[5];

void foo ()
{
  a[0] = c[0];
  a[1] = c[1];
  a[2] = c[0];
  a[3] = c[1];
  b[0] = c[2];
  b[1] = c[3];
  b[2] = c[2];
  b[3] = c[3];
  x[0] = c[4];
  x[1] = c[4];
}

only vectorizes the x[] stores since the overall SLP analysis succeeds with
V4DFmode but only parts of the opportunities are finally vectorized and thus
SLP vectorization with V2DFmode isn't even tried.

This is the issue that vector mode iteration works on wrong granularity.

/home/rguenther/src/gcc3/gcc/testsuite/gcc.dg/vect/bb-slp-pr78205.c:17:8: note:   === vect_slp_analyze_instance_dependence ===
/home/rguenther/src/gcc3/gcc/testsuite/gcc.dg/vect/bb-slp-pr78205.c:9:8: note:   === vect_slp_analyze_instance_alignment ===
/home/rguenther/src/gcc3/gcc/testsuite/gcc.dg/vect/bb-slp-pr78205.c:9:8: note:  removing SLP instance operations starting from: a[0] = _1;
/home/rguenther/src/gcc3/gcc/testsuite/gcc.dg/vect/bb-slp-pr78205.c:13:8: note:   === vect_slp_analyze_instance_alignment ===
/home/rguenther/src/gcc3/gcc/testsuite/gcc.dg/vect/bb-slp-pr78205.c:13:8: note:  removing SLP instance operations starting from: b[0] = _3;
/home/rguenther/src/gcc3/gcc/testsuite/gcc.dg/vect/bb-slp-pr78205.c:13:8: note:   === vect_slp_analyze_operations ===

and the failure is because of the now late performed

/* Analyze alignment of DRs of stmts in NODE.  */

static bool
vect_slp_analyze_node_alignment (vec_info *vinfo, slp_tree node)
{
...
  /* We need to commit to a vector type for the group now.  */
  if (is_a <bb_vec_info> (vinfo)
      && !vect_update_shared_vectype (first_stmt_info, SLP_TREE_VECTYPE (node)))
    return false;

because the other SLP instance (with x[] stores) set the vector type to V2DF
while this one wants V4DF.

Comment 1 Richard Biener 2020-10-12 07:29:52 UTC

A similar case is gcc.dg/vect/bb-slp-pr65935.c where based on luck we vectorize
either a large leading AVX chain or a single SSE chain.

Comment 2 Richard Biener 2020-10-13 11:47:24 UTC

So a simpler testcase is the following (but hinting at the possibly not generic
enough solution to split the load group):

double a[6], b[6];
void foo()
{
  a[0] = b[0];
  a[1] = b[1];
  a[2] = b[2];
  a[3] = b[3];
  a[4] = b[4];
  a[5] = b[5];
}

produces with SSE:

        movapd  b(%rip), %xmm0
        movapd  b+16(%rip), %xmm1
        movapd  b+32(%rip), %xmm2
        movaps  %xmm0, a(%rip)
        movaps  %xmm1, a+16(%rip)
        movaps  %xmm2, a+32(%rip)

and with AVX:

        vmovsd  b+32(%rip), %xmm0
        vmovapd b(%rip), %ymm1
        vmovsd  %xmm0, a+32(%rip)
        vmovsd  b+40(%rip), %xmm0
        vmovapd %ymm1, a(%rip)
        vmovsd  %xmm0, a+40(%rip)

while we'd like to see sth like

        vmovapd b(%rip), %ymm1
        vmovapd %ymm1, a(%rip)
        movapd  b+32(%rip), %xmm2
        movaps  %xmm2, a+32(%rip)

Comment 3 Andrew Pinski 2021-09-02 10:18:10 UTC

Confirmed.

Comment 4 GCC Commits 2021-09-27 08:24:33 UTC

The master branch has been updated by Richard Biener <rguenth@gcc.gnu.org>:

https://gcc.gnu.org/g:6390c5047adb75960f86d56582e6322aaa4d9281

commit r12-3893-g6390c5047adb75960f86d56582e6322aaa4d9281
Author: Richard Biener <rguenther@suse.de>
Date:   Wed Nov 18 09:36:57 2020 +0100

    Allow different vector types for stmt groups
    
    This allows vectorization (in practice non-loop vectorization) to
    have a stmt participate in different vector type vectorizations.
    It allows us to remove vect_update_shared_vectype and replace it
    by pushing/popping STMT_VINFO_VECTYPE from SLP_TREE_VECTYPE around
    vect_analyze_stmt and vect_transform_stmt.
    
    For data-ref the situation is a bit more complicated since we
    analyze alignment info with a specific vector type in mind which
    doesn't play well when that changes.
    
    So the bulk of the change is passing down the actual vector type
    used for a vectorized access to the various accessors of alignment
    info, first and foremost dr_misalignment but also aligned_access_p,
    known_alignment_for_access_p, vect_known_alignment_in_bytes and
    vect_supportable_dr_alignment.  I took the liberty to replace
    ALL_CAPS macro accessors with the lower-case function invocations.
    
    The actual changes to the behavior are in dr_misalignment which now
    is the place factoring in the negative step adjustment as well as
    handling alignment queries for a vector type with bigger alignment
    requirements than what we can (or have) analyze(d).
    
    vect_slp_analyze_node_alignment makes use of this and upon receiving
    a vector type with a bigger alingment desire re-analyzes the DR
    with respect to it but keeps an older more precise result if possible.
    In this context it might be possible to do the analysis just once
    but instead of analyzing with respect to a specific desired alignment
    look for the biggest alignment we can compute a not unknown alignment.
    
    The ChangeLog includes the functional changes but not the bulk due
    to the alignment accessor API changes - I hope that's something good.
    
    2021-09-17  Richard Biener  <rguenther@suse.de>
    
            PR tree-optimization/97351
            PR tree-optimization/97352
            PR tree-optimization/82426
            * tree-vectorizer.h (dr_misalignment): Add vector type
            argument.
            (aligned_access_p): Likewise.
            (known_alignment_for_access_p): Likewise.
            (vect_supportable_dr_alignment): Likewise.
            (vect_known_alignment_in_bytes): Likewise.  Refactor.
            (DR_MISALIGNMENT): Remove.
            (vect_update_shared_vectype): Likewise.
            * tree-vect-data-refs.c (dr_misalignment): Refactor, handle
            a vector type with larger alignment requirement and apply
            the negative step adjustment here.
            (vect_calculate_target_alignment): Remove.
            (vect_compute_data_ref_alignment): Get explicit vector type
            argument, do not apply a negative step alignment adjustment
            here.
            (vect_slp_analyze_node_alignment): Re-analyze alignment
            when we re-visit the DR with a bigger desired alignment but
            keep more precise results from smaller alignments.
            * tree-vect-slp.c (vect_update_shared_vectype): Remove.
            (vect_slp_analyze_node_operations_1): Do not update the
            shared vector type on stmts.
            * tree-vect-stmts.c (vect_analyze_stmt): Push/pop the
            vector type of an SLP node to the representative stmt-info.
            (vect_transform_stmt): Likewise.
    
            * gcc.target/i386/vect-pr82426.c: New testcase.
            * gcc.target/i386/vect-pr97352.c: Likewise.

Comment 5 Richard Biener 2021-09-27 08:25:35 UTC

Fixed for GCC 12.