int __attribute__((__aligned__(8))) a;
int __attribute__((__aligned__(8))) b;
a = b + 1;
a = b + 2;
a = b + 3;
a = b + 4;
a = b * 3;
a = b * 4;
a = b * 5;
a = b * 7;
should be vectorized using V4SI vectors in two SLP groups so we can
vectorize not only the store but also the loads and the add. When
using -mavx2 we instead get only the store vectorized (even with
cost modeling enabled) because we try vectorizing that first.
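As a rough illustration of the two-group form described above, here is a sketch using GCC's generic vector extensions. The listing above does not show array shapes, so the arrays `dst` and `src`, their sizes, and the mapping of lanes to elements are assumptions made only for illustration. This is hand-written code, not compiler output.

```c
#include <string.h>

/* Four 32-bit ints per vector, i.e. V4SI.  */
typedef int v4si __attribute__ ((vector_size (16)));

/* Hypothetical stand-ins for a and b; the sizes are assumed.  */
int dst[8];
int src[4];

void
two_group_sketch (void)
{
  v4si x;
  /* One vector load feeds both groups.  */
  memcpy (&x, src, sizeof x);

  /* Group 1: the four additions.  */
  v4si sum = x + (v4si) { 1, 2, 3, 4 };
  /* Group 2: the four multiplications.  */
  v4si prod = x * (v4si) { 3, 4, 5, 7 };

  /* Two V4SI stores instead of a single V8SI store.  */
  memcpy (&dst[0], &sum, sizeof sum);
  memcpy (&dst[4], &prod, sizeof prod);
}
```

With a single V8SI store the operands would instead have to be built from scalars, which is the behaviour reported for -mavx2.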
It might be possible to guide SLP splitting during the SLP build
in a way similar to how we try commuting operands. So when we figure
/home/rguenther/src/gcc3/gcc/testsuite/gcc.dg/vect/bb-slp-subgroups-3.c:12:10: note: Build SLP for _9 = _1 * 3;
/home/rguenther/src/gcc3/gcc/testsuite/gcc.dg/vect/bb-slp-subgroups-3.c:12:10: note: get vectype for scalar type (group size 8): int
/home/rguenther/src/gcc3/gcc/testsuite/gcc.dg/vect/bb-slp-subgroups-3.c:12:10: note: vectype: vector(8) int
/home/rguenther/src/gcc3/gcc/testsuite/gcc.dg/vect/bb-slp-subgroups-3.c:12:10: note: nunits = 8
/home/rguenther/src/gcc3/gcc/testsuite/gcc.dg/vect/bb-slp-subgroups-3.c:12:10: missed: Build SLP failed: different operation in stmt _9 = _1 * 3;
/home/rguenther/src/gcc3/gcc/testsuite/gcc.dg/vect/bb-slp-subgroups-3.c:12:10: missed: original stmt _2 = _1 + 1;
and see that the parent op (the store in this case) cannot be commuted, we
can check whether the matches array divides the vector under some constraints
and whether the other lanes with matches == false form a valid SLP
operand (we know the == true ones likely would). The results would then
be concatenated via a permute node.
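The shape check on the matches array could look roughly like the sketch below. The function name `slp_split_point` and the power-of-two constraint are assumptions chosen for illustration, not the vectorizer's actual code. Deciding whether the non-matching lanes really form a valid SLP operand would still need a separate SLP build attempt on those lanes.

```c
#include <stdbool.h>

/* Nonzero and a power of two?  */
static bool
is_pow2 (unsigned x)
{
  return x != 0 && (x & (x - 1)) == 0;
}

/* Hypothetical helper: return the lane at which to split the group,
   or 0 if the matches array does not have a usable shape.  The shape
   accepted here is a leading run of matching lanes followed only by
   non-matching lanes, with both parts a power of two in size (an
   assumed example of "some constraints").  */
unsigned
slp_split_point (const bool *matches, unsigned group_size)
{
  unsigned i = 0;
  while (i < group_size && matches[i])
    i++;

  /* Everything matched or nothing matched: nothing to split.  */
  if (i == 0 || i == group_size)
    return 0;

  /* The remaining lanes must all be non-matching.  */
  for (unsigned j = i; j < group_size; j++)
    if (matches[j])
      return 0;

  if (!is_pow2 (i) || !is_pow2 (group_size - i))
    return 0;

  return i;
}
```

For the group of eight in the dump above, a matches array that is true for the first four lanes and false for the last four would yield a split point of 4, giving two four-lane subgroups.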
This should eventually also replace the splitting done in SLP instance
analysis (though splitting stores might still be necessary there).
The other option is to somehow tackle this with vector size iteration,
doing multiple analyses and comparing costs/benefit, though it's hard
not to compare apples & oranges since the amount of code vectorized will
usually differ (as compared to loop vectorization).
The master branch has been updated by Richard Biener <email@example.com>:
Author: Richard Biener <firstname.lastname@example.org>
Date: Wed Nov 18 09:36:57 2020 +0100
Allow different vector types for stmt groups
This allows vectorization (in practice non-loop vectorization) to
have a stmt participate in different vector type vectorizations.
It allows us to remove vect_update_shared_vectype and replace it
by pushing/popping STMT_VINFO_VECTYPE from SLP_TREE_VECTYPE around
vect_analyze_stmt and vect_transform_stmt.
For data-ref the situation is a bit more complicated since we
analyze alignment info with a specific vector type in mind which
doesn't play well when that changes.
So the bulk of the change is passing down the actual vector type
used for a vectorized access to the various accessors of alignment
info, first and foremost dr_misalignment but also aligned_access_p,
known_alignment_for_access_p, vect_known_alignment_in_bytes and
vect_supportable_dr_alignment. I took the liberty to replace
ALL_CAPS macro accessors with the lower-case function invocations.
The actual changes to the behavior are in dr_misalignment which now
is the place factoring in the negative step adjustment as well as
handling alignment queries for a vector type with bigger alignment
requirements than what we can (or have) analyze(d).
vect_slp_analyze_node_alignment makes use of this and upon receiving
a vector type with a bigger alignment desire re-analyzes the DR
with respect to it but keeps an older more precise result if possible.
In this context it might be possible to do the analysis just once
but instead of analyzing with respect to a specific desired alignment
look for the biggest alignment for which we can compute a known misalignment.
The ChangeLog includes the functional changes but not the bulk due
to the alignment accessor API changes - I hope that's something good.
2021-09-17 Richard Biener <email@example.com>
* tree-vectorizer.h (dr_misalignment): Add vector type
(vect_known_alignment_in_bytes): Likewise. Refactor.
* tree-vect-data-refs.c (dr_misalignment): Refactor, handle
a vector type with larger alignment requirement and apply
the negative step adjustment here.
(vect_compute_data_ref_alignment): Get explicit vector type
argument, do not apply a negative step alignment adjustment
(vect_slp_analyze_node_alignment): Re-analyze alignment
when we re-visit the DR with a bigger desired alignment but
keep more precise results from smaller alignments.
* tree-vect-slp.c (vect_update_shared_vectype): Remove.
(vect_slp_analyze_node_operations_1): Do not update the
shared vector type on stmts.
* tree-vect-stmts.c (vect_analyze_stmt): Push/pop the
vector type of an SLP node to the representative stmt-info.
* gcc.target/i386/vect-pr82426.c: New testcase.
* gcc.target/i386/vect-pr97352.c: Likewise.
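The idea of querying misalignment for a specific vector type, as described in the commit message above, can be modeled in a few lines of standalone C. The sketch below is a simplified model, not the code in tree-vect-data-refs.c. The struct layouts, the name `model_dr_misalignment`, and the choice to report the result modulo the queried vector type's alignment are all assumptions.

```c
#define MISALIGN_UNKNOWN (-1)

/* Hypothetical stand-in for the alignment info recorded for a DR.  */
struct dr_align_info
{
  unsigned analyzed_alignment; /* Bytes, power of two.  */
  int misalignment;            /* Bytes, or MISALIGN_UNKNOWN.  */
  int negative_step;           /* Nonzero if the access walks downwards.  */
};

/* Hypothetical description of a vector type.  */
struct vec_type
{
  unsigned nunits;    /* Number of lanes.  */
  unsigned elem_size; /* Bytes per lane.  */
  unsigned alignment; /* Preferred alignment in bytes, power of two.  */
};

int
model_dr_misalignment (const struct dr_align_info *dr,
                       const struct vec_type *vt)
{
  if (dr->misalignment == MISALIGN_UNKNOWN)
    return MISALIGN_UNKNOWN;

  /* The analysis was done for a smaller alignment than this vector
     type wants, so nothing is known with respect to it.  */
  if (vt->alignment > dr->analyzed_alignment)
    return MISALIGN_UNKNOWN;

  long mis = dr->misalignment;

  /* Negative step: the vector access starts nunits - 1 elements
     before the address the DR describes.  */
  if (dr->negative_step)
    mis -= (long) (vt->nunits - 1) * vt->elem_size;

  /* Reduce to the range [0, alignment) of the queried type.  */
  long align = vt->alignment;
  return (int) (((mis % align) + align) % align);
}
```

In this model a DR analyzed for 16-byte alignment with misalignment 8 reports 8 for a 16-byte-aligned four-lane int vector, unknown for a 32-byte-aligned eight-lane one, and 12 for the four-lane case once the negative step adjustment is applied.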
Fixed for GCC 12.