[PATCH] Allow different vector types for stmt groups

Wed Oct 13 17:03:16 GMT 2021

Hi,

On Mon, Sep 27 2021, Richard Biener via Gcc-patches wrote:
>
[...]
>
> The following is what I have pushed after re-bootstrapping and testing
> on x86_64-unknown-linux-gnu.
>
> Richard.
>
> From fc335f9fde40d7a20a1a6e38fd6f842ed93a039e Mon Sep 17 00:00:00 2001
> From: Richard Biener <rguenther@suse.de>
> Date: Wed, 18 Nov 2020 09:36:57 +0100
> Subject: [PATCH] Allow different vector types for stmt groups
> To: gcc-patches@gcc.gnu.org
>
> This allows vectorization (in practice non-loop vectorization) to
> have a stmt participate in different vector type vectorizations.
> It allows us to remove vect_update_shared_vectype and replace it
> by pushing/popping STMT_VINFO_VECTYPE from SLP_TREE_VECTYPE around
> vect_analyze_stmt and vect_transform_stmt.
>
> For data-ref the situation is a bit more complicated since we
> analyze alignment info with a specific vector type in mind which
> doesn't play well when that changes.
>
> So the bulk of the change is passing down the actual vector type
> used for a vectorized access to the various accessors of alignment
> info, first and foremost dr_misalignment but also aligned_access_p,
> known_alignment_for_access_p, vect_known_alignment_in_bytes and
> vect_supportable_dr_alignment.  I took the liberty to replace
> ALL_CAPS macro accessors with the lower-case function invocations.
>
> The actual changes to the behavior are in dr_misalignment which now
> is the place factoring in the negative step adjustment as well as
> handling alignment queries for a vector type with bigger alignment
> requirements than what we can (or have) analyze(d).
>
> vect_slp_analyze_node_alignment makes use of this and upon receiving
> a vector type with a bigger alignment desire re-analyzes the DR
> with respect to it but keeps an older more precise result if possible.
> In this context it might be possible to do the analysis just once
> but instead of analyzing with respect to a specific desired alignment
> look for the biggest alignment for which we can compute a known alignment.
>
> The ChangeLog includes the functional changes but not the bulk due
> to the alignment accessor API changes - I hope that's something good.
>
> 2021-09-17  Richard Biener  <rguenther@suse.de>
>
> 	PR tree-optimization/97351
> 	PR tree-optimization/97352
> 	PR tree-optimization/82426
> 	* tree-vectorizer.h (dr_misalignment): Add vector type
> 	argument.
> 	(aligned_access_p): Likewise.
> 	(known_alignment_for_access_p): Likewise.
> 	(vect_supportable_dr_alignment): Likewise.
> 	(vect_known_alignment_in_bytes): Likewise.  Refactor.
> 	(DR_MISALIGNMENT): Remove.
> 	(vect_update_shared_vectype): Likewise.
> 	* tree-vect-data-refs.c (dr_misalignment): Refactor, handle
> 	a vector type with larger alignment requirement and apply
> 	the negative step adjustment here.
> 	(vect_calculate_target_alignment): Remove.
> 	(vect_compute_data_ref_alignment): Get explicit vector type
> 	argument, do not apply a negative step alignment adjustment
> 	here.
> 	(vect_slp_analyze_node_alignment): Re-analyze alignment
> 	when we re-visit the DR with a bigger desired alignment but
> 	keep more precise results from smaller alignments.
> 	* tree-vect-slp.c (vect_update_shared_vectype): Remove.
> 	(vect_slp_analyze_node_operations_1): Do not update the
> 	shared vector type on stmts.
> 	* tree-vect-stmts.c (vect_analyze_stmt): Push/pop the
> 	vector type of an SLP node to the representative stmt-info.
> 	(vect_transform_stmt): Likewise.

I have bisected a 10% performance regression of the SPEC 2006 FP
433.milc benchmark on AMD zen2, when compiled with -Ofast -march=native
-flto, to this commit.  See also:

  https://lnt.opensuse.org/db_default/v4/SPEC/graph?plot.0=412.70.0&plot.1=289.70.0&

I am not sure whether a bugzilla bug is in order, because I cannot
reproduce the regression either on an AMD zen3 machine or on Intel
CascadeLake, because of the history of the benchmark's performance, and
because I know milc can be sensitive to conditions outside our control.
And the list of dependencies of PR 26163 is long enough as it is.  OTOH,
the regression reproduces reliably for me.

Some relevant perf data:

BEFORE:
# Samples: 585K of event 'cycles:u'
# Event count (approx.): 472738682838
#
# Overhead       Samples  Command          Shared Object           Symbol
# ........  ............  ...............  ......................  .........................................
# 
    24.59%        140397  milc_peak.mine-  milc_peak.mine-lto-nat  [.] u_shift_fermion
    18.47%        105497  milc_peak.mine-  milc_peak.mine-lto-nat  [.] add_force_to_mom
    15.97%         96343  milc_peak.mine-  milc_peak.mine-lto-nat  [.] mult_su3_na
    15.29%         90027  milc_peak.mine-  milc_peak.mine-lto-nat  [.] mult_su3_nn
     5.55%         35114  milc_peak.mine-  milc_peak.mine-lto-nat  [.] path_product
     4.75%         27693  milc_peak.mine-  milc_peak.mine-lto-nat  [.] compute_gen_staple
     2.76%         16109  milc_peak.mine-  milc_peak.mine-lto-nat  [.] mult_su3_an
     2.42%         14255  milc_peak.mine-  milc_peak.mine-lto-nat  [.] imp_gauge_force.constprop.0
     2.02%         11561  milc_peak.mine-  milc_peak.mine-lto-nat  [.] mult_adj_su3_mat_4vec

AFTER:
# Samples: 634K of event 'cycles:u'
# Event count (approx.): 513635733685
#
# Overhead       Samples  Command          Shared Object           Symbol                                   
# ........  ............  ...............  ......................  .........................................
#
    24.04%        149010  milc_peak.mine-  milc_peak.mine-lto-nat  [.] add_force_to_mom
    23.76%        147370  milc_peak.mine-  milc_peak.mine-lto-nat  [.] u_shift_fermion
    14.19%         90929  milc_peak.mine-  milc_peak.mine-lto-nat  [.] mult_su3_nn
    14.14%         92912  milc_peak.mine-  milc_peak.mine-lto-nat  [.] mult_su3_na
     4.90%         33846  milc_peak.mine-  milc_peak.mine-lto-nat  [.] path_product
     3.89%         24621  milc_peak.mine-  milc_peak.mine-lto-nat  [.] mult_su3_an
     3.62%         22831  milc_peak.mine-  milc_peak.mine-lto-nat  [.] compute_gen_staple
     2.05%         13215  milc_peak.mine-  milc_peak.mine-lto-nat  [.] imp_gauge_force.constprop.0
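The two reports can be compared roughly by turning each overhead percentage into an estimated cycle count (percentage times the approximate event count). The sketch below does that for the symbols listed in both reports; mult_adj_su3_mat_4vec is left out because it does not appear in the AFTER list. Percentages are rounded to 0.01% and perf's event counts are approximate, so the results are estimates only.

```python
# Rough per-function cycle estimates from the BEFORE/AFTER perf reports:
# overhead percentage * approximate total event count ('cycles:u').
BEFORE_TOTAL = 472_738_682_838
AFTER_TOTAL = 513_635_733_685

BEFORE = {
    "u_shift_fermion": 24.59,
    "add_force_to_mom": 18.47,
    "mult_su3_na": 15.97,
    "mult_su3_nn": 15.29,
    "path_product": 5.55,
    "compute_gen_staple": 4.75,
    "mult_su3_an": 2.76,
    "imp_gauge_force.constprop.0": 2.42,
    "mult_adj_su3_mat_4vec": 2.02,
}
AFTER = {
    "add_force_to_mom": 24.04,
    "u_shift_fermion": 23.76,
    "mult_su3_nn": 14.19,
    "mult_su3_na": 14.14,
    "path_product": 4.90,
    "mult_su3_an": 3.89,
    "compute_gen_staple": 3.62,
    "imp_gauge_force.constprop.0": 2.05,
}


def cycles(pct: float, total: int) -> float:
    """Estimated cycles attributed to a symbol with overhead PCT."""
    return pct / 100.0 * total


def deltas() -> dict:
    """Estimated change in cycles (after - before) for symbols in both lists."""
    return {
        name: cycles(AFTER[name], AFTER_TOTAL) - cycles(BEFORE[name], BEFORE_TOTAL)
        for name in BEFORE
        if name in AFTER
    }


if __name__ == "__main__":
    print(f"total cycles: {AFTER_TOTAL / BEFORE_TOTAL - 1:+.1%}")
    for name, d in sorted(deltas().items(), key=lambda kv: -kv[1]):
        print(f"{name:30s} {d / 1e9:+7.2f}e9 cycles")
    total_delta = AFTER_TOTAL - BEFORE_TOTAL
    print(f"{'(all samples)':30s} {total_delta / 1e9:+7.2f}e9 cycles")
```

By this rough estimate, add_force_to_mom grows by about 36 billion cycles out of roughly 41 billion extra cycles overall, with mult_su3_an and u_shift_fermion showing the next largest increases.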

Martin