PING^1 [PATCH v2] rs6000: Add load density heuristic

Kewen.Lin linkw@linux.ibm.com
Wed Jun 9 02:26:06 GMT 2021


Hi,

Gentle ping this:

https://gcc.gnu.org/pipermail/gcc-patches/2021-May/571258.html

BR,
Kewen

on 2021/5/26 上午10:59, Kewen.Lin via Gcc-patches wrote:
> Hi,
> 
> This is the updated version of patch to deal with the bwaves_r
> degradation due to vector construction fed by strided loads.
> 
> As Richi's comments [1], this follows the similar idea to over
> price the vector construction fed by VMAT_ELEMENTWISE or
> VMAT_STRIDED_SLP.  Instead of adding the extra cost on vector
> construction costing immediately, it firstly records how many
> loads and vectorized statements in the given loop, later in
> rs6000_density_test (called by finish_cost) it computes the
> load density ratio against all vectorized stmts, and check
> with the corresponding thresholds DENSITY_LOAD_NUM_THRESHOLD
> and DENSITY_LOAD_PCT_THRESHOLD, do the actual extra pricing
> if both thresholds are exceeded.
> 
> Note that this new load density heuristic check is based on
> some fields in target cost which are updated as needed when
> scanning each add_stmt_cost entry, it's independent of the
> current function rs6000_density_test which requires to scan
> non_vect stmts.  Since it's checking the load stmts count
> vs. all vectorized stmts, it's kind of density, so I put
> it in function rs6000_density_test.  With the same reason to
> keep it independent, I didn't put it as an else arm of the
> current existing density threshold check hunk or before this
> hunk.
> 
> In the investigation of -1.04% degradation from 526.blender_r
> on Power8, I noticed that the extra penalized cost 320 on one
> single vector construction with type V16QI is much exaggerated,
> which makes the final body cost unreliable, so this patch adds
> one maximum bound for the extra penalized cost for each vector
> construction statement.
> 
> Bootstrapped/regtested on powerpc64le-linux-gnu P9.
> 
> Full SPEC2017 performance evaluation on Power8/Power9 with
> option combinations:
>   * -O2 -ftree-vectorize {,-fvect-cost-model=very-cheap} {,-ffast-math}
>   * {-O3, -Ofast} {,-funroll-loops}
> 
> bwaves_r degradations on P8/P9 have been fixed, nothing else
> remarkable was observed.
> 
> Is it ok for trunk?
> 
> [1] https://gcc.gnu.org/pipermail/gcc-patches/2021-May/570076.html
> 
> BR,
> Kewen
> -----
> gcc/ChangeLog:
> 
> 	* config/rs6000/rs6000.c (struct rs6000_cost_data): New members
> 	nstmts, nloads and extra_ctor_cost.
> 	(rs6000_density_test): Add load density related heuristics and the
> 	checks, do extra costing on vector construction statements if need.
> 	(rs6000_init_cost): Init new members.
> 	(rs6000_update_target_cost_per_stmt): New function.
> 	(rs6000_add_stmt_cost): Factor vect_nonmem hunk out to function
> 	rs6000_update_target_cost_per_stmt and call it.
> 



More information about the Gcc-patches mailing list